Prometheus: how does this query work exactly?

I've implemented a trigger for KEDA event-based autoscaling in Kubernetes for our backend, which is a Rails API. This trigger uses a custom Prometheus metric that is basically the queue time, so the goal is to scale replicas based on how long requests wait in the queue before being processed.

Recently I got some help from someone in a Slack community to create a query that uses this metric in a trigger, and this person suggested this:

scalar(kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="keda-hpa-{{ .Values.service.name }}-web"}) * histogram_quantile(0.95, sum by (le)(rate(ruby_queue_latency_bucket{service="{{ .Values.service.name }}-web"}[1m]))) OR on() vector(0)

I'm trying to understand exactly how this works, but that person isn't replying; maybe he's busy or has left the community, dunno. I only know the basics of PromQL, so I'm a bit confused by this.

kube_horizontalpodautoscaler_status_current_replicas is the number of replicas currently spun up by the Horizontal Pod Autoscaler (HPA), and this HPA is managed by the KEDA ScaledObject.

histogram_quantile(...) is the p95 of the metric I mentioned, the queue time.
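To make the p95 part concrete, here is a rough Python re-implementation of the linear interpolation Prometheus performs for classic histograms. It is a simplified sketch, not Prometheus' code: the helper name and the bucket numbers are made up, and some of the real edge-case handling (e.g. non-monotonic buckets, native histograms) is left out. Each `(upper_bound, cumulative_count)` pair stands for one `le` series after `rate()` and `sum by (le)`. Note that the result is in the same unit as the `le` bounds.

```python
import math


def histogram_quantile(q, buckets):
    """Sketch of PromQL histogram_quantile() for classic histograms.

    buckets: (upper_bound, cumulative_count) pairs, one per `le` label value
    after rate() and sum by (le). The largest bound must be math.inf.
    Assumes 0 < q <= 1 and at least two buckets.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total  # position of the wanted observation, e.g. the 95th percent
    # Find the first finite bucket whose cumulative count reaches the rank.
    for i, (bound, count) in enumerate(buckets[:-1]):
        if count >= rank:
            break
    else:
        # Rank lies in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    if i == 0 and bound <= 0:
        return bound
    start, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly inside the bucket and interpolate.
    return start + (bound - start) * (rank - prev_count) / (count - prev_count)


# Hypothetical per-second rates for buckets le=10, 25, 50, 100, +Inf.
example = [(10, 20), (25, 60), (50, 90), (100, 98), (math.inf, 100)]
print(histogram_quantile(0.95, example))
```

With these made-up buckets the 95th observation falls in the 50 to 100 bucket, so the result is interpolated to 81.25, not read off exactly. The precision of the p95 therefore depends on how the buckets are laid out.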

Why did he suggest multiplying the replica count by the p95 of the queue time? The HPA will scale based on the average of the values returned by that query per replica.
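A short derivation may explain the multiplication, assuming the KEDA trigger uses the default `AverageValue` metric type, so the HPA treats the query result as an external metric and divides it by the current replica count. Let $R$ be the current replica count (the `kube_horizontalpodautoscaler_status_current_replicas` value), $P$ the p95 from `histogram_quantile`, $T$ the trigger threshold and $M$ the query result:

```latex
\begin{aligned}
M &= R \cdot P \\
\frac{M}{R} &= \frac{R \cdot P}{R} = P \\
R_{\text{desired}} &= \left\lceil \frac{M}{T} \right\rceil = \left\lceil R \cdot \frac{P}{T} \right\rceil
\end{aligned}
```

So the multiplication cancels the HPA's per-replica averaging: the value compared against $T$ is just the p95, and the replica count scales in proportion to how far the p95 is from the target (twice the target, twice the replicas). Without it, $M = P$ and $R_{\text{desired}} = \lceil P/T \rceil$ no longer depends on $R$, e.g. a p95 of 50 with $T = 25$ would always ask for 2 replicas no matter how many are already running.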

Also, what does OR on() vector(0) do?
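A small Python model of the `or` operator may help here. In PromQL `*` binds tighter than `or`, so the query is `(scalar(...) * histogram_quantile(...)) or on() vector(0)`. `vector(0)` is a single sample with no labels and value 0, and `on()` matches on an empty label list, so every left-hand sample "matches" it. The right side is therefore only returned when the left side is empty, e.g. when no `ruby_queue_latency_bucket` series match the selector, which gives KEDA a 0 to work with instead of no data. The sketch below is a hypothetical helper, with vectors represented as lists of `(labels, value)` pairs:

```python
def promql_or_on(left, right, on_labels=()):
    """Sketch of PromQL `left or on(<on_labels>) right` for instant vectors.

    All elements of `left` are kept; an element of `right` is added only if
    no element of `left` has the same values for `on_labels`. With on()
    (empty label list) every element shares the same empty key, so `right`
    is used only when `left` is empty.
    """
    def key(labels):
        # Missing labels count as empty strings, as in Prometheus.
        return tuple(labels.get(name, "") for name in on_labels)

    left_keys = {key(labels) for labels, _ in left}
    return left + [(labels, v) for labels, v in right if key(labels) not in left_keys]


fallback = [({}, 0.0)]  # vector(0)
print(promql_or_on([], fallback))               # left side empty: falls back to 0
print(promql_or_on([({}, 200.0)], fallback))    # left side present: kept as-is
```

One caveat: the fallback only covers an empty result. A sample whose value is NaN is still a sample, so `or` keeps it; for instance `histogram_quantile` returns NaN when the buckets contain no observations, such as when every rate over the window is zero.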

Another thing: he suggested setting the trigger threshold to 25, which in this case means a 25-millisecond wait.
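Here is a rough sketch of what a threshold of 25 does with this query, assuming the `AverageValue` metric type and the HPA's default 10% tolerance; `desired_replicas` is a hypothetical helper that follows the Kubernetes HPA rule for per-pod external metrics, not KEDA code. It also assumes the histogram is recorded in milliseconds: `histogram_quantile` returns values in the unit of the `le` bounds, so if the app recorded seconds instead, a 25 ms target would need a threshold of 0.025.

```python
import math


def desired_replicas(current_replicas, metric_value, target_average, tolerance=0.1):
    """Sketch of the HPA decision for an External metric with an AverageValue target.

    metric_value is what the PromQL query returns (here replicas * p95);
    target_average is the KEDA trigger threshold; tolerance mirrors the
    HPA's default 10% dead band. Min/max replica limits and scale-down
    stabilization windows are not modelled.
    """
    # How far the per-pod value (metric_value / current_replicas) is from target.
    usage_ratio = metric_value / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        return current_replicas  # within tolerance: leave the replica count alone
    return math.ceil(metric_value / target_average)


threshold = 25  # same unit as the histogram's `le` buckets (assumed ms here)
replicas = 4
for p95 in (10, 24, 26, 50):
    query_result = replicas * p95  # what the suggested query returns
    print(p95, desired_replicas(replicas, query_result, threshold))
```

With four replicas, a p95 of 50 asks for 8 replicas and a p95 of 10 scales down to 2, while 24 or 26 fall inside the 10% tolerance and leave the count at 4. In other words, the HPA tries to hold the p95 queue time close to the threshold.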

I'd appreciate it if anyone with more Prometheus experience could clarify these things. Thanks!
