Complete PromQL toolkit with generation and validation capabilities
94
94%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
A DevOps team supports a multi-tenant SaaS platform. Their Grafana dashboards are timing out because several expensive PromQL queries run on every panel refresh. The head of infrastructure has determined that three queries are the main culprits — they are all computed in multiple dashboard panels and run over large time ranges. The team wants to pre-compute these queries as Prometheus recording rules so they can be reused across dashboards without re-evaluating the full expression each time.
Additionally, a teammate proposed the following query for a "per-user request breakdown" panel:
sum by (user_id) (rate(http_requests_total[5m]))The senior engineer flagged this as potentially dangerous to the Prometheus instance but hasn't explained why. The team wants the recording rules written up, and also wants an explanation of the issue with the proposed query and what a safer alternative looks like.
The three queries to pre-compute:
sum by (job) (rate(http_requests_total[5m]))sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))Produce two files:
recording_rules.yaml — Prometheus recording rule YAML for the three queries above, using correct recording rule naming conventions.cardinality_analysis.md — A short Markdown document (1-2 paragraphs) explaining the problem with the sum by (user_id) query and providing a safer alternative query.