pantheon-ai/promql-generator

Comprehensive toolkit for generating best practice PromQL (Prometheus Query Language) queries following current standards and conventions. Use this skill when creating new PromQL queries, implementing monitoring and alerting rules, or building observability dashboards.

name: promql-generator
description: Generate PromQL queries for calculating error rates, aggregating metrics across labels, creating histogram percentiles, writing recording rules, and building SLO burn-rate alerts following Prometheus best practices. Use when creating new PromQL queries, implementing monitoring and alerting rules, building observability dashboards, working with Prometheus metrics (counters, gauges, histograms, summaries), or applying RED (Rate, Errors, Duration) and USE (Utilization, Saturation, Errors) monitoring patterns.

PromQL Query Generator

Name: pantheon-ai/promql-generator
Rating: 100 (1 review)
Author: pantheon-ai

Interactive Query Planning Workflow

CRITICAL: Always engage the user in collaborative planning before generating any query. Never skip the planning phase.

Workflow (7 stages)

1. Understand the goal: Ask what the user wants to monitor (request rate, error rate, latency, resource usage, availability, SLO tracking) and the use case (dashboard, alert, recording rule, ad-hoc).
2. Identify metrics: Confirm metric names, types (counter/gauge/histogram/summary), and relevant labels. Suggest common naming patterns if uncertain.
3. Determine parameters: Confirm time range, label filters, aggregation, and thresholds. If the user already specified values (e.g., "5-minute window", "> 5% error rate"), acknowledge them as pre-filled defaults and allow quick confirmation rather than re-asking.
4. Present the query plan: Before writing any code, present a plain-English plan (goal, query structure, expected output, example interpretation) and ask for confirmation via AskUserQuestion with options: "Yes, generate this query" / "Modify [aspect]" / "Show alternatives".
5. Generate the query: Once confirmed, read the relevant reference file(s) before writing code, cite the applicable pattern, and apply the best practices below.
6. Validate: Automatically invoke devops-skills:promql-validator. Display structured results (syntax, best practices, explanation). Fix any issues and re-validate until all checks pass.
7. Deliver: Provide the final query, plain-English explanation, usage instructions (dashboard / alert / recording rule), customization notes, and related query suggestions.

Ask vs. Infer: If the user's request already clearly specifies goal, use case, and context, acknowledge those details instead of re-asking. Only ask for missing or ambiguous information.
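
The Ask vs. Infer rule can be pictured as splitting the planning inputs into details to acknowledge and details to ask about. The following is a minimal sketch only; the `QueryRequest` class, its field names, and `split_known_and_missing` are hypothetical and are not part of this skill or of any Prometheus tooling.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class QueryRequest:
    """Planning inputs gathered from the user (hypothetical field names)."""
    goal: Optional[str] = None        # e.g. "error rate"
    use_case: Optional[str] = None    # e.g. "alert"
    metric: Optional[str] = None      # e.g. "http_requests_total"
    window: Optional[str] = None      # e.g. "5m"
    threshold: Optional[str] = None   # e.g. "> 0.05"


def split_known_and_missing(request: QueryRequest) -> tuple[dict[str, str], list[str]]:
    """Return (details to acknowledge, details that still need a question)."""
    known: dict[str, str] = {}
    missing: list[str] = []
    for spec in fields(request):
        value = getattr(request, spec.name)
        if value:
            known[spec.name] = value      # already specified: acknowledge it
        else:
            missing.append(spec.name)     # missing or empty: ask the user
    return known, missing


request = QueryRequest(goal="error rate", use_case="alert", window="5m", threshold="> 0.05")
known, missing = split_known_and_missing(request)
print("Acknowledge:", known)
print("Ask about:", missing)
```

In this example only the metric name is unspecified, so it is the only detail that would trigger a question.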

Best Practices for Query Generation

Always consult the relevant reference file before writing code.

Scenario	Reference File
Histogram queries	`references/metric_types.md` (Histogram section)
Error/latency patterns	`references/promql_patterns.md` (RED section)
Resource monitoring	`references/promql_patterns.md` (USE section)
Optimization / anti-patterns	`references/best_practices.md`
Specific functions	`references/promql_functions.md`

Key Rules

Always add label filters: they limit the number of series a query must load, which improves performance.
Match functions to metric types: rate()/increase() on counters; *_over_time() or direct use for gauges; histogram_quantile() for histograms; summaries already expose precomputed quantiles, which cannot be meaningfully aggregated across instances.
State by() or without() explicitly on aggregations so the output labels are predictable.
Prefer exact label matches over regex when the value is known.
Use recording rules for queries that are expensive or reused frequently (naming: level:metric:operations).
Format complex queries across multiple lines.
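
To make the "match functions to metric types" rule concrete, here is a small sketch that wraps a series selector in a function suited to its metric type. The helper names `wrap_selector` and `p95` are hypothetical, and `avg_over_time()` is just one of several `*_over_time()` choices for gauges.

```python
def wrap_selector(metric_type: str, selector: str, window: str = "5m") -> str:
    """Wrap a series selector in a PromQL function suited to its metric type."""
    if metric_type == "counter":
        # Counters only increase (or reset), so query their per-second rate.
        return f"rate({selector}[{window}])"
    if metric_type == "gauge":
        # Gauges can be read directly or smoothed with an *_over_time() function.
        return f"avg_over_time({selector}[{window}])"
    if metric_type == "histogram":
        # Classic histograms: rate the _bucket series and keep the le label.
        return f"sum by (le) (rate({selector}[{window}]))"
    raise ValueError(f"unsupported metric type: {metric_type}")


def p95(bucket_selector: str, window: str = "5m") -> str:
    """Build a P95 query over a classic histogram's _bucket series."""
    inner = wrap_selector("histogram", bucket_selector, window)
    return f"histogram_quantile(0.95, {inner})"


print(wrap_selector("counter", 'http_requests_total{job="api-server"}'))
print(p95('http_request_duration_seconds_bucket{job="api-server"}'))
```

Rejecting unknown metric types with an error, rather than guessing a default, mirrors the workflow's requirement to confirm the metric type with the user first.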

Core Patterns

# Request rate (counter)
sum(rate(http_requests_total{job="api-server"}[5m])) by (endpoint)

# Error rate ratio
sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total{job="api-server"}[5m]))

# P95 latency (classic histogram)
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket{job="api-server"}[5m]))
)

# P95 latency (native histogram; requires native histogram support to be enabled, available experimentally since Prometheus 2.40)
histogram_quantile(0.95,
  sum by (job) (rate(http_request_duration_seconds{job="api-server"}[5m]))
)

# Availability (percentage of targets up; sum() still yields 0 when every target is down)
(sum(up{job="api-server"}) / count(up{job="api-server"})) * 100

# Burn rate (99.9% SLO, 1h window)
(
  sum(rate(http_requests_total{job="api", status_code=~"5.."}[1h]))
  / sum(rate(http_requests_total{job="api"}[1h]))
) / 0.001

# Multi-window burn-rate alert (page: 2% budget in 1h, burn rate 14.4)
(
  sum(rate(http_requests_total{job="api", status_code=~"5.."}[1h]))
  / sum(rate(http_requests_total{job="api"}[1h]))
) > 14.4 * 0.001
and
(
  sum(rate(http_requests_total{job="api", status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total{job="api"}[5m]))
) > 14.4 * 0.001
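
The 14.4 factor in the alert above follows from simple arithmetic: burn rate = (fraction of budget consumed) × (SLO period) / (alert window). The sketch below reproduces it under the assumption of a 30-day SLO period, which the patterns above do not state explicitly; the function names are illustrative.

```python
SLO_PERIOD_HOURS = 30 * 24  # assumption: 30-day SLO period


def burn_rate(budget_fraction: float, window_hours: float,
              slo_period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Burn rate that consumes `budget_fraction` of the error budget in `window_hours`."""
    return budget_fraction * slo_period_hours / window_hours


def error_ratio_threshold(rate: float, slo: float) -> float:
    """Error ratio at which the given burn rate is reached for an SLO such as 0.999."""
    return rate * (1.0 - slo)


page_rate = burn_rate(budget_fraction=0.02, window_hours=1)
print(f"burn rate: {page_rate:.1f}")
print(f"error-ratio threshold: {error_ratio_threshold(page_rate, slo=0.999):.4f}")
```

Spending 2% of a 30-day budget in 1 hour gives 0.02 × 720 / 1 = 14.4, and for a 99.9% SLO the matching error-ratio threshold is 14.4 × 0.001 = 0.0144, which is the right-hand side of both comparisons in the alert expression.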

For complete SLO patterns, native histogram functions (histogram_count, histogram_sum, histogram_fraction), subqueries, offset and @ modifiers, vector matching, and Kubernetes patterns, see the files in assets/.
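
For the classic-histogram P95 pattern above, it helps to know that the quantile is estimated by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that idea, not Prometheus's actual implementation; the function name and the sample bucket values are made up for illustration.

```python
import math


def classic_histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from (le upper bound, cumulative count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0] if index > 0 else math.nan
            if count == prev_count:
                return bound  # empty bucket, avoid dividing by zero
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Illustrative per-bucket rates: le=0.1, 0.5, 1.0, +Inf
sample = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
print(classic_histogram_quantile(0.95, sample))
```

Because the estimate can never be more precise than the bucket boundaries, bucket layout around the latency target matters more than the choice of quantile.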

Validation Checklist

After generating, invoke devops-skills:promql-validator and display results in this format:

## PromQL Validation Results

### Syntax Check
- Status: ✅ VALID / ⚠️ WARNING / ❌ ERROR
- Issues: [list any syntax errors]

### Best Practices Check
- Status: ✅ OPTIMIZED / ⚠️ CAN BE IMPROVED / ❌ HAS ISSUES
- Issues: [list problems found]
- Suggestions: [list optimizations]

### Query Explanation
- What it measures: [plain English]
- Output labels: [label list or "None (scalar)"]
- Expected result structure: [instant vector / scalar / etc.]

Fix all issues and re-validate until clean.

Alerting and Recording Rule Snippets

# Alerting rule with a for clause (fires only after the condition has held for 10m)
- alert: HighErrorRate
  expr: |
    (
      sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
      / sum(rate(http_requests_total{job="api-server"}[5m]))
    ) > 0.05
  for: 10m

# Recording rule (naming: level:metric:operations)
- record: job:http_requests:rate5m
  expr: sum by (job) (rate(http_requests_total[5m]))
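
The level:metric:operations convention is easy to check mechanically. Below is a minimal sketch of such a check; the regular expression and the `parse_rule_name` helper are illustrative assumptions and are not part of Prometheus or of the validator skill.

```python
import re
from typing import Optional

# Three colon-separated parts, each a plain identifier (illustrative pattern).
_PART = r"[a-zA-Z_][a-zA-Z0-9_]*"
RULE_NAME = re.compile(rf"^(?P<level>{_PART}):(?P<metric>{_PART}):(?P<operations>{_PART})$")


def parse_rule_name(name: str) -> Optional[dict[str, str]]:
    """Split a recording rule name into level, metric, and operations, or return None."""
    match = RULE_NAME.match(name)
    return match.groupdict() if match else None


print(parse_rule_name("job:http_requests:rate5m"))
print(parse_rule_name("http_requests_total"))
```

In the recording rule above, `job` is the aggregation level, `http_requests` is the metric name with the `_total` suffix dropped, and `rate5m` records the operation applied.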

Documentation Lookup

context7 MCP (preferred): resolve prometheus, then fetch docs with the relevant topic.
Fallback WebSearch: "Prometheus PromQL [function/operator] documentation [version] examples"

Error Handling Quick Reference

Symptom	Likely Cause	Fix
Empty results	Wrong label filters or metric not scraped	Check `up{job="..."}`, verify label values
Too many series	High cardinality	Add label filters, aggregate, use recording rules
Wrong values	Wrong function for metric type	`rate()` on counters; direct or `*_over_time()` on gauges
Slow queries	Large range vectors or missing filters	Narrow time range, add filters, use recording rules
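
The "Wrong values" row is easiest to understand with counter resets. The sketch below is a simplified model of how a counter's increase can be computed across a restart; it omits the extrapolation to window boundaries that Prometheus's own rate() and increase() perform, and the sample values are invented.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase of a counter, treating any drop in value as a reset to zero."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current < previous:
            # Counter restarted: everything counted since the reset is new.
            increase += current
        else:
            increase += current - previous
    return increase


def naive_delta(samples: list[float]) -> float:
    """Last value minus first value, which is only meaningful for gauges."""
    return samples[-1] - samples[0]


# A counter scraped five times, with a process restart before the fourth scrape.
samples = [100.0, 130.0, 160.0, 10.0, 40.0]
print("reset-aware increase:", counter_increase(samples))
print("naive delta:", naive_delta(samples))
```

The reset-aware result is 100 while the naive difference is -60, which is why gauge-style arithmetic on a counter produces misleading values.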

Resources

references/

promql_functions.md — All PromQL functions with examples. Read for specific function questions.
promql_patterns.md — RED/USE method patterns, alerting, recording rules. Read for standard monitoring patterns.
best_practices.md — Anti-patterns, performance, cardinality. Read when optimizing.
metric_types.md — Counter/Gauge/Histogram/Summary guide. Read to confirm function choice.

assets/

common_queries.promql — Reusable request rate, error rate, latency, availability queries.
red_method.promql — Complete RED method implementation.
use_method.promql — Complete USE method implementation.
slo_patterns.promql — SLO, error budget, burn rate, multi-window alerting.
alerting_rules.yaml — Example alerting rules with thresholds.
recording_rules.yaml — Example recording rules with naming conventions.
kubernetes_patterns.promql — kube-state-metrics, cAdvisor, vector matching.

Always read the relevant reference/asset file and cite the applicable pattern before generating a query.

Workspace: pantheon-ai
Visibility: Public
Created: about 2 months ago
Last updated: about 2 months ago
Publish Source: CLI