Comprehensive toolkit for generating best-practice PromQL (Prometheus Query Language) queries that follow current standards and conventions. Use this skill when creating new PromQL queries, implementing monitoring and alerting rules, or building observability dashboards.
CRITICAL: Always engage the user in collaborative planning before generating any query. Never skip the planning phase.
Validate with devops-skills:promql-validator. Display structured results (syntax, best practices, explanation). Fix any issues and re-validate until all checks pass.
Ask vs. Infer: If the user's request already clearly specifies goal, use case, and context, acknowledge those details instead of re-asking. Only ask for missing or ambiguous information.
Always consult the relevant reference file before writing code.
| Scenario | Reference File |
|---|---|
| Histogram queries | references/metric_types.md (Histogram section) |
| Error/latency patterns | references/promql_patterns.md (RED section) |
| Resource monitoring | references/promql_patterns.md (USE section) |
| Optimization / anti-patterns | references/best_practices.md |
| Specific functions | references/promql_functions.md |
- Use rate()/increase() on counters; *_over_time() or direct use for gauges; histogram_quantile() for histograms.
- Use by()/without() on all aggregations.
- Follow the recording rule naming convention (level:metric:operations).
# Request rate (counter)
sum(rate(http_requests_total{job="api-server"}[5m])) by (endpoint)
# Error rate ratio
sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total{job="api-server"}[5m]))
# P95 latency (classic histogram)
histogram_quantile(0.95,
sum by (le) (rate(http_request_duration_seconds_bucket{job="api-server"}[5m]))
)
# P95 latency (native histogram; experimental since Prometheus 2.40, may require --enable-feature=native-histograms)
histogram_quantile(0.95,
sum by (job) (rate(http_request_duration_seconds[5m]))
)
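# Availability (percentage of targets up; sum() still returns 0 when every target is down,
# whereas count(up == 1) would return no data)
sum(up{job="api-server"}) / count(up{job="api-server"}) * 100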
# Burn rate (99.9% SLO, 1h window)
(
sum(rate(http_requests_total{job="api", status_code=~"5.."}[1h]))
/ sum(rate(http_requests_total{job="api"}[1h]))
) / 0.001
# Multi-window burn-rate alert (page: 2% budget in 1h, burn rate 14.4)
(
sum(rate(http_requests_total{job="api", status_code=~"5.."}[1h]))
/ sum(rate(http_requests_total{job="api"}[1h]))
) > 14.4 * 0.001
and
(
sum(rate(http_requests_total{job="api", status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total{job="api"}[5m]))
) > 14.4 * 0.001
For complete SLO patterns, native histogram functions (histogram_count, histogram_sum, histogram_fraction), subqueries, offset/@ modifiers, vector matching, and Kubernetes patterns, see the assets/ files.
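The 14.4 factor in the multi-window alert can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a 30-day SLO period (not stated above, but consistent with the 14.4 figure), and the helper names `burn_rate` and `error_ratio_threshold` are hypothetical, not part of any Prometheus tooling.

```python
from datetime import timedelta


def burn_rate(budget_fraction: float, alert_window: timedelta, slo_period: timedelta) -> float:
    """Burn rate at which `budget_fraction` of the error budget is spent within `alert_window`."""
    # timedelta / timedelta yields a plain float ratio (720.0 for 30 days / 1 hour).
    return budget_fraction * (slo_period / alert_window)


def error_ratio_threshold(rate: float, slo_target: float) -> float:
    """Error ratio that corresponds to a given burn rate for an SLO target."""
    return rate * (1.0 - slo_target)


# Assumption: 30-day SLO period; 2% of the budget consumed in 1 hour triggers a page.
slo_period = timedelta(days=30)
page_rate = burn_rate(0.02, timedelta(hours=1), slo_period)
threshold = error_ratio_threshold(page_rate, 0.999)

print(round(page_rate, 1))  # 14.4
print(round(threshold, 4))  # 0.0144
```

The second value is the right-hand side of the alert expression (14.4 * 0.001), so a different SLO target or budget fraction only changes the inputs here, not the shape of the query.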
After generating, invoke devops-skills:promql-validator and display results in this format:
## PromQL Validation Results
### Syntax Check
- Status: ✅ VALID / ⚠️ WARNING / ❌ ERROR
- Issues: [list any syntax errors]
### Best Practices Check
- Status: ✅ OPTIMIZED / ⚠️ CAN BE IMPROVED / ❌ HAS ISSUES
- Issues: [list problems found]
- Suggestions: [list optimizations]
### Query Explanation
- What it measures: [plain English]
- Output labels: [label list or "None (scalar)"]
- Expected result structure: [instant vector / scalar / etc.]
Fix all issues and re-validate until clean.
# Alerting rule with for clause
- alert: HighErrorRate
  expr: |
    (
      sum(rate(http_requests_total{status_code=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m]))
    ) > 0.05
  for: 10m
# Recording rule (naming: level:metric:operations)
- record: job:http_requests:rate5m
  expr: sum by (job) (rate(http_requests_total[5m]))
To look up documentation, first find prometheus, then fetch docs with the relevant topic.
For a web search, use a query such as: "Prometheus PromQL [function/operator] documentation [version] examples"

| Symptom | Likely Cause | Fix |
|---|---|---|
| Empty results | Wrong label filters or metric not scraped | Check up{job="..."}, verify label values |
| Too many series | High cardinality | Add label filters, aggregate, use recording rules |
| Wrong values | Wrong function for metric type | rate() on counters; direct or *_over_time() on gauges |
| Slow queries | Large range vectors or missing filters | Narrow time range, add filters, use recording rules |
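The "Wrong values" row is easiest to see with numbers. The following is a simplified sketch, not Prometheus's actual implementation: it shows why counters need reset-aware handling of the kind rate() and increase() provide, while a plain last-minus-first difference is only meaningful for gauges. The function names and sample values are made up for illustration, and the window-boundary extrapolation that Prometheus performs is deliberately omitted.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # After a reset the counter restarts from zero, so the whole
            # current value counts as new increase.
            total += curr
    return total


def naive_delta(samples: list[float]) -> float:
    """Last value minus first value; fine for gauges, misleading for counters."""
    return samples[-1] - samples[0]


# Hypothetical counter samples with a process restart between 160 and 20.
samples = [100.0, 130.0, 160.0, 20.0, 50.0]

print(counter_increase(samples))  # 110.0
print(naive_delta(samples))       # -50.0
```

The naive difference reports a negative change for a counter that only ever went up, which is the kind of wrong value the table row refers to.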
Reference files (references/):
- promql_functions.md: All PromQL functions with examples. Read for specific function questions.
- promql_patterns.md: RED/USE method patterns, alerting, recording rules. Read for standard monitoring patterns.
- best_practices.md: Anti-patterns, performance, cardinality. Read when optimizing.
- metric_types.md: Counter/Gauge/Histogram/Summary guide. Read to confirm function choice.
Asset files (assets/):
- common_queries.promql: Reusable request rate, error rate, latency, availability queries.
- red_method.promql: Complete RED method implementation.
- use_method.promql: Complete USE method implementation.
- slo_patterns.promql: SLO, error budget, burn rate, multi-window alerting.
- alerting_rules.yaml: Example alerting rules with thresholds.
- recording_rules.yaml: Example recording rules with naming conventions.
- kubernetes_patterns.promql: kube-state-metrics, cAdvisor, vector matching.
Always read the relevant reference/asset file and cite the applicable pattern before generating a query.
Install with Tessl CLI
npx tessl i pantheon-ai/promql-generator