Complete PromQL toolkit with generation and validation capabilities
This skill uses a two-phase interactive workflow: Phase 1 (Steps 1-4) presents validation results and asks clarifying questions, then stops and waits for user response before proceeding to Phase 2 (Steps 5-7) for tailored recommendations.
When a user provides a PromQL query, follow this workflow:
Run the syntax validation script to check for basic correctness:
```
python3 .claude/skills/promql-validator/scripts/validate_syntax.py "<query>"
```

Run the best practices checker to detect anti-patterns and optimization opportunities:

```
python3 .claude/skills/promql-validator/scripts/check_best_practices.py "<query>"
```

Parse and explain what the query does in plain English:
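As a sketch, the two invocations above could be wrapped in a small driver that collects each script's JSON output. The `ok`/`warnings` schema below is an assumed shape chosen for illustration; the scripts' actual JSON fields are not specified here.

```python
import json
import subprocess

# Paths from the workflow above; the JSON schema parsed below is assumed.
VALIDATORS = [
    ".claude/skills/promql-validator/scripts/validate_syntax.py",
    ".claude/skills/promql-validator/scripts/check_best_practices.py",
]

def parse_report(stdout: str) -> dict:
    """Normalize one script's JSON output into {'ok': bool, 'warnings': list}.

    The 'ok'/'warnings' keys are a hypothetical shape for illustration.
    """
    report = json.loads(stdout)
    return {"ok": bool(report.get("ok")), "warnings": report.get("warnings", [])}

def run_validators(query: str) -> list[dict]:
    """Invoke both scripts on the query and parse their stdout."""
    reports = []
    for script in VALIDATORS:
        proc = subprocess.run(
            ["python3", script, query],
            capture_output=True, text=True,
        )
        reports.append(parse_report(proc.stdout))
    return reports
```

A driver like this makes it easy to present both reports together before asking the clarifying questions in Step 4.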
Ask the user clarifying questions to verify the query matches their intent:
⏸️ STOP HERE AND WAIT FOR USER RESPONSE before proceeding to Steps 5-7.
After understanding the user's intent:
When relevant, mention known limitations:
_bytes suffix. Please confirm if this is correct.")

Based on validation results:
Reference Examples: When suggesting corrections, cite relevant examples using this format:
As shown in `examples/bad_queries.promql` (lines 91-97):
❌ BAD: `avg(http_request_duration_seconds{quantile="0.95"})`
✅ GOOD: Use histogram_quantile() with histogram buckets

Citation sources:

- assets/good_queries.promql - for well-formed patterns
- assets/optimization_examples.promql - for before/after comparisons
- assets/bad_queries.promql - for showing what to avoid
- references/best_practices.md - for detailed explanations
- references/anti_patterns.md - for anti-pattern deep dives

Citation Format: file_path (lines X-Y) with the relevant code snippet quoted
Give the user control:
Claude: "I've validated your query. It's syntactically correct, but I notice it queries http_requests_total without any label filters. This could match thousands of time series. What specific service or endpoint are you trying to monitor?"
User: [provides intent]
Claude: "Great! Based on that, here's an optimized version: rate(http_requests_total{job="api-service", path="/users"}[5m]). This calculates the per-second rate of requests to the /users endpoint over the last 5 minutes. Does this match what you need?"
User: [confirms or asks for changes]
Claude: [provides refined query or alternatives]
- `validate_syntax.py` returns success.
- `job=`, `namespace=`, or `service=` label filters are present before merging.
- `rate()` on a gauge or `avg()` on a histogram quantile produces results that are syntactically valid but statistically wrong. Metric type must be confirmed before the query pattern is evaluated.
- `for` clause check on alert expressions: an alert expression without a `for` clause fires on a single evaluation, causing alert storms from transient spikes. The validator must flag missing `for` on all alerting rules.
- Do not accept `expr: error_rate > 0.05` as complete for an alerting rule; flag the missing `for` and suggest a minimum `for: 2m` to reduce false positives.
- Metric types are inferred from naming conventions (e.g., `_total`, `_bytes`). Non-standard names may be misclassified; ask the user to confirm when uncertain.
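The missing-`for` check could be sketched as follows. The rule dict mirrors one entry of a Prometheus alerting-rule file; the function name and message wording are hypothetical, not the scripts' actual implementation.

```python
def check_alert_rule(rule: dict) -> list[str]:
    """Flag alerting rules that would fire on a single evaluation.

    `rule` mirrors one entry of a Prometheus alerting-rule file, e.g.
    {"alert": "HighErrorRate", "expr": "error_rate > 0.05", "for": "2m"}.
    """
    findings = []
    # A rule with an `alert` key but no `for` duration fires immediately
    # on any transient spike, which is the alert-storm case flagged above.
    if "alert" in rule and not rule.get("for"):
        findings.append(
            f"{rule['alert']}: missing 'for' clause; a transient spike "
            "will fire the alert immediately. Suggest at least 'for: 2m'."
        )
    return findings
```

Usage: `check_alert_rule({"alert": "HighErrorRate", "expr": "error_rate > 0.05"})` returns one finding, while the same rule with `"for": "2m"` passes clean.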
The scripts flag metrics without label selectors, but recording rule metrics (e.g., job:http_requests:rate5m) and low-cardinality cases are legitimate without filters. Users can safely ignore the warning when they know their cardinality is manageable.
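Recording-rule metrics can be recognized by the colon-delimited `level:metric:operations` naming convention (colons are only legal in recording-rule names, not in scraped metric names). A heuristic for suppressing the warning might look like the following sketch; the regex and function name are illustrative, not the scripts' actual code.

```python
import re

# Recording-rule names follow the level:metric:operations convention,
# e.g. job:http_requests:rate5m. Scraped metric names contain no colons.
RECORDING_RULE_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*(:[a-zA-Z0-9_]+)+$")

def is_recording_rule_metric(name: str) -> bool:
    """Heuristic for suppressing the missing-label-selector warning."""
    return bool(RECORDING_RULE_RE.match(name))
```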
Scripts cannot verify whether metrics exist or whether label values are valid in any specific Prometheus deployment. For production use, test queries against an actual Prometheus instance.
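Testing against a live instance can go through the standard Prometheus HTTP API endpoint `/api/v1/query`. A minimal sketch, assuming a reachable server (the localhost base URL is a placeholder):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def instant_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urlencode({"query": query})

def run_query(base_url: str, query: str) -> dict:
    """Execute the query; requires a reachable Prometheus server."""
    with urlopen(instant_query_url(base_url, query)) as resp:
        return json.loads(resp.read())
```

An empty `data.result` in the response is a quick signal that the metric name or label values do not exist in that deployment.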
The scripts detect common anti-patterns but cannot catch business logic errors, context-specific optimizations (e.g., based on scrape interval or retention), or custom function behavior from extensions.
The skill uses two main Python scripts:

- scripts/validate_syntax.py - syntax validation
- scripts/check_best_practices.py - best-practice and anti-pattern checks
Both scripts output JSON for programmatic parsing and human-readable messages for display.