Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
84
77%
Does it follow best practices?
Impact
92%
1.55x Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/observability-monitoring/skills/slo-implementation/SKILL.md
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly identifies its niche in SRE reliability engineering with good trigger terms and an explicit 'Use when' clause. Its main weakness is that the capability actions are somewhat high-level—'define and implement' could be more specific about concrete deliverables like configuring burn-rate alerts, creating SLO dashboards, or writing monitoring-as-code definitions.
Suggestions
Expand the concrete actions beyond 'define and implement' to list specific deliverables, e.g., 'configure burn-rate alerts, create SLO dashboards, calculate error budgets, write monitoring-as-code definitions'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (SLIs, SLOs, error budgets, alerting) and some actions ('define and implement'), but doesn't list multiple concrete actions in detail; e.g., it doesn't specify what 'implement' entails (dashboards, monitoring configs, burn-rate alerts, etc.). | 2 / 3 |
| Completeness | Clearly answers both 'what' (define and implement SLIs/SLOs with error budgets and alerting) and 'when' (explicit 'Use when' clause covering reliability targets, SRE practices, and measuring service performance). | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'SLIs', 'SLOs', 'error budgets', 'alerting', 'reliability targets', 'SRE practices', 'service performance'. These cover the main terms a user would naturally use when requesting this kind of work. | 3 / 3 |
| Distinctiveness Conflict Risk | The SLI/SLO/error budget domain is a clear niche within SRE. The specific terminology (SLIs, SLOs, error budgets) makes it highly unlikely to conflict with general monitoring, alerting, or performance skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides strong, actionable Prometheus/Grafana implementation examples for SLO monitoring, with executable PromQL and YAML configurations that are genuinely useful. However, it suffers from verbosity through duplicated alert rules, generic best practices Claude already knows, and process sections (weekly/monthly/quarterly reviews) that add little value. The workflow could be improved with explicit implementation steps and validation checkpoints.
Suggestions
Remove the generic 'Best Practices' list and 'SLO Review Process' section — these are standard SRE knowledge Claude already has, and they consume significant tokens without adding actionable value.
Add an explicit implementation workflow with validation steps, e.g., '1. Define SLIs → 2. Create recording rules → 3. Verify metrics are populating → 4. Add alerting rules → 5. Validate alerts fire correctly in test'.
Consolidate the duplicated multi-window burn rate alerts section with the SLO Alerting Rules section to reduce redundancy.
Move the full Prometheus recording rules and alerting rules YAML into a referenced file (e.g., `references/prometheus-slo-rules.yml`) and keep only a concise example in the main skill.
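The consolidation suggested above can be pictured as one parameterized check rather than two parallel rule sets. The following is a minimal Python sketch, not the skill's actual rules: the class and function names are hypothetical, and the window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m) are the commonly cited values for a 30-day SLO, assumed here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateWindow:
    """One long/short window pair for a multi-window burn-rate alert."""
    long_window: str
    short_window: str
    threshold: float  # burn-rate multiple that triggers the alert
    severity: str


# Assumed values (commonly cited for a 30-day SLO); not taken from the skill.
WINDOWS = [
    BurnRateWindow("1h", "5m", 14.4, "page"),
    BurnRateWindow("6h", "30m", 6.0, "page"),
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than sustainable the budget is burning."""
    budget = 1.0 - slo_target  # allowed error ratio, e.g. 0.001 for 99.9%
    return error_ratio / budget


def firing_alerts(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return labels of window pairs where BOTH windows exceed the threshold."""
    fired = []
    for w in WINDOWS:
        long_rate = burn_rate(error_ratios[w.long_window], slo_target)
        short_rate = burn_rate(error_ratios[w.short_window], slo_target)
        if long_rate > w.threshold and short_rate > w.threshold:
            fired.append(f"{w.severity}:{w.long_window}/{w.short_window}")
    return fired
```

Requiring both the long and the short window to exceed the threshold is what lets a single table of window pairs stand in for separate "alerting rules" and "multi-window burn rate" sections.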
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill contains some unnecessary explanatory content (e.g., the SLI/SLO/SLA hierarchy diagram, the 'When to Use' list, the 'Consider' list for choosing SLOs, and the generic best practices list). The availability table is useful but the review process section and best practices are largely things Claude already knows. However, the core Prometheus/YAML examples are dense and useful. | 2 / 3 |
| Actionability | The skill provides fully executable PromQL queries, complete Prometheus recording rules, alerting rules in proper YAML format, and concrete error budget calculations. The code examples are copy-paste ready and cover the full implementation chain from SLI definition through alerting. | 3 / 3 |
| Workflow Clarity | While the content covers the logical progression from SLI definition → SLO targets → error budgets → recording rules → alerting, it reads more like a reference document than a clear workflow. There are no explicit validation checkpoints (e.g., 'verify your recording rules are producing data before creating alerts') or feedback loops for error recovery during implementation. | 2 / 3 |
| Progressive Disclosure | There are references to `references/slo-definitions.md` and `references/error-budget.md`, and related skills are linked at the bottom. However, the main document is quite long (~200+ lines) with substantial inline content that could be split out (e.g., the full alerting rules, dashboard queries). The multi-window burn rate section largely duplicates the alerting rules section. | 2 / 3 |
| Total | | 9 / 12 Passed |
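A validation checkpoint of the kind the Workflow Clarity row asks for could be as small as confirming that a recording rule returns samples before any alert is built on it. The following is a minimal Python sketch, assuming the JSON response shape of the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The function name and the series name `slo:availability:ratio_5m` are hypothetical, and fetching the response over HTTP is deliberately left out.

```python
def recording_rule_has_data(response: dict) -> bool:
    """Return True if an instant-query response contains at least one sample.

    `response` is the decoded JSON body of a Prometheus /api/v1/query call.
    """
    if response.get("status") != "success":
        return False
    data = response.get("data", {})
    # An instant query on a recording rule yields a vector of series.
    if data.get("resultType") != "vector":
        return False
    return len(data.get("result", [])) > 0


# Hypothetical decoded response for a recording rule that is populating.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "slo:availability:ratio_5m"},
                "value": [1700000000, "0.9995"],
            },
        ],
    },
}
```

A check like this would sit between "create recording rules" and "add alerting rules", so an empty result stops the workflow before alerts are defined on a series that does not exist yet.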
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
27a7ed9