Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows. Use PROACTIVELY for monitoring infrastructure, performance optimization, or production reliability.
Overall — 46%
Evals: Pending — no eval scenarios have been run
Quality: Passed — no known issues
Discovery — 67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description covers the core domain well and includes an explicit 'Use when' clause, which is a strength. However, the capabilities listed are somewhat high-level and could benefit from more concrete actions, and the trigger terms miss common user vocabulary like 'alerts', 'dashboards', 'metrics', or specific tool names. The 'performance optimization' trigger is broad enough to risk overlap with other skills.
Suggestions
Add more concrete actions such as 'configure alert rules, build dashboards, define SLO targets, create incident runbooks' to improve specificity.
Include additional natural trigger terms users would say, such as 'alerts', 'dashboards', 'metrics', 'Prometheus', 'Grafana', 'uptime', 'on-call', or 'APM'.
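To make the two suggestions above concrete, a hypothetical rewrite of the description frontmatter might look like the sketch below. The wording and the skill name are illustrative assumptions, not the skill's actual metadata:

```yaml
---
name: observability-engineer  # hypothetical name, for illustration only
description: >
  Build production-ready monitoring, logging, and tracing systems.
  Configure Prometheus and Grafana alert rules and dashboards, define
  SLO targets and error budgets, instrument services with OpenTelemetry,
  and create incident runbooks. Use when the user mentions alerts,
  dashboards, metrics, uptime, on-call, APM, SLIs/SLOs, incident
  response, or production reliability.
---
```

A description in this shape names concrete actions (configure, define, instrument, create) and folds the missing trigger vocabulary into the 'Use when' clause.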
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (monitoring, logging, tracing) and some actions (SLI/SLO management, incident response workflows), but the actions are described at a high level rather than listing multiple concrete operations like 'configure alerting rules, set up dashboards, define SLOs, create runbooks'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (build monitoring/logging/tracing systems, SLI/SLO management, incident response) and 'when' with an explicit trigger clause ('Use PROACTIVELY for monitoring infrastructure, performance optimization, or production reliability'). | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'monitoring', 'logging', 'tracing', 'observability', 'SLI/SLO', 'incident response', and 'production reliability', but misses common user variations like 'alerts', 'dashboards', 'metrics', 'Prometheus', 'Grafana', 'on-call', or 'uptime'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The observability/monitoring niche is fairly distinct, but terms like 'performance optimization' and 'production reliability' are broad enough to potentially overlap with general DevOps, infrastructure, or performance tuning skills. | 2 / 3 |
| Total | | 9 / 12 — Passed |
Implementation — 0%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads as a persona description or job posting rather than actionable instructions for performing observability tasks. It consists almost entirely of bullet-point lists of tools, concepts, and capabilities that Claude already knows, with no executable code, concrete configurations, specific commands, or practical workflows. The four-line 'Instructions' section is far too vague to guide any real implementation work.
Suggestions
Replace the massive capability lists with concrete, executable examples: a sample Prometheus config, a PromQL query for SLO burn rate, an OpenTelemetry instrumentation snippet, or a Grafana dashboard JSON template.
Expand the 'Instructions' section into a detailed workflow with explicit validation checkpoints, e.g., 'After defining SLIs, validate with: check that each SLI has a measurable metric, a collection method, and a defined good/bad threshold.'
Move detailed tool-specific guidance into separate referenced files (e.g., PROMETHEUS.md, OTEL.md, ALERTING.md) and keep SKILL.md as a concise overview with clear navigation links.
Remove the 'Capabilities', 'Behavioral Traits', 'Knowledge Base', and 'Example Interactions' sections entirely—these describe what Claude already knows and waste context window tokens without adding actionable value.
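As a concrete instance of the first suggestion, a multiwindow burn-rate alert for a 99.9% availability SLO could be sketched as Prometheus rules like the following. The metric name `http_requests_total` and the `slo:` recording-rule names are assumptions for illustration, not taken from the skill:

```yaml
groups:
  - name: slo-burn-rate  # hypothetical rule group
    rules:
      - record: slo:request_error_ratio:rate5m
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m]))
      - record: slo:request_error_ratio:rate1h
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[1h]))
            / sum(rate(http_requests_total[1h]))
      - alert: HighErrorBudgetBurn
        # 14.4x burn exhausts a 30-day error budget in about 2 days;
        # requiring both windows suppresses short-lived spikes.
        expr: |
          slo:request_error_ratio:rate5m > (14.4 * 0.001)
            and
          slo:request_error_ratio:rate1h > (14.4 * 0.001)
        for: 2m
        labels:
          severity: page
```

Even one example of this shape gives an agent something executable to adapt, which the capability lists do not.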
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with massive capability lists, behavioral traits, knowledge base sections, and example interactions that Claude already knows. The bulk of the content is a resume-style enumeration of tools and concepts rather than actionable instructions. Most of this could be eliminated without losing any practical value. | 1 / 3 |
| Actionability | No concrete code, commands, configuration examples, or executable guidance anywhere. The entire skill is abstract descriptions and bullet-point lists of capabilities. The 'Instructions' section is four vague steps with no specifics on how to accomplish any of them. | 1 / 3 |
| Workflow Clarity | The 'Instructions' section has 4 high-level steps with no validation checkpoints, no error recovery, and no concrete sequencing. The 'Response Approach' section is similarly vague. For a skill involving production monitoring and potentially destructive operations, the complete absence of validation steps is a critical gap. | 1 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files. Hundreds of lines of capability lists are inlined that could be split into focused reference documents. No navigation structure or links to deeper resources for specific topics like SLO management or tracing setup. | 1 / 3 |
| Total | | 4 / 12 — Passed |
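The validation checkpoint suggested earlier ('each SLI has a measurable metric, a collection method, and a defined good/bad threshold') can be made executable with a small helper. This is a sketch under the assumption that SLIs are declared as plain dictionaries; the field names are hypothetical, not a real API:

```python
REQUIRED_FIELDS = ("metric", "collection_method", "good_threshold")

def validate_slis(slis):
    """Return a list of problems; an empty list means every SLI names
    a measurable metric, a collection method, and a good/bad threshold."""
    problems = []
    for sli in slis:
        name = sli.get("name", "<unnamed>")
        for field in REQUIRED_FIELDS:
            if not sli.get(field):
                problems.append(f"{name}: missing {field}")
    return problems

# Example: one complete SLI and one incomplete SLI.
slis = [
    {"name": "availability",
     "metric": "http_requests_total",
     "collection_method": "prometheus scrape",
     "good_threshold": "status < 500"},
    {"name": "latency",
     "metric": "http_request_duration_seconds"},
]

print(validate_slis(slis))
# → ['latency: missing collection_method', 'latency: missing good_threshold']
```

A checkpoint like this turns the vague 'define SLIs' step into something an agent can verify before moving on to alerting.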
Validation — 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | | 10 / 11 — Passed |