Use this skill when performing root cause analysis on incidents detected by Elastic Observability. Activate when the user reports a production issue, outage, degraded performance, or asks to investigate alerts.
54
59%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./packages/opencode/src/elastic/skills/observability-rca/SKILL.md
Quality
Discovery
54%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at defining when to use the skill with rich, natural trigger terms, but it is notably weak on specifying what concrete actions the skill performs. It reads more like a trigger clause without a capability summary. Adding specific actions (e.g., querying Elasticsearch logs, analyzing APM traces, correlating metrics) would significantly improve it.
Suggestions
Add specific concrete actions the skill performs, e.g., 'Queries Elasticsearch logs, analyzes APM traces, correlates metrics across services, and inspects anomaly detection results to identify root causes.'
Mention Elastic-specific artifacts and tools (e.g., Kibana dashboards, APM traces, log indices, alerting rules) to improve both specificity and distinctiveness from generic incident analysis skills.
Use third-person voice ('Performs root cause analysis...') instead of the imperative 'Use this skill when...' framing to lead with capabilities before trigger conditions.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description mentions 'root cause analysis on incidents' but does not list any concrete actions (e.g., query logs, analyze traces, correlate metrics, inspect APM data). It stays at an abstract level without specifying what the skill actually does. | 1 / 3 |
| Completeness | The 'when' is explicitly and thoroughly covered with clear trigger scenarios. However, the 'what' is weak: it only says 'performing root cause analysis' without detailing the specific actions or capabilities the skill provides. | 2 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms: 'production issue', 'outage', 'degraded performance', 'investigate alerts', 'root cause analysis', 'incidents', and 'Elastic Observability'. These are terms users would naturally use when seeking this kind of help. | 3 / 3 |
| Distinctiveness Conflict Risk | The mention of 'Elastic Observability' provides some distinctiveness, but 'root cause analysis' and 'production issue' are broad enough to overlap with other incident management or monitoring skills. Without specific actions tied to Elastic's tooling, conflict risk remains moderate. | 2 / 3 |
| Total | | 8 / 12 Passed |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides strong, actionable ES|QL queries that form a solid investigation toolkit for Elastic Observability incidents. However, it includes generic knowledge Claude already possesses (common root causes, how to write postmortems) and lacks validation checkpoints critical for incident investigation workflows—such as verifying hypotheses before declaring root cause or decision trees for when initial queries don't yield results.
Suggestions
Remove or significantly trim the 'Common Root Causes' table and 'Resolution Documentation' section—Claude already knows these patterns and how to write incident reports.
Add validation checkpoints between investigation steps, e.g., 'If error rate query returns no results, broaden the time window or check index patterns' and 'Verify root cause hypothesis by correlating at least two independent signals before concluding.'
Add decision branching: what to do when the initial scope assessment shows no errors, or when traces are unavailable for a service.
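The validation checkpoints and branching suggested above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not part of the reviewed skill: `run_query`, the default window list, and the signal names are all hypothetical stand-ins for whatever query mechanism and evidence the skill actually uses.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Sequence


def first_window_with_results(
    run_query: Callable[[str], Sequence[dict]],
    windows: Iterable[str] = ("15m", "1h", "6h", "24h"),
) -> Optional[tuple[str, Sequence[dict]]]:
    """Try progressively broader time windows until a query returns rows.

    `run_query` is a hypothetical callable that takes a window such as "1h"
    and returns a sequence of result rows. Returns None if every window is
    empty, which is the cue to check index patterns or escalate.
    """
    for window in windows:
        rows = run_query(window)
        if rows:
            return window, rows
    return None


def hypothesis_is_corroborated(signals: dict[str, bool], minimum: int = 2) -> bool:
    """Accept a root cause only if at least `minimum` independent signals agree."""
    return sum(1 for supported in signals.values() if supported) >= minimum


# Example with a fake query that only has data in the 6h window.
def fake_query(window: str) -> list[dict]:
    return [{"service": "checkout", "errors": 42}] if window == "6h" else []


found = first_window_with_results(fake_query)
corroborated = hypothesis_is_corroborated(
    {"error_logs": True, "apm_traces": True, "host_metrics": False}
)
```

In this sketch, a `None` result from `first_window_with_results` marks the branch where the initial scope assessment shows no errors, and `hypothesis_is_corroborated` encodes the "two independent signals" check before a root cause is declared.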
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Mostly efficient with concrete queries, but the 'Common Root Causes' table and 'Resolution Documentation' section explain things Claude already knows (how to write incident reports, common infrastructure failure modes). The symptom-cause table is generic knowledge that doesn't earn its tokens. | 2 / 3 |
| Actionability | Provides fully executable ES|QL queries for each investigation step, with specific field names, aggregations, and filters. The queries are copy-paste ready with clear placeholders for variable substitution. | 3 / 3 |
| Workflow Clarity | The 5-step investigation framework provides a clear sequence, but lacks validation checkpoints or feedback loops. There's no guidance on what to do if queries return no results, how to verify a hypothesis before declaring root cause, or when to escalate. For an investigation workflow involving production incidents, explicit decision points and verification steps are important. | 2 / 3 |
| Progressive Disclosure | Content is reasonably structured with clear sections, but everything is inline in a single file. The common root causes table and resolution documentation template could be separated into reference files. For a skill with no bundle files, the content is borderline monolithic at this length. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
2e200ec