Troubleshoot and respond to Langfuse-related incidents and outages. Use when experiencing Langfuse outages, debugging production issues, or responding to LLM observability incidents. Trigger with phrases like "langfuse incident", "langfuse outage", "langfuse down", "langfuse production issue", "langfuse troubleshoot".
83%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured skill description with strong completeness and distinctiveness. It includes explicit 'Use when' and 'Trigger with' clauses with natural language terms, and the Langfuse-specific focus makes it highly distinguishable. The main weakness is that the capability actions could be more specific—listing concrete troubleshooting steps rather than just 'troubleshoot and respond'.
Suggestions
Add more specific concrete actions such as 'check service health endpoints, review error logs, diagnose trace ingestion failures, verify API key configuration' to improve specificity.
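As an illustration of the first suggested action ("check service health endpoints"), the following is a minimal sketch only, not code from the reviewed skill. The names `classifyProbe`, `probe`, `ProbeResult`, and `SLOW_MS`, and the 2-second latency threshold, are hypothetical. The health URL is supplied by the caller; Langfuse's public API is commonly probed at a path such as `/api/public/health`, but that path is an assumption here and should be checked against the Langfuse documentation for the deployment in question.

```typescript
// Hypothetical triage helper: maps one health-probe result to a coarse status.
type ProbeResult = { httpStatus: number | null; latencyMs: number };
type Triage = "healthy" | "degraded" | "down";

// Illustrative threshold (assumption): responses slower than this count as degraded.
const SLOW_MS = 2000;

function classifyProbe(r: ProbeResult): Triage {
  // No response at all, or a server-side error, is treated as an outage.
  if (r.httpStatus === null || r.httpStatus >= 500) return "down";
  // Client errors (for example a rejected API key) or slow responses are degraded.
  if (r.httpStatus >= 400 || r.latencyMs > SLOW_MS) return "degraded";
  return "healthy";
}

// Probes a caller-supplied URL using the global fetch (Node 18+ or a browser).
async function probe(url: string, timeoutMs = 5000): Promise<ProbeResult> {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { httpStatus: res.status, latencyMs: Date.now() - started };
  } catch {
    // Network failure or timeout: report that no HTTP status was received.
    return { httpStatus: null, latencyMs: Date.now() - started };
  }
}
```

Keeping classification separate from the network call means the decision logic can be unit-tested without a live Langfuse instance. A description that names this kind of check is the sort of concrete action the Specificity dimension asks for.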
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (Langfuse incidents/outages) and some actions (troubleshoot, respond, debug), but doesn't list multiple specific concrete actions like 'check service health', 'review error logs', 'restart services', or 'escalate to on-call'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (troubleshoot and respond to Langfuse-related incidents and outages) and 'when' (explicit 'Use when' clause with scenarios, plus explicit 'Trigger with phrases' listing specific trigger terms). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including 'langfuse incident', 'langfuse outage', 'langfuse down', 'langfuse production issue', 'langfuse troubleshoot', plus broader terms like 'LLM observability incidents' and 'debugging production issues'. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive due to the specific product name 'Langfuse' and the narrow focus on incident response/outages for that particular tool. Very unlikely to conflict with other skills. | 3 / 3 |
| Total | 11 / 12 Passed | |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong incident runbook with highly actionable, executable code for each step of the incident response process. The workflow is clearly sequenced with proper validation checkpoints and decision tables. The main weaknesses are some content redundancy between the symptom table, resolution procedures, and error handling table, and the lack of progressive disclosure into separate files for what is a substantial document.
Suggestions
Remove or consolidate the final 'Error Handling' table, which largely duplicates the symptom/action mapping in Step 2 and the detailed procedures in Step 4.
Consider splitting detailed resolution procedures (A, B, C) into a separate PROCEDURES.md file, keeping only the symptom-to-procedure mapping in the main SKILL.md.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient with good use of tables and code blocks, but some redundancy exists (e.g., the Error Handling table at the end largely duplicates information from Step 2's symptom table and Step 4's procedures). The severity classification table and escalation path are useful but could be slightly tighter. | 2 / 3 |
| Actionability | Highly actionable with executable bash scripts for triage and verification, concrete TypeScript code for fallback mode and recovery procedures, and specific commands for self-hosted debugging. Code is copy-paste ready with proper error handling (set -euo pipefail) and environment variable defaults. | 3 / 3 |
| Workflow Clarity | Clear 6-step sequence from initial assessment through post-incident review, with explicit validation at Step 5 (checking trace counts and providing pass/fail feedback). The symptom-to-action mapping table in Step 2 serves as an effective decision tree, and the severity classification drives appropriate response timing. | 3 / 3 |
| Progressive Disclosure | Content is well-structured with clear headers and logical sections, but it's a fairly long monolithic document (~180 lines of content) with no bundle files to offload detailed procedures. The common resolution procedures (A, B, C) and the error handling table could be split into separate reference files for better navigation. | 2 / 3 |
| Total | 10 / 12 Passed | |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
23fe3bf