langfuse-incident-runbook

Troubleshoot and respond to Langfuse-related incidents and outages. Use when experiencing Langfuse outages, debugging production issues, or responding to LLM observability incidents. Trigger with phrases like "langfuse incident", "langfuse outage", "langfuse down", "langfuse production issue", "langfuse troubleshoot".

Score: 85

Quality: 83% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Advisory (Suggest reviewing before use)

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured skill description with strong completeness and distinctiveness. It includes explicit 'Use when' and 'Trigger with' clauses with natural language terms, making it easy for Claude to select appropriately. The main weakness is that the specific actions (troubleshoot, respond, debug) could be more granular to better convey the concrete capabilities of the skill.

Suggestions

Add more specific concrete actions beyond 'troubleshoot and respond', such as 'check service health endpoints, analyze error logs, identify root causes, draft incident reports, and escalate issues'.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description names the domain (Langfuse incidents/outages) and some actions (troubleshoot, respond, debug), but doesn't list multiple specific concrete actions like 'check service health', 'review error logs', 'restart services', or 'escalate to on-call'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (troubleshoot and respond to Langfuse-related incidents and outages) and 'when' (explicit 'Use when' clause with scenarios, plus explicit 'Trigger with phrases' listing specific trigger terms). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including 'langfuse incident', 'langfuse outage', 'langfuse down', 'langfuse production issue', 'langfuse troubleshoot', plus broader terms like 'LLM observability incidents' and 'debugging production issues'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive due to the specific product name 'Langfuse' and the narrow focus on incident response/outages for that particular tool. Very unlikely to conflict with other skills. | 3 / 3 |

Total: 11 / 12 (Passed)

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong incident runbook with excellent actionability — executable scripts, concrete code examples, and clear decision tables make it immediately useful during an outage. The workflow is well-sequenced with proper validation steps and severity-based routing. The main weakness is that it's somewhat long for a single SKILL.md file and has some content redundancy between the symptom table, resolution procedures, and error handling table.

Suggestions

Remove or consolidate the final 'Error Handling' table, which largely duplicates the symptom/action mapping in Step 2 and the detailed procedures in Step 4.

Consider extracting the detailed resolution procedures (A, B, C) into a separate PROCEDURES.md file, keeping only the symptom-to-procedure mapping in the main skill.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Generally efficient with good use of tables for quick reference, but some redundancy exists (e.g., the Error Handling table at the end largely duplicates information from Step 2's symptom table and Step 4's procedures). The severity classification table and escalation path are useful additions that earn their tokens, but the overall document could be tightened. | 2 / 3 |
| Actionability | Provides fully executable bash scripts for triage and verification, concrete TypeScript code for fallback mode and resolution procedures, and specific docker commands for self-hosted troubleshooting. Commands are copy-paste ready with proper error handling (set -euo pipefail) and environment variable defaults. | 3 / 3 |
| Workflow Clarity | Clear 6-step sequence from initial assessment through post-incident review, with explicit time targets (2 min triage), validation checkpoints (Step 5 post-incident verification), and a decision matrix (Step 2) for routing to the correct resolution procedure. The feedback loop of verify → fix → re-verify is present in the verification script. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and tables, but it's a fairly long monolithic document (~150 lines of substantive content). The resolution procedures (A, B, C) and the self-hosted troubleshooting could be split into separate referenced files. External links to Langfuse docs are provided but no internal file references for deeper content. | 2 / 3 |

Total: 10 / 12 (Passed)

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |

Total: 9 / 11 (Passed)

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)
