langchain-incident-runbook

Incident response procedures for LangChain production issues: provider outages, high error rates, latency spikes, and cost overruns. Trigger: "langchain incident", "langchain outage", "langchain production issue", "langchain emergency", "langchain down", "LLM provider outage".

Quality

76%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/saas-packs/langchain-pack/skills/langchain-incident-runbook/SKILL.md

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description has strong trigger terms and completeness, clearly identifying when it should be used with explicit trigger keywords. Its main weakness is that it describes the categories of problems it handles but not the concrete actions it performs (e.g., diagnosing, escalating, switching providers). The distinctiveness is excellent due to the narrow LangChain + incident response niche.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Diagnoses root causes, initiates provider failover, generates incident reports, and recommends cost mitigation strategies' to improve specificity.

Dimension	Reasoning	Score
Specificity	Names the domain (LangChain production issues) and lists categories of issues (provider outages, high error rates, latency spikes, cost overruns), but doesn't describe concrete actions the skill performs (e.g., 'diagnoses root cause', 'switches to fallback provider', 'generates runbook steps').	2 / 3
Completeness	Clearly answers both 'what' (incident response procedures for LangChain production issues covering outages, errors, latency, costs) and 'when' (explicit trigger terms listed with a 'Trigger:' clause). The when guidance is explicit and actionable.	3 / 3
Trigger Term Quality	Includes a strong set of natural trigger terms users would actually say: 'langchain incident', 'langchain outage', 'langchain down', 'langchain production issue', 'langchain emergency', and 'LLM provider outage'. These cover common variations well.	3 / 3
Distinctiveness / Conflict Risk	Highly distinctive: the combination of 'LangChain', 'incident response', and 'production issues' creates a clear niche. The specific trigger terms like 'langchain outage' and 'langchain down' are unlikely to conflict with other skills.	3 / 3
	Total	11 / 12 Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured incident runbook with clear severity classification and a consistent Detect/Diagnose/Mitigate/Recover pattern across five runbooks. Its main weaknesses are incomplete code examples (referencing undefined classes like BudgetEnforcer and MetricsCallback) and being somewhat long for a single file without supporting bundle documents. The incident response checklist is a strong addition that provides explicit validation steps.
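To make the gap concrete, here is a minimal sketch of what a self-contained MetricsCallback could look like. The skill itself does not define this class, so every method name and signature below is an assumption, and the sketch is deliberately not wired to LangChain's callback interface. It records per-run latency and error counts, which is what a latency-spike diagnosis step would need.

```typescript
// Hypothetical MetricsCallback: the reviewed skill references this name but
// never defines it, so this shape is an assumption made for illustration.
class MetricsCallback {
  // A plain object keyed by run id keeps the sketch free of library dependencies.
  private startedAt: Record<string, number | undefined> = {};
  private latenciesMs: number[] = [];
  private errorCount = 0;

  // The clock is injectable so the class can be tested deterministically.
  constructor(private readonly now: () => number = () => Date.now()) {}

  /** Mark the start of a model call. */
  start(runId: string): void {
    this.startedAt[runId] = this.now();
  }

  /** Mark a successful end and record the elapsed time. */
  end(runId: string): void {
    const t = this.startedAt[runId];
    if (t === undefined) return; // unknown run id: nothing to record
    this.latenciesMs.push(this.now() - t);
    delete this.startedAt[runId];
  }

  /** Mark a failed call. */
  fail(runId: string): void {
    this.errorCount += 1;
    delete this.startedAt[runId];
  }

  /** Nearest-rank percentile of recorded latencies, or undefined if empty. */
  percentile(p: number): number | undefined {
    const n = this.latenciesMs.length;
    if (n === 0) return undefined;
    const sorted = this.latenciesMs.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * n));
    return sorted[Math.min(rank, n) - 1];
  }

  /** Failed calls divided by all finished calls (0 when nothing has finished). */
  errorRate(): number {
    const total = this.errorCount + this.latenciesMs.length;
    return total === 0 ? 0 : this.errorCount / total;
  }
}
```

In a real runbook the `start`, `end`, and `fail` calls would be driven from whatever callback hooks the application already uses; that wiring is left out here because the review does not show it.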

Suggestions

Make code examples fully executable, either by providing implementations for BudgetEnforcer and MetricsCallback or by replacing them with existing LangChain/LangSmith APIs.

Split detailed runbooks into separate files (e.g., runbook-provider-outage.md) and keep SKILL.md as a concise index with severity classification and the checklist.

Remove explanatory comments that Claude already knows (e.g., '// gpt-4o ($2.50/1M) -> gpt-4o-mini ($0.15/1M) = 17x cheaper') and replace them with just the actionable configuration.
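As a rough illustration of the first and third suggestions together, the sketch below pairs a plain pricing table with a small BudgetEnforcer. The class shape, method names, and error type are assumptions, since the skill references BudgetEnforcer without defining it. The two prices are the per-1M-token figures quoted in the skill's comment and should be treated as illustrative, not current.

```typescript
// Per-1M-token prices as quoted in the reviewed skill's comment (illustrative only).
const USD_PER_MILLION_TOKENS: Record<string, number> = {
  "gpt-4o": 2.5,
  "gpt-4o-mini": 0.15,
};

// Hypothetical error type raised when the spend limit is reached.
class BudgetExceededError extends Error {
  constructor(public readonly spentUsd: number, public readonly limitUsd: number) {
    super(`Budget exceeded: $${spentUsd.toFixed(4)} of $${limitUsd.toFixed(2)}`);
    this.name = "BudgetExceededError";
  }
}

// Hypothetical BudgetEnforcer: tracks spend and refuses further calls at the limit.
class BudgetEnforcer {
  private spentUsd = 0;

  constructor(private readonly limitUsd: number) {}

  /** Record token usage for a finished call and return the running spend in USD. */
  record(model: string, tokens: number): number {
    const price = USD_PER_MILLION_TOKENS[model];
    if (price === undefined) throw new Error(`No price configured for ${model}`);
    this.spentUsd += (tokens / 1e6) * price;
    return this.spentUsd;
  }

  /** Call before each request; throws once the limit has been reached. */
  assertWithinBudget(): void {
    if (this.spentUsd >= this.limitUsd) {
      throw new BudgetExceededError(this.spentUsd, this.limitUsd);
    }
  }

  /** Remaining budget in USD, never negative. */
  remainingUsd(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }
}
```

With the prices held in configuration, the ratio in the quoted comment can be computed on demand: 2.50 / 0.15 is about 16.7, which the comment rounds to 17x.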

Dimension	Reasoning	Score
Conciseness	The skill is reasonably efficient but includes some unnecessary elements — comments like '// All chains using resilientModel auto-failover' and the ERROR_CAUSES lookup table explain things Claude already knows. The severity classification table and checklists add value but some sections (like Runbook 4 and 5) are thin enough to be consolidated.	2 / 3
Actionability	Many code blocks are concrete and executable (provider diagnosis, fallback setup), but several are incomplete or pseudocode-like — BudgetEnforcer and MetricsCallback are referenced without imports or definitions, the caching example uses a naive Map without real implementation, and Runbook 3's diagnose step references a non-existent MetricsCallback class. Some mitigations are just comments rather than executable code.	2 / 3
Workflow Clarity	Each runbook follows a clear Detect → Diagnose → Mitigate → Recover sequence. The incident response checklist provides explicit validation checkpoints (verify full recovery, schedule post-mortem). The severity classification table clearly maps to response times. The structure naturally guides through multi-step incident response with appropriate sequencing.	3 / 3
Progressive Disclosure	The content is well-structured with clear sections and a logical hierarchy, but it's a fairly long monolithic file (~200 lines) with no bundle files to offload detail into. References to external resources (LangSmith, status pages) and the langchain-debug-bundle skill are present but the detailed runbooks could benefit from being split into separate files with the SKILL.md serving as an overview/index.	2 / 3
	Total	9 / 12 Passed
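The Actionability row notes that the caching example uses a naive Map without a real implementation. One possible shape for a more complete version is a small cache with time-based expiry, sketched below. The class name, the choice of the raw prompt as the key, and the TTL policy are all assumptions and are not taken from the skill.

```typescript
// Hypothetical TTL cache for model responses; not taken from the reviewed skill.
interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds after which the entry is stale
}

class TtlResponseCache {
  // Keyed by the raw prompt; a production cache would likely hash the prompt
  // together with the model name and sampling parameters.
  private entries: Record<string, CacheEntry | undefined> = {};

  constructor(
    private readonly ttlMs: number,
    // Injectable clock so expiry can be tested without waiting.
    private readonly now: () => number = () => Date.now(),
  ) {}

  /** Return the cached response, or undefined if missing or expired. */
  get(prompt: string): string | undefined {
    const entry = this.entries[prompt];
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      delete this.entries[prompt]; // evict the stale entry on read
      return undefined;
    }
    return entry.value;
  }

  /** Store a response with a fresh expiry time. */
  set(prompt: string, value: string): void {
    this.entries[prompt] = { value, expiresAt: this.now() + this.ttlMs };
  }
}
```

Expiring entries on read keeps the sketch short; it does not bound memory, so a size limit or periodic sweep would still be needed for long-running services.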

Validation

81%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 9 / 11 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	9 / 11 Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills
Commit: a04d1a2

Reviewed: 3 days ago

