
databricks-incident-runbook

Execute Databricks incident response procedures with triage, mitigation, and postmortem. Use when responding to Databricks-related outages, investigating job failures, or running post-incident reviews for pipeline failures. Trigger with phrases like "databricks incident", "databricks outage", "databricks down", "databricks on-call", "databricks emergency", "job failed".

89

Quality: 88% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines its scope (Databricks incident response), lists concrete actions (triage, mitigation, postmortem), and provides explicit trigger guidance with natural user phrases. It uses proper third-person voice and is concise without being vague. The description would effectively differentiate this skill from others in a large skill library.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'triage', 'mitigation', and 'postmortem'. Also mentions investigating job failures and running post-incident reviews, giving a clear picture of what the skill does. | 3 / 3 |
| Completeness | Clearly answers both 'what' (execute incident response procedures with triage, mitigation, and postmortem) and 'when' (explicit 'Use when' clause covering outages, job failures, and post-incident reviews, plus a 'Trigger with phrases' section). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'databricks incident', 'databricks outage', 'databricks down', 'databricks on-call', 'databricks emergency', 'job failed'. These are realistic phrases a user would type during an incident. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with a clear niche: Databricks-specific incident response. The combination of 'Databricks' + 'incident response' creates a very specific domain unlikely to conflict with general monitoring, alerting, or other platform skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable incident runbook with excellent workflow clarity and executable commands throughout. Its main weakness is length — at 200+ lines with communication templates, postmortem templates, and multiple remediation paths all inline, it would benefit from splitting detailed sub-procedures into referenced files. The decision tree and step-by-step structure are exemplary for incident response.

Suggestions

Split communication templates, postmortem template, and evidence collection script into separate referenced files (e.g., COMMS_TEMPLATES.md, POSTMORTEM.md) to reduce the main skill's token footprint.

Remove the overview paragraph — the title and structure already convey the purpose, and the YAML description covers the trigger context.
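The first suggestion above could be carried out with a small script. The following is a minimal sketch, not part of the reviewed skill: it assumes the skill file uses `## ` headings for its sections, and the heading title `Postmortem Template` and the helper name `split_sections` are hypothetical. Only the target filename `POSTMORTEM.md` comes from the suggestion itself.

```python
import re


def split_sections(markdown: str, extract: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Move selected H2 sections out of a skill file.

    `extract` maps a heading title (hypothetical names) to a target filename.
    Returns the trimmed main text and a {filename: content} mapping.
    """
    # Split just before every line that starts with "## ", so each part
    # keeps its own heading. "### " subsections stay inside their parent.
    parts = re.split(r"(?m)^(?=## )", markdown)
    main: list[str] = []
    files: dict[str, str] = {}
    for part in parts:
        match = re.match(r"## (.+)", part)
        title = match.group(1).strip() if match else None
        if title in extract:
            target = extract[title]
            files[target] = files.get(target, "") + part
            # Leave a short pointer behind in place of the moved section.
            main.append(f"## {title}\n\nSee [{target}]({target}).\n\n")
        else:
            main.append(part)
    return "".join(main), files


if __name__ == "__main__":
    doc = "# Runbook\n\n## Triage\n\nSteps.\n\n## Postmortem Template\n\nBody.\n"
    trimmed, extracted = split_sections(doc, {"Postmortem Template": "POSTMORTEM.md"})
    print(trimmed)
    print(sorted(extracted))
```

The function only computes the new contents. Writing the files to disk is left to the caller, so the split can be reviewed before anything is overwritten.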

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly efficient and avoids explaining basic concepts, but it's quite long (200+ lines) with some sections that could be tightened. The severity level table and some communication templates add bulk, though most content earns its place. The overview paragraph is somewhat redundant given the structure speaks for itself. | 2 / 3 |
| Actionability | Excellent actionability throughout: every step has executable bash commands, SQL queries, or copy-paste templates. The triage script, cluster diagnostics, run repair commands, and evidence collection script are all concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (triage → decision tree → specific remediation → communication → evidence → postmortem) with explicit validation at each step. The decision tree provides clear branching logic, and the triage script serves as an initial validation checkpoint. Error handling table covers recovery scenarios. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and logical sections, but it's entirely monolithic: the detailed remediation steps, communication templates, and postmortem template could be split into separate referenced files. For a skill this long, inline content for every scenario makes it heavy. The single 'Next Steps' reference to databricks-data-handling is good but insufficient. | 2 / 3 |
| Total | | 10 / 12 |

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 |

Passed

Repository
jeremylongshore/claude-code-plugins-plus-skills
Reviewed
