
databricks-incident-runbook

Execute Databricks incident response procedures with triage, mitigation, and postmortem. Use when responding to Databricks-related outages, investigating job failures, or running post-incident reviews for pipeline failures. Trigger with phrases like "databricks incident", "databricks outage", "databricks down", "databricks on-call", "databricks emergency", "job failed".

89

Quality

88%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines its scope (Databricks incident response), lists concrete actions (triage, mitigation, postmortem), provides explicit 'Use when' guidance, and includes a comprehensive set of natural trigger phrases. It uses third-person voice throughout and is concise without being vague. One minor improvement would be to mention specific artifact types such as runbooks or alerting systems, but overall the description is well crafted.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'triage', 'mitigation', 'postmortem', 'investigating job failures', 'running post-incident reviews'. These are clear, actionable procedures rather than vague language. | 3 / 3 |
| Completeness | Clearly answers both 'what' (execute incident response procedures with triage, mitigation, and postmortem) and 'when' (responding to outages, investigating job failures, running post-incident reviews) with an explicit 'Use when' clause and additional trigger phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would actually say: 'databricks incident', 'databricks outage', 'databricks down', 'databricks on-call', 'databricks emergency', 'job failed'. These cover multiple natural phrasings a user in an incident scenario would use. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with a clear niche: Databricks-specific incident response. The combination of 'Databricks' + 'incident response' creates a very specific domain unlikely to conflict with general Databricks skills or general incident response skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable incident runbook with excellent workflow clarity and concrete, executable commands at every step. Its main weakness is that it packs too much into a single file—communication templates, postmortem templates, and detailed remediation steps would benefit from being split into referenced files to improve token efficiency and progressive disclosure. The decision tree and error handling table are particularly well done.

Suggestions

Extract the postmortem template and communication templates into separate referenced files (e.g., POSTMORTEM_TEMPLATE.md, COMMS_TEMPLATES.md) to reduce the main file's token footprint.

Move detailed remediation steps (3a-3d) into a separate REMEDIATION.md file, keeping only the decision tree and brief summaries in the main SKILL.md.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly efficient and avoids explaining basic concepts, but it's quite long (~200 lines) with some sections that could be tightened: the communication templates and postmortem template add bulk that could be in separate files. The severity table and decision tree are useful but the overall document is heavy for a single SKILL.md. | 2 / 3 |
| Actionability | Excellent actionability throughout: every step has executable bash commands, SQL queries, or copy-paste-ready templates. The triage script, cluster diagnostics, run repair commands, and evidence collection script are all concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (triage → decision tree → specific remediation → communication → evidence → postmortem) with explicit validation checkpoints. The decision tree provides clear branching logic, and the error handling table covers common failure modes with specific recovery steps. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and logical sections, but it's monolithic: the communication templates, postmortem template, and detailed remediation steps could be split into separate referenced files. The single reference to 'databricks-data-handling' at the end is good but insufficient given the document's length. | 2 / 3 |
| Total | | 10 / 12 (Passed) |

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 (Passed) |

Repository
jeremylongshore/claude-code-plugins-plus-skills
Reviewed
