monitoring-and-alerting

Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor. Also triggers when an incident reveals a gap in monitoring.

67

Quality: 81% (Does it follow best practices?)

Impact: No eval scenarios have been run

Security by Snyk: Passed (No known issues)

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines what the skill does, when to use it, and includes a comprehensive set of natural trigger terms. It uses third-person voice, lists concrete actions, and provides both standard keyword triggers and a situational trigger for edge cases. It serves as a strong example of a well-crafted skill description.

| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, fixing alert fatigue. These are all distinct, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (design and run a monitoring system for a website or web app) and 'when' (explicit 'Use this skill when...' clause plus a 'Triggers on' list and an additional situational trigger). Both are thorough and explicit. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, and even the conversational phrase 'what should we monitor'. Also includes the situational trigger 'when an incident reveals a gap in monitoring'. | 3 / 3 |
| Distinctiveness Conflict Risk | The description carves out a clear niche around website/web app monitoring, SLOs, alerting, and on-call practices. Terms like 'uptime checks', 'on-call rotations', 'alert fatigue', and 'SLO' are highly specific to this domain and unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-organized strategic and operational skill that provides a comprehensive monitoring framework with clear workflow steps and good decision criteria. Its main weakness is the lack of concrete, executable artifacts (no example configs, code snippets, or template files), which limits actionability for a skill that should help Claude actually set up monitoring. The content also runs longer than it needs to, though the structure and logical flow are strong.

Suggestions

Add concrete, executable examples: a sample uptime check config (e.g., for a common tool like UptimeRobot, Datadog, or a simple curl-based script), a sample PagerDuty routing rule, or a template monitoring plan in a structured format.

Extract the failure patterns section and detailed SLO guidance into separate reference files to reduce the main skill's length and improve progressive disclosure.

Include a sample runbook template or example runbook for a common paging alert (e.g., 'site down' or 'error rate spike') to make the 'every alert needs a runbook' guidance actionable.
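As an illustration of the kind of artifact the first suggestion asks for, here is a minimal sketch of a script-based uptime check, written in Python with the standard library rather than curl. It is not taken from the skill: the names `CheckResult`, `classify`, and `check_url`, the 2000 ms latency budget, and the 5 s timeout are all hypothetical choices made for this example.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    url: str
    status: Optional[int]  # HTTP status code, or None if no response arrived
    latency_ms: float
    healthy: bool


def classify(status: Optional[int], latency_ms: float,
             max_latency_ms: float = 2000.0) -> bool:
    """Treat a 2xx/3xx response inside the latency budget as healthy.

    The 2000 ms default is an illustrative assumption, not a recommendation.
    """
    return (status is not None
            and 200 <= status < 400
            and latency_ms <= max_latency_ms)


def check_url(url: str, timeout_s: float = 5.0,
              max_latency_ms: float = 2000.0) -> CheckResult:
    """Issue one GET request and report status, latency, and health."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status = exc.code
    except OSError:
        # DNS failure, refused connection, timeout, TLS error, and similar.
        status = None
    latency_ms = (time.monotonic() - start) * 1000.0
    healthy = classify(status, latency_ms, max_latency_ms)
    return CheckResult(url, status, latency_ms, healthy)
```

Calling `check_url("https://example.com/")` from a scheduler returns one `CheckResult` per probe. Keeping `classify` separate from the network call means the health rule can be unit-tested without network access. Running the same script from hosts in several regions would approximate the multi-region checks the review mentions.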

| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is generally well-structured and avoids explaining basic concepts Claude would know, but it's quite lengthy (~300 lines) with some sections that could be tightened. The failure patterns section, while useful, includes some obvious advice (e.g., 'alert without a runbook' explanation). The SLO table and framework layers are efficient, but the workflow steps contain some filler prose like 'Many teams have a tangle of half-configured tools. The first job is the inventory.' | 2 / 3 |
| Actionability | The skill provides a solid conceptual framework with specific thresholds, tiering guidance, and an SLO table, but lacks concrete executable examples: no code snippets, no specific tool configurations, no example monitoring configs, no sample runbook templates. Guidance like 'HTTP checks from multiple regions' and 'synthetic checks' is directional but not copy-paste ready. The monitoring plan table in Step 4 is a good structural example but remains abstract. | 2 / 3 |
| Workflow Clarity | The 8-step workflow is clearly sequenced and logically ordered from inventory through audit. It includes validation checkpoints (Step 8 quarterly audit, Step 5's explicit tiering criteria, escalation paths in Step 6). The feedback loop between SLOs and velocity is explicitly called out. The three-tier alert system provides clear decision criteria for categorization. | 3 / 3 |
| Progressive Disclosure | The skill references one external file (references/slo-design-guide.md) which is appropriate, but the bundle shows no files were actually provided, so this reference is unverifiable. The SKILL.md itself is quite long and monolithic: the failure patterns section, detailed SLO guide, and dashboard guidance could reasonably be split into separate reference files. The 'When NOT to use' cross-references to other skills are well done. | 2 / 3 |
| Total | | 9 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed
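To make the `frontmatter_unknown_keys` warning concrete, the sketch below shows one way such a check could work. It is an assumption about the general technique, not the validator's actual implementation: the helper names are hypothetical, the parser only recognizes simple top-level `key: value` lines rather than full YAML, and the allowed-key set in the usage note is a placeholder, not the spec's real list.

```python
from typing import Iterable, List


def frontmatter_keys(skill_md: str) -> List[str]:
    """Return top-level keys from a leading '---' delimited frontmatter block.

    Simplified on purpose: indented lines, comments, and list items are
    skipped, so nested keys under e.g. 'metadata:' are not reported.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the frontmatter
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level and not line.startswith(("#", "-")) and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(skill_md: str, allowed: Iterable[str]) -> List[str]:
    """List frontmatter keys that are not in the caller-supplied allowed set."""
    allowed_set = set(allowed)
    return [key for key in frontmatter_keys(skill_md) if key not in allowed_set]
```

For example, with a placeholder allowed set of `{"name", "description", "metadata"}`, a file whose frontmatter also declares `category: ops` would yield `["category"]`. Moving that key under `metadata:` would clear the result, which matches the remedy the warning suggests.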

Repository: rampstackco/claude-skills (Reviewed)
