monitoring-and-alerting

Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor. Also triggers when an incident reveals a gap in monitoring.

Quality

100%

Does it follow best practices?

Security

Passed

No findings from the security scan

Quality

Content

100%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A high-quality instruction skill: information-dense and concrete with specific thresholds, a clearly sequenced 8-step workflow with a built-in audit feedback loop, and clean one-level-deep reference structure. No dimension shows material weakness.

Dimension	Reasoning	Score
Conciseness	Dense, well-structured content — layered check lists, threshold tables, tier definitions, and a failure-pattern catalog — that assumes Claude's competence and explains no generic concepts (the SLO/error-budget framing is operational, not padding). The short rhetorical section openers aid navigation rather than inflate tokens, so it sits at the 'lean, every token earns its place' anchor rather than the 'mostly efficient but could be tightened' score-2.	3 / 3
Actionability	Instruction-only yet highly actionable: concrete thresholds ('more than 2 consecutive failed checks', 'p95 doubled in 5 minutes', 'Error rate above 1% for 5 minutes'), specific escalation timing ('5-15 minutes'), an SLO downtime table, a per-box check matrix, and a copy-paste-ready audit checklist. The scoring note permits absence of code for instruction-only skills when guidance is this specific, so it clears the score-3 bar over the 'incomplete/missing key details' score-2 anchor.	3 / 3
Workflow Clarity	An explicit 8-step sequence (Inventory → Map → Define SLOs → Configure checks → Tier → Route → Dashboards → Audit) with per-step sub-guidance and a built-in feedback loop in Step 8's quarterly alert audit. The cap-at-2 rule applies only to destructive/batch operations, which this design skill is not, so it reaches the 'clear sequence with feedback loops' anchor.	3 / 3
Progressive Disclosure	The body is a well-organized overview (4 layers, SLOs, workflow, failure patterns, output format) with a single well-signaled one-level-deep reference to references/slo-design-guide.md — a real file verified to exist — for deep SLO design. Content is appropriately split and navigation is easy, matching the score-3 anchor rather than the 'content that should be separate is inline' score-2.	3 / 3
	Total	12 / 12 Passed
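
The three thresholds quoted in the Actionability row can be read as simple predicates over recent measurements. The following is a minimal sketch of that reading, not the skill's own implementation: the `Sample` shape, the one-sample-per-minute cadence, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One per-minute measurement (hypothetical shape, not taken from the skill)."""
    requests: int
    errors: int
    p95_ms: float


def consecutive_failures_exceeded(results: Sequence[bool], limit: int = 2) -> bool:
    """'More than 2 consecutive failed checks'.

    `results` is ordered oldest-first; True means the check passed.
    Only the most recent unbroken run of failures is counted.
    """
    streak = 0
    for ok in reversed(results):
        if ok:
            break
        streak += 1
    return streak > limit


def error_rate_sustained(samples: Sequence[Sample],
                         threshold: float = 0.01,
                         minutes: int = 5) -> bool:
    """'Error rate above 1% for 5 minutes'.

    Assumes one sample per minute; every sample in the window must exceed
    the threshold, so a single healthy minute resets the condition.
    """
    window = samples[-minutes:]
    if len(window) < minutes:
        return False
    return all(s.requests > 0 and s.errors / s.requests > threshold for s in window)


def p95_doubled(samples: Sequence[Sample], minutes: int = 5) -> bool:
    """'p95 doubled in 5 minutes'.

    Compares the latest p95 with the p95 from `minutes` samples earlier.
    """
    if len(samples) <= minutes:
        return False
    baseline = samples[-1 - minutes].p95_ms
    return baseline > 0 and samples[-1].p95_ms >= 2 * baseline
```

Requiring the whole window to breach the threshold, rather than its average, is one possible interpretation of "for 5 minutes"; a production alerting tool may define the window differently.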

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, comprehensive description that explicitly states capabilities, gives explicit trigger guidance with natural keywords, and occupies a clear niche. Third-person voice and concrete action list satisfy every dimension.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — 'setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue' — far beyond the score-2 'names domain and some actions' anchor. Voice is third person ('Design and run'), so no specificity penalty.	3 / 3
Completeness	Answers both 'what' ('Design and run a monitoring system for a website or web app') and 'when' with an explicit 'Use this skill when...' clause plus a 'Triggers on...' list. Matches the score-3 anchor requiring explicit triggers for both halves.	3 / 3
Trigger Term Quality	Explicit 'Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor' gives broad coverage of terms a user would naturally say. Not score 2, which expects missing common variations — the list is comprehensive and includes colloquial phrasings like 'what should we monitor'.	3 / 3
Distinctiveness Conflict Risk	Clear monitoring/alerting niche with distinct triggers (uptime, SLO, on-call, pager, alert fatigue) unlikely to fire for unrelated skills. A couple of broad terms (observability, dashboards) carry mild overlap risk, but the core niche is well-defined enough to stay at 3 rather than the 'could still overlap' score-2 anchor.	3 / 3
	Total	12 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	15 / 16 Passed
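
The 93% headline is consistent with 15 of 16 checks passing if the percentage is truncated rather than rounded, since 15 / 16 is exactly 93.75%. The page does not state its rounding rule, so the truncation in this short sketch is an assumption, and `pass_percentage` is a hypothetical helper name.

```python
def pass_percentage(passed: int, total: int) -> int:
    """Whole-number pass rate, truncated toward zero (assumed rounding rule)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * passed // total


validation = pass_percentage(15, 16)  # 93, matching the Validation headline
content = pass_percentage(12, 12)     # 100, matching the Content and Description scores
```

Rounding to the nearest integer would instead give 94 for the validation score, which is why truncation is the reading that matches the displayed figure.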

Repository: rampstackco/claude-skills
Path: skills/monitoring-and-alerting/SKILL.md
Commit: bc6d961

Reviewed: about 7 hours ago
