CtrlK
BlogDocsLog inGet started
Tessl Logo

tdg-personal/canary-watch

Use this skill to monitor a deployed URL for regressions after deploys, merges, or dependency upgrades.

44

Quality

44%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

Quality

Discovery

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description provides reasonable 'when' context (after deploys, merges, dependency upgrades) but is weak on specificity — it doesn't explain what kind of monitoring or regression checks are performed. The second-person 'Use this skill to' phrasing is acceptable per the rubric examples, but the lack of concrete actions and missing common trigger term variations limit its effectiveness for skill selection.

Suggestions

Add specific concrete actions the skill performs, e.g., 'Runs HTTP health checks, captures screenshots for visual diff comparison, and validates response codes against a deployed URL'.

Expand trigger terms to include natural variations like 'smoke test', 'post-deploy check', 'regression testing', 'site health check', or 'production verification'.

Clarify the mechanism of monitoring to distinguish this from general testing or CI/CD skills — e.g., is it visual regression, functional testing, uptime monitoring, or performance benchmarking?

DimensionReasoningScore

Specificity

The description says 'monitor a deployed URL for regressions' but doesn't specify what concrete actions are performed — e.g., does it run visual diff checks, HTTP status checks, performance benchmarks, or something else? 'Monitor for regressions' is vague.

1 / 3

Completeness

It addresses 'when' fairly well ('after deploys, merges, or dependency upgrades') but the 'what' is weak — 'monitor a deployed URL for regressions' doesn't explain what specific actions or checks are performed. The 'Use this skill to...' phrasing serves as a trigger clause but the what-it-does portion lacks detail.

2 / 3

Trigger Term Quality

Includes some relevant terms like 'deployed URL', 'regressions', 'deploys', 'merges', 'dependency upgrades' that users might naturally mention. However, it misses common variations like 'smoke test', 'post-deploy check', 'site monitoring', 'health check', or 'regression testing'.

2 / 3

Distinctiveness Conflict Risk

The combination of 'deployed URL' and 'regressions after deploys' provides some distinctiveness, but 'monitor for regressions' could overlap with testing skills, CI/CD skills, or general monitoring skills. Without specifying the mechanism (visual regression, functional checks, etc.), it's not clearly distinct.

2 / 3

Total

7

/

12

Passed

Implementation

22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill reads more like a feature specification or product requirements document than an actionable skill for Claude. It describes what a canary watch system should do (check HTTP status, measure performance, alert on thresholds) but provides zero implementation — no actual code to make HTTP requests, no browser automation to measure Web Vitals, no scripts to parse console errors. The '/canary-watch' commands reference a non-existent tool with no explanation of what backs them.

Suggestions

Replace the declarative descriptions with actual executable code showing how to perform each check (e.g., using fetch/curl for HTTP status, Puppeteer/Playwright for Web Vitals, etc.)

Add a concrete step-by-step workflow: 1) capture baseline, 2) run checks, 3) compare against thresholds, 4) if critical alert → take specific action, 5) if sustained watch → loop with sleep

Clarify what tools/MCP servers Claude should use to implement this — the '/canary-watch' command syntax implies a tool that doesn't exist; either define it or show how to build the monitoring from available primitives

Remove the 'When to Use' section entirely — Claude can infer when post-deploy monitoring is appropriate — and use that space for implementation details

DimensionReasoningScore

Conciseness

The content is reasonably structured but includes some unnecessary sections like 'When to Use' which lists obvious scenarios Claude can infer. The 'What It Watches' section is somewhat explanatory rather than instructional. However, it's not egregiously verbose.

2 / 3

Actionability

The skill describes what a canary watch system would do but provides no executable code, no actual implementation, and no concrete commands that work. The '/canary-watch' commands appear to reference a tool that doesn't exist — there's no code showing how to actually perform HTTP checks, measure LCP/CLS, or detect console errors. It reads like a product spec, not an actionable skill.

1 / 3

Workflow Clarity

There is no clear step-by-step workflow for how to actually perform monitoring. The skill describes watch modes and thresholds declaratively but never sequences the actual operations Claude should perform. There are no validation checkpoints or error recovery steps for when checks fail.

1 / 3

Progressive Disclosure

The content is organized into logical sections with clear headers, and the Integration section references related skills. However, the alert thresholds YAML block and output example are inline when they could be referenced separately, and there are no links to detailed implementation docs.

2 / 3

Total

6

/

12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation10 / 11 Passed

Validation for skill structure

CriteriaDescriptionResult

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

10

/

11

Passed

Reviewed

Table of Contents