Name: matthew-a-carr/deploy-smoke-test
Rating: 78 (1 review)
Author: matthew-a-carr

Confirm a production deploy actually landed and is healthy. Verifies the latest Vercel Production deployment is READY and matches the current `main` commit, runs HTTP canary checks against travel.matthewcarr.dev, confirms migrations applied, and checks for a post-deploy Sentry error spike. Use after merging to `main`, or when a human asks "is prod healthy?" / "did the deploy go out?" / "smoke test production". Read-only against prod by default.

Quality

90%

Does it follow best practices?

Impact

100%

1.53x

Average score across 1 eval scenario

Security

Passed

No known issues

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, actionable skill with a clear multi-step workflow, concrete commands, and explicit pass/fail criteria at each stage. Its main weakness is moderate verbosity in the preamble sections (deploy pipeline explanation, untrusted content boilerplate) that consume tokens without adding much value for Claude. The progressive disclosure is adequate for a standalone skill but could be tighter.

Suggestions

Trim the 'When to use' section to 2-3 sentences — Claude doesn't need the full explanation of Vercel Git-integration mechanics, just when to invoke the skill.

Consider removing or drastically shortening the 'Untrusted content' section, which is generic security boilerplate that could live in a shared AGENTS.md reference rather than being repeated per-skill.

Dimension	Reasoning	Score
Conciseness	The skill is mostly efficient but includes some context that Claude could infer: it explains what Vercel Git-integration is, the 'Untrusted content' section is boilerplate, and parenthetical asides such as ADR references add noise. The 'When to use' section over-explains the deploy pipeline. The core steps, however, are reasonably tight.	2 / 3
Actionability	Provides specific, executable commands throughout (git rev-parse, vercel ls, vercel inspect, curl with exact flags and expected status codes). The report template is copy-paste ready. Each step has concrete expected outputs and clear pass/fail criteria.	3 / 3
Workflow Clarity	The 5-step workflow is clearly sequenced with explicit validation at each stage: commit match check, deployment state verification, HTTP canary pass/fail criteria, Sentry error spike check, and a structured report template. The 'Acting on a bad deploy' section provides error recovery guidance, and the skill explicitly gates destructive actions behind user confirmation.	3 / 3
Progressive Disclosure	The content is well-organized with clear sections, but it's all inline in a single file with no bundle files to offload detail to. References to external docs (docs/operations/sentry.md, AGENTS.md, ADRs) are mentioned but the skill itself could benefit from splitting the report template or remediation steps into separate files. For its length (~100 lines), the inline approach is borderline acceptable.	2 / 3
	Total	10 / 12 Passed
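
To make the pass/fail criteria credited above concrete, here is a minimal sketch of how the first checks (deployment state, commit match, HTTP canary status codes) could be reduced to a single verdict. It is an illustration under assumptions, not the skill's actual implementation: `SmokeInputs`, `evaluate`, and the example paths are hypothetical names, and the values are assumed to have been collected beforehand, for instance with the `git rev-parse`, `vercel inspect`, and `curl` commands the review mentions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SmokeInputs:
    """Values gathered before evaluation (all field names are hypothetical)."""
    main_sha: str      # commit at the tip of main
    deploy_sha: str    # commit the latest Production deployment was built from
    deploy_state: str  # deployment state string; the skill expects "READY"
    expected_statuses: Dict[str, int] = field(default_factory=dict)  # path -> expected HTTP status
    observed_statuses: Dict[str, int] = field(default_factory=dict)  # path -> observed HTTP status


def evaluate(inputs: SmokeInputs) -> List[str]:
    """Return a list of failure messages; an empty list means every check passed."""
    failures: List[str] = []

    # Check 1: the deployment must have finished building.
    if inputs.deploy_state != "READY":
        failures.append(f"deployment state is {inputs.deploy_state}, expected READY")

    # Check 2: the deployed commit must match the current main commit.
    if inputs.deploy_sha != inputs.main_sha:
        failures.append(
            f"deployed commit {inputs.deploy_sha[:7]} does not match main {inputs.main_sha[:7]}"
        )

    # Check 3: every canary path must return its expected status code.
    # A path with no observation counts as a failure (observed is None).
    for path, expected in inputs.expected_statuses.items():
        observed = inputs.observed_statuses.get(path)
        if observed != expected:
            failures.append(f"canary {path} returned {observed}, expected {expected}")

    return failures
```

Returning a list of messages rather than a boolean keeps each failed criterion visible, which suits the structured report the review describes. The migration and Sentry checks are left out because the page does not say how the skill performs them.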

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly articulates specific actions (Vercel deployment verification, HTTP canary checks, migration confirmation, Sentry error spike detection), provides explicit trigger phrases that users would naturally say, and carves out a distinct niche. The inclusion of quoted user phrases as triggers and the 'Read-only against prod by default' safety note are particularly strong touches.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: verifies Vercel Production deployment is READY, matches current main commit, runs HTTP canary checks against a specific domain, confirms migrations applied, and checks for post-deploy Sentry error spike.	3 / 3
Completeness	Clearly answers both what (verifies Vercel deployment status, runs HTTP canary checks, confirms migrations, checks Sentry errors) and when ('Use after merging to main, or when a human asks "is prod healthy?" / "did the deploy go out?" / "smoke test production"') with explicit triggers.	3 / 3
Trigger Term Quality	Excellent coverage of natural trigger terms: 'production deploy', 'is prod healthy?', 'did the deploy go out?', 'smoke test production', 'merging to main'. These are phrases users would naturally say when needing this skill.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive with a clear niche: post-deploy production health verification for a specific project (travel.matthewcarr.dev) using specific tools (Vercel, Sentry). Very unlikely to conflict with other skills due to the specific domain, toolchain, and use case.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Reviewed

about 1 month ago
