antithesis-debug

Interactively debug an Antithesis test run in the multiverse debugger (MVD): launch a session from a run, open a debugging-session URL, and inspect container filesystem and runtime state from inside the run.

Impact

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Quality

Content

85%

Reviews the quality of the instructions and guidance provided to agents. A good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable, with clear, validated workflows and strong progressive disclosure across one-level-deep reference files that exist. Its main weakness is conciseness: the runtime re-injection guidance is repeated across sections, and time-sensitive version info sits outside a deprecated or old-patterns section.

Suggestions

Consolidate the runtime-injection/re-injection guidance into one section and reference it from the others to remove repetition.

Move the agent-browser v0.23.4 version requirement and the dated metadata version into a dedicated compatibility/prerequisites or "old patterns" section so time-sensitive info does not dilute conciseness.

Consider trimming the overlapping waitForReady/loadingFinished/loadingStatus examples, keeping one canonical pattern and pointing the rest to a reference file.

Dimension	Reasoning	Score
Conciseness	The body is mostly efficient and avoids explaining concepts Claude already knows, but it repeats the runtime re-injection guidance across "Runtime injection", "Page loading checks", and "General guidance", and it carries time-sensitive version info ("v0.23.4+", metadata "2026-07-07 38a11c4") outside a deprecated or old-patterns section. This matches the score-2 anchor (mostly efficient but tightenable) rather than the lean score-3 anchor.	2 / 3
Actionability	Provides fully executable, copy-paste-ready commands such as `agent-browser --session "$SESSION" eval "window.__antithesisDebug.getMode()"` and `window.__antithesisDebug.simplified.runCommand("ls -la /")`, matching the score-3 anchor of complete executable examples rather than the pseudocode score-2 anchor.	3 / 3
Workflow Clarity	Recommended workflows are explicitly numbered and sequenced, with validation checkpoints (waitForReady returning {ok, ready, attempts, waitedMs} plus timeout details) and error-recovery feedback loops ("A nonzero exit code terminates the branch... Fork a fresh branch"; reinject-on-missing-runtime retry), matching the score-3 anchor.	3 / 3
Progressive Disclosure	Clear overview with well-signaled one-level-deep references via "Page | When to read" tables; all referenced files (references/.md, assets/.js|.sh|.py) exist and content is appropriately split into setup, simplified, advanced, notebook, actions, common-inspections, and download-log. This matches the score-3 anchor rather than the inline-heavy score-2 anchor.	3 / 3
	Total	11 / 12 Passed
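
The validation checkpoint cited in the Workflow Clarity row can be pictured as a small polling loop. The sketch below is a hypothetical stand-in, not the skill's `waitForReady`: the name `pollUntilReady`, its `check` callback, and the `timeoutMs` and `intervalMs` options are assumptions made for illustration. Only the `{ok, ready, attempts, waitedMs}` result shape is taken from the review.

```javascript
// Hypothetical polling helper; NOT the skill's actual waitForReady implementation.
// `check` is any sync or async function that returns a truthy value once ready.
async function pollUntilReady(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const start = Date.now();
  let attempts = 0;
  while (true) {
    attempts += 1;
    let ready = false;
    try {
      ready = Boolean(await check());
    } catch (err) {
      ready = false; // a throwing check is treated as "not ready yet"
    }
    const waitedMs = Date.now() - start;
    if (ready) {
      return { ok: true, ready: true, attempts, waitedMs };
    }
    if (waitedMs >= timeoutMs) {
      // Timeout details are included so the caller can decide how to recover.
      return { ok: false, ready: false, attempts, waitedMs, timeoutMs };
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would branch on `ok`: a `false` result is the point at which a recovery step, such as the re-injection retry the row mentions, would run before polling again.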

Description

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, domain-distinctive, and uses natural trigger terms, but it omits an explicit "Use when..." trigger clause, capping completeness at 2. Adding a when-to-use clause would round it out.

Suggestions

Append a "Use when..." clause naming the natural triggers (e.g. an Antithesis debugging-session URL, a request to inspect container filesystem/runtime state, or the triage skill launching an MVD session) so completeness can reach 3.

Keep the concrete action list as-is; it already hits the specificity score-3 anchor.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — "launch a session from a run, open a debugging-session URL, and inspect container filesystem and runtime state from inside the run" — matching the score-3 anchor of multiple specific concrete actions rather than the single-domain score-2 anchor.	3 / 3
Completeness	It clearly answers "what" (launch session, open URL, inspect state) but has no "Use when..." clause or equivalent explicit trigger guidance in the description itself, so per the judging guidelines completeness is capped at 2 rather than reaching the explicit-both score-3 anchor.	2 / 3
Trigger Term Quality	For this specialized tool the natural user terms are present — "Antithesis test run", "multiverse debugger (MVD)", "debugging-session URL", "container filesystem", "runtime state" — which a user of the debugger would naturally say, giving good coverage rather than the partial score-2 set.	3 / 3
Distinctiveness Conflict Risk	The "Antithesis multiverse debugger (MVD)" niche is highly specific with distinct triggers unlikely to conflict with other skills, matching the clear-niche score-3 anchor and clearly above the somewhat-specific score-2 anchor.	3 / 3
	Total	11 / 12 Passed
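
The Completeness cap comes down to whether the description contains an explicit trigger clause. As a rough illustration, the sketch below applies a hypothetical heuristic (`hasTriggerClause`, not the reviewer's actual rubric) to the current description and to a revised one. The revised wording is only an example built from the triggers named in the suggestion above.

```javascript
// Hypothetical heuristic; the reviewer's real completeness check is not shown.
// Returns true when the description contains an explicit "Use when" style trigger.
function hasTriggerClause(description) {
  return /\buse (?:this (?:skill )?)?when\b/i.test(description);
}

// The skill's current description, as quoted at the top of this review.
const current =
  "Interactively debug an Antithesis test run in the multiverse debugger (MVD): " +
  "launch a session from a run, open a debugging-session URL, and inspect " +
  "container filesystem and runtime state from inside the run.";

// Illustrative wording only, built from the triggers named in the suggestion.
const revised =
  current +
  " Use when given an Antithesis debugging-session URL or asked to inspect " +
  "container filesystem or runtime state.";

console.log(hasTriggerClause(current)); // false
console.log(hasTriggerClause(revised)); // true
```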

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
referenced_paths_exist	Referenced path issues: 1 missing	Warning
	Total	15 / 16 Passed
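
A maintainer could reproduce a warning like `referenced_paths_exist` locally before resubmitting. The following is a minimal sketch under stated assumptions: the validator's real matching rules are not shown in this review, so the regular expression, the function names `findReferencedPaths` and `missingPaths`, and the restriction to `references/` and `assets/` paths are all hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical checker; the validator's actual rules are not shown in this review.
// Collects unique relative paths under references/ or assets/ mentioned in `text`.
function findReferencedPaths(text) {
  const pattern = /\b(?:references|assets)\/[\w.\-\/]+\.\w+/g;
  return [...new Set(text.match(pattern) || [])];
}

// Returns the referenced paths that do not exist under `rootDir`.
function missingPaths(text, rootDir) {
  return findReferencedPaths(text).filter(
    (rel) => !fs.existsSync(path.join(rootDir, rel))
  );
}
```

Calling `missingPaths(skillText, skillDir)` with the skill file's contents and its directory would list any mentioned file that is absent on disk. An empty array would suggest the warning has been addressed, at least under this sketch's narrower matching rules.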

Repository: antithesishq/antithesis-skills
Commit: 9b75328

Reviewed: about 9 hours ago
