ark-chainsaw-testing

Run and write Ark Chainsaw tests with mock-llm. Use for running tests, debugging failures, or creating new e2e tests.

82 (1.37x)

Quality: 73%
Does it follow best practices?

Impact: 95% (1.37x)
Average score across 3 eval scenarios

Security by Snyk

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/chainsaw/SKILL.md

Quality

Discovery

75%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is functional and well-structured, with a clear 'Use for...' clause that covers when to use the skill. Its main strength is distinctiveness, which comes from its project-specific terminology. However, the actions described are somewhat high-level and would benefit from more concrete detail about what the skill actually does (e.g., specific test patterns, configuration, assertion types).

Suggestions

Add more specific concrete actions beyond 'run and write' — e.g., 'configure mock-llm responses, set up test fixtures, analyze test output logs'

Include additional natural trigger terms users might say, such as 'integration tests', 'end-to-end', 'test suite', or 'CI failures'

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (Ark Chainsaw tests with mock-llm) and some actions (run, write, debug, create), but doesn't elaborate on what these tests involve or what specific capabilities are provided beyond high-level verbs. | 2 / 3 |
| Completeness | Clearly answers both what ('Run and write Ark Chainsaw tests with mock-llm') and when ('Use for running tests, debugging failures, or creating new e2e tests') with explicit trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'Ark Chainsaw tests', 'mock-llm', 'e2e tests', 'debugging failures', but these are fairly project-specific terms. Missing common variations a user might say like 'integration tests', 'test failures', or 'end-to-end'. | 2 / 3 |
| Distinctiveness Conflict Risk | Very specific to 'Ark Chainsaw' and 'mock-llm', which are distinct enough project-specific terms that this is unlikely to conflict with other testing or development skills. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Implementation

72%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with excellent concrete examples, particularly in the antipatterns section which provides both bad and good YAML patterns. The main weaknesses are moderate verbosity in the antipattern explanations and a lack of an explicit step-by-step workflow for writing new tests (the skill delegates this to external files without a clear sequence). The progressive disclosure is well-handled with appropriate references.

Suggestions

Add a brief numbered workflow for writing a new test (e.g., 1. Create directory, 2. Write chainsaw-test.yaml, 3. Add manifests in order, 4. Run with --fail-fast to verify) rather than fully delegating to external files.

Tighten the antipattern explanations — the 'Bad' code comments and brief rationale are sufficient; the paragraph-length explanations before each code block could be reduced to one sentence each.
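
The four-step workflow proposed in the first suggestion above could look roughly like the following. This is a minimal sketch, not the skill's actual procedure: the directory `tests/example-test`, the ConfigMap manifest, and the `RUN_CHAINSAW` switch are hypothetical names chosen for illustration, and the Test resource uses the generic Kyverno Chainsaw `v1alpha1` format rather than anything Ark-specific.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location; adjust to the repository's real test layout.
TEST_DIR="${TEST_DIR:-tests/example-test}"

# 1. Create the test directory (with a subdirectory for manifests).
mkdir -p "$TEST_DIR/manifests"

# 2. Write chainsaw-test.yaml with a single apply-then-assert step.
cat > "$TEST_DIR/chainsaw-test.yaml" <<'EOF'
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: example-test
spec:
  steps:
  - name: apply-and-assert
    try:
    - apply:
        file: manifests/01-configmap.yaml
    - assert:
        file: manifests/01-configmap.yaml
EOF

# 3. Add manifests, numbered in the order they should be applied.
#    A ConfigMap stands in here for whatever resources a real test needs.
cat > "$TEST_DIR/manifests/01-configmap.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: example
data:
  greeting: hello
EOF

# 4. Run with --fail-fast to verify. Execution is opt-in (RUN_CHAINSAW=1)
#    because it needs the chainsaw binary and a reachable cluster.
if [ "${RUN_CHAINSAW:-0}" = "1" ] && command -v chainsaw >/dev/null 2>&1; then
  chainsaw test --test-dir "$TEST_DIR" --fail-fast
else
  echo "Scaffolded $TEST_DIR; verify with: chainsaw test --test-dir $TEST_DIR --fail-fast"
fi
```

Running the final step immediately after scaffolding would also serve as the validation checkpoint that the Workflow Clarity row notes is missing.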

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The antipatterns section is thorough but somewhat verbose: the explanations of why each antipattern is bad (e.g., 'This command crashes with a type error when .status.conditions is nil') add useful context but could be tightened. The 'Bad/Good' code pairs are valuable but the surrounding prose is slightly padded. | 2 / 3 |
| Actionability | Provides fully executable commands for running tests, concrete YAML examples for both correct and incorrect patterns, specific file structure with naming conventions, and copy-paste ready code blocks throughout. | 3 / 3 |
| Workflow Clarity | The test structure and running commands are clear, but the writing workflow delegates to external files ('Reference tests/CLAUDE.md for comprehensive patterns' and 'see examples.md') without providing a clear step-by-step sequence for creating a new test. There's no explicit validation checkpoint for verifying a newly written test works correctly. | 2 / 3 |
| Progressive Disclosure | Good structure with clear sections (Running, Writing, Structure, Antipatterns, Environment Variables). References to tests/CLAUDE.md and examples.md are one level deep and clearly signaled. The main file provides enough to get started while pointing to detailed resources. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: mckinsey/agents-at-scale-ark
Reviewed
