
# fix-local-tests

Fix failing tests by prioritising shell implementation fixes to match bash behaviour.

Score: 63

- Quality: 55% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl:

```shell
npx tessl skill review --optimize ./.claude/skills/fix-local-tests/SKILL.md
```

## Quality

### Discovery: 32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description conveys a narrow domain (fixing shell implementation to match bash behaviour) but lacks explicit trigger guidance ('Use when...'), concrete enumeration of specific actions, and natural keyword variations. It reads more like a brief commit message than a skill description designed for selection among many skills.

Suggestions:

- Add an explicit 'Use when...' clause, e.g., 'Use when tests fail due to shell script incompatibilities, bash vs sh differences, or POSIX compliance issues.'
- List specific concrete actions such as 'Diagnoses test failures caused by shell syntax differences, fixes quoting issues, corrects bash-specific constructs for POSIX compatibility, and updates shell scripts to match expected bash behaviour.'
- Include natural trigger-term variations users might use: 'test failures', 'shell script bugs', 'bash compatibility', 'sh', 'POSIX', 'shell syntax errors'.
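Putting these suggestions together, the improved frontmatter could look something like the sketch below. This assumes the standard SKILL.md `name`/`description` YAML frontmatter; the exact wording is illustrative, not taken from the actual file:

```yaml
---
name: fix-local-tests
description: >
  Fix failing local tests by prioritising shell implementation fixes to
  match bash behaviour. Use when tests fail due to shell script
  incompatibilities, bash vs sh differences, POSIX compliance issues,
  quoting bugs, or other shell syntax errors.
---
```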

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (failing tests, shell implementation) and a specific action (fix to match bash behaviour), but doesn't list multiple concrete actions or elaborate on what kinds of fixes are performed. | 2 / 3 |
| Completeness | Describes what it does (fix failing tests by prioritising shell implementation fixes) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric caps completeness at 2, and the 'what' itself is also somewhat vague, placing this at 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'failing tests', 'shell', 'bash', but misses common variations users might say such as 'test failures', 'broken tests', 'shell compatibility', 'POSIX', or 'sh'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of 'shell implementation' and 'bash behaviour' provides some specificity, but 'fix failing tests' is quite broad and could overlap with general test-fixing or debugging skills. | 2 / 3 |

Total: 7 / 12 (Passed)

### Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, actionable skill with a clear multi-step workflow, explicit validation checkpoints, and concrete executable commands throughout. The classification table and multiple verification methods provide excellent guidance for triaging failures. Minor weaknesses include some verbosity in the security preamble, a duplicated step number (two step 7s), and the content being entirely self-contained without progressive disclosure to supporting files.

Suggestions:

- Fix the duplicate step 7 numbering: the bash comparison tests should be step 8.
- Consider trimming the security preamble to 2-3 lines; the current version is thorough but verbose for a skill targeting Claude.
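The bash-as-ground-truth comparison the workflow is built around can be sketched as follows. This is an illustration only; the `./rshell -c` invocation mentioned in the comment is a hypothetical stand-in for the project's actual shell binary:

```shell
#!/bin/sh
# Run the same snippet under real bash (the ground truth) and under the
# shell implementation being fixed, then compare the outputs.
snippet='for i in 1 2 3; do printf "%s " "$i"; done'

expected=$(bash -c "$snippet")
actual=$(bash -c "$snippet")   # in the real workflow: ./rshell -c "$snippet"

if [ "$expected" = "$actual" ]; then
    echo "behaviour matches bash"
else
    printf 'mismatch: bash=[%s] impl=[%s]\n' "$expected" "$actual"
fi
```

As written, both sides run bash, so the script reports a match; substituting the implementation under test on the `actual` line turns it into a real comparison.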

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient and well-structured, but includes some redundancy (e.g., the security preamble is quite long, step 7 appears twice with different content, and some explanations like 'this is the ground truth' are unnecessary for Claude). The classification table is a nice touch but could be slightly tighter. | 2 / 3 |
| Actionability | Provides fully executable bash and go test commands throughout, with specific flags (-race, -v, -timeout), environment variables (RSHELL_BASH_TEST=1), concrete file paths (interp/builtins/, testdata/fuzz/), and exact command patterns. The fuzz corpus file format is copy-paste ready. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced with numbered steps and explicit validation checkpoints (step 4 includes re-running tests after each fix; step 7 runs the full suite, then the bash comparison tests). There are feedback loops (repeat from step 1 if new failures appear) and a clear classification system for triaging failures. The only minor issue is the duplicate step 7 numbering, but the content is clear. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear sections and a logical flow, but it's a moderately long monolithic file (~100 lines of content) with no references to external files for detailed topics like fuzz testing or the YAML scenario format. The classification table and the multiple methods for determining bash behaviour could potentially be split out, though the lack of bundle files means there's nothing to reference. | 2 / 3 |

Total: 10 / 12 (Passed)
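On the "copy-paste ready" fuzz corpus format noted under Actionability: seed files under `testdata/fuzz/<FuzzName>/` in Go projects use Go's standard seed-corpus encoding, a version header followed by one typed argument per line for the fuzz target. A minimal example (the input string here is made up for illustration):

```
go test fuzz v1
string("echo $((1 + 2))")
```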

### Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 passed.

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |

Total: 10 / 11 (Passed)

Repository: DataDog/rshell (Reviewed)

