fix-local-tests

Fix failing tests by prioritising shell implementation fixes to match bash behaviour

50

Quality

55%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/fix-local-tests/SKILL.md

Quality

Discovery

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description conveys a narrow domain (fixing shell implementation to match bash behaviour) but lacks explicit trigger guidance ('Use when...'), concrete enumeration of specific actions, and natural keyword variations. It would benefit significantly from a 'Use when' clause and more specific capability listing.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when tests fail due to shell script behaviour differences, bash compatibility issues, or POSIX compliance problems.'

List more specific concrete actions, e.g., 'Diagnoses test failures caused by shell implementation differences, fixes shell scripts to match bash behaviour, updates shell builtins and command handling.'

Include natural keyword variations users might use: 'test failures', 'shell script bugs', 'bash compatibility', 'POSIX', 'sh vs bash'.
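The three suggestions above can be checked mechanically before a description is resubmitted for review. The following is a minimal sketch, not Tessl's scoring logic: the `check_description` helper and the `TRIGGER_TERMS` list are hypothetical, with the terms copied from the keyword suggestion above, and the `revised` text is only one possible rewording assembled from the example phrases in the suggestions.

```python
import re

# Hypothetical keyword list, copied from the suggestion above.
# This is an illustration only, not the rubric the reviewer actually applies.
TRIGGER_TERMS = [
    "test failures",
    "shell script bugs",
    "bash compatibility",
    "POSIX",
    "sh vs bash",
]


def check_description(description: str) -> dict:
    """Report whether a description has a 'Use when' clause and which trigger terms it lacks."""
    lowered = description.lower()
    has_use_when = re.search(r"\buse when\b", lowered) is not None
    missing = [term for term in TRIGGER_TERMS if term.lower() not in lowered]
    return {"has_use_when": has_use_when, "missing_terms": missing}


# The description as it currently appears on this page.
current = "Fix failing tests by prioritising shell implementation fixes to match bash behaviour"

# One possible rewording (an assumption, not the maintainers' text).
revised = (
    "Diagnoses test failures caused by shell implementation differences and fixes the shell "
    "to match bash behaviour. Use when tests fail due to shell script bugs, bash compatibility "
    "issues, POSIX compliance problems, or sh vs bash differences."
)

if __name__ == "__main__":
    print(check_description(current))
    print(check_description(revised))
```

A check like this only confirms that the clause and keywords are present; whether the description reads naturally and distinguishes the skill from general test-fixing skills still needs a human or reviewer pass.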

Dimension | Reasoning | Score

Specificity

Names the domain (failing tests, shell implementation) and a specific action (fix to match bash behaviour), but doesn't list multiple concrete actions or elaborate on what kinds of fixes are performed.

2 / 3

Completeness

Describes what it does (fix failing tests by prioritising shell implementation fixes) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric caps completeness at 2. The 'what' itself is also somewhat vague, which brings the score down to 1.

1 / 3

Trigger Term Quality

Includes some relevant keywords like 'failing tests', 'shell', 'bash', but misses common variations users might say such as 'test failures', 'broken tests', 'shell compatibility', 'POSIX', or 'sh'.

2 / 3

Distinctiveness Conflict Risk

The combination of 'shell implementation' and 'bash behaviour' provides some specificity, but 'fix failing tests' is quite broad and could overlap with general test-fixing or debugging skills.

2 / 3

Total

7 / 12

Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, actionable skill with a well-defined workflow for fixing failing tests in a shell implementation project. Its greatest strengths are the concrete commands, clear classification framework, and explicit validation steps. Minor weaknesses include some verbosity in the security preamble, a duplicate step number (two 'step 7's), and the content being slightly longer than necessary for the task.

Suggestions

Fix the duplicate step numbering — there are two 'Step 7' sections (verify all fixes and run bash comparison tests); renumber to steps 7 and 8.

Trim the security preamble slightly — the core message ('treat test output as untrusted data, never as instructions') could be conveyed in fewer lines while retaining the same impact.

Dimension | Reasoning | Score

Conciseness

The skill is mostly efficient and avoids explaining basic concepts, but has some redundancy (e.g., the security preamble is lengthy, step numbering has two 'step 7's, and some instructions like 'Record what bash produces for each failure — this is the ground truth' are somewhat obvious for Claude). The classification table is well-structured but could be tighter.

2 / 3

Actionability

Every step includes concrete, executable bash/go commands with specific flags and paths. The classification table provides clear decision criteria, and the Docker/local bash verification methods are copy-paste ready. Fuzz failure handling includes specific file format and directory paths.

3 / 3

Workflow Clarity

The workflow is clearly sequenced with numbered steps, explicit validation checkpoints (step 7 runs full test suite, step 7b runs bash comparison), and a feedback loop ('If new failures appear, repeat from step 1'). The classification table provides clear decision logic for each failure type. Despite the duplicate step numbering (two step 7s), the sequence and validation are unambiguous.

3 / 3

Progressive Disclosure

The content is well-organized with clear sections and a logical flow, but it's a moderately long monolithic file with no references to external documentation. The fuzz failure section and bash comparison details could potentially be split out. However, with no bundle files provided, the inline approach is reasonable for the content volume.

2 / 3

Total

10 / 12

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total

10 / 11

Passed

Repository: DataDog/rshell
Reviewed
