fix-local-tests

Fix failing tests by prioritising shell implementation fixes to match bash behaviour

50

Quality

55%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/fix-local-tests/SKILL.md

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, actionable skill with a well-defined workflow for fixing failing tests in a shell implementation project. Its greatest strengths are the concrete, executable commands at every step and the clear decision framework (classification table) for triaging failures. Minor weaknesses include some verbosity in the security preamble, a duplicated step number (two step 7s), and the content being entirely inline without progressive disclosure to supporting files.

Suggestions

Fix the duplicate step 7 numbering — renumber the bash comparison tests step to step 8.

Consider trimming the security preamble to 2-3 sentences; the current version is thorough but could be more concise.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is mostly efficient and well-structured, but includes some redundancy (e.g., the security preamble is quite long, step 7 appears twice with different content, and some explanations could be tighter). The classification table and method listings are well-organized but slightly verbose. | 2 / 3 |
| Actionability | Provides fully executable commands throughout — specific `go test` invocations with flags, Docker commands for bash comparison, corpus file format examples, and exact file paths. Every step has concrete, copy-paste-ready commands. | 3 / 3 |
| Workflow Clarity | Clear 7-step sequential workflow with explicit validation checkpoints (step 4 verifies individual fixes, step 7 runs full suite and bash comparison). Includes a feedback loop ('if new failures appear, repeat from step 1') and a clear classification system for triaging failures. The duplicate step 7 numbering is a minor formatting issue but both steps are clear. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear sections and headers, but it's a fairly long single file with no references to external documentation. The fuzz failure section could potentially be split out. However, for a skill of this complexity, the inline approach is reasonable. References to file paths like `interp/builtins/` and `resources/gnu-coreutils-tests/` help with navigation but aren't linked. | 2 / 3 |
| Total | | 10 / 12 |

Passed

Description

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description conveys a narrow domain—fixing shell implementation tests to match bash behaviour—but lacks explicit trigger guidance ('Use when...') and doesn't enumerate specific concrete actions beyond the general fix. It would benefit from more natural trigger terms and a clear 'when to use' clause to help Claude select it appropriately.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when tests fail due to shell implementation not matching expected bash behaviour, or when debugging shell/bash compatibility issues.'

Include more natural trigger terms users might say, such as 'test failures', 'bash compatibility', 'shell script', 'POSIX', or specific test runner names.

List more specific concrete actions, e.g., 'Diagnoses test failures, identifies shell behaviour discrepancies, applies fixes to align shell implementation with bash semantics.'
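
The three suggestions above could be combined into a single description, sketched here as SKILL.md frontmatter. This is a hypothetical rewrite, not the maintainers' text: the `name` and `description` keys are assumed from the usual skill frontmatter layout, and the wording is stitched together from the example phrases quoted in the suggestions.

```yaml
---
# Hypothetical rewrite; the key names are assumed and the wording is
# assembled from the example phrases in the suggestions.
name: fix-local-tests
description: >-
  Diagnoses test failures, identifies shell behaviour discrepancies, and
  applies fixes to align the shell implementation with bash semantics.
  Use when tests fail due to shell implementation not matching expected
  bash behaviour, or when debugging shell/bash compatibility issues.
---
```

A description in this shape leads with the concrete actions and ends with the explicit 'Use when...' clause, which is what the Completeness and Trigger Term Quality dimensions ask for.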

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (failing tests, shell implementation) and a specific action (fix to match bash behaviour), but doesn't list multiple concrete actions or elaborate on what kinds of fixes are performed. | 2 / 3 |
| Completeness | Describes what it does (fix failing tests by prioritising shell implementation fixes) but has no explicit 'Use when...' clause or equivalent trigger guidance, which per the rubric caps completeness at 2, and the 'what' itself is also somewhat thin, bringing it to 1. | 1 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'failing tests', 'shell', 'bash behaviour', but misses common variations users might say such as 'test failures', 'shell script bugs', 'bash compatibility', 'POSIX compliance', or specific test framework terms. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'shell implementation' and 'bash behaviour' provides some niche specificity, but 'fix failing tests' is quite broad and could overlap with general test-fixing or debugging skills. | 2 / 3 |
| Total | | 7 / 12 |

Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 |

Passed

Repository: DataDog/rshell (Reviewed)
