fix-ci-tests

Diagnose and fix CI failures on a GitHub PR by analyzing failing checks, reading logs, and applying fixes

61

Quality

72%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/fix-ci-tests/SKILL.md

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, highly actionable skill with excellent workflow clarity and concrete executable commands throughout. Its main weakness is length—at ~300 lines with inline GraphQL queries and detailed sub-workflows, it could benefit from splitting advanced sections into referenced files. The security callout is appropriate but could be more concise.

Suggestions

Extract the GraphQL queries for resolving review comments into a separate reference file (e.g., GRAPHQL_HELPERS.md) to reduce the main skill's token footprint.

Consider moving the detailed fuzz failure fix workflow and platform-specific guidance into separate referenced files, keeping only a brief summary and link in the main skill.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is fairly long (~300 lines) and includes some content that could be tightened: the detailed CI job table is repo-specific and useful, but the extensive GraphQL examples for resolving comments and the repeated security warnings add bulk. However, most content is actionable and not explaining things Claude already knows, so it's mostly efficient but not lean. | 2 / 3 |
| Actionability | The skill provides fully executable bash and Go commands throughout, with specific flags, environment variables, and exact command patterns. Every step has copy-paste ready commands, and the failure categories map directly to concrete fix strategies with real code examples. | 3 / 3 |
| Workflow Clarity | The 10-step workflow is clearly sequenced with explicit validation checkpoints (step 4: reproduce locally, step 7: verify all fixes, feedback loop 'if new failures appear, repeat from step 4'). The failure classification table provides clear decision paths, and destructive operations like git push only happen after verification. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and tables, but it's monolithic: all content is inline in a single file with no references to supporting documents. The GraphQL queries, fuzz fix workflow, and platform-specific guidance could be split into separate reference files. However, it references the 'fix-tests' skill for bash comparison failures, showing some awareness of cross-referencing. | 2 / 3 |
| Total | | 10 / 12 (Passed) |

Description

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong in specificity and distinctiveness, clearly identifying the skill's niche of diagnosing CI failures on GitHub PRs. However, it lacks an explicit 'Use when...' clause and could benefit from broader trigger term coverage including common user phrasings like 'build failed' or 'pipeline broken'.

Suggestions

Add a 'Use when...' clause, e.g., 'Use when a GitHub PR has failing checks, CI pipeline errors, or the user mentions build failures.'

Include additional natural trigger terms users might say: 'build broken', 'pipeline failed', 'tests failing', 'GitHub Actions', 'CI/CD', 'red checks', 'workflow failed'.
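
As an illustration of the two suggestions above, a revised description might look roughly like the sketch below. It assumes the skill's SKILL.md uses `name` and `description` frontmatter keys. The wording combines the current description with the example clause and trigger terms listed here; it is not the maintainers' actual text.

```yaml
---
# Hypothetical frontmatter sketch: key names and wording are assumptions,
# not copied from the skill's real SKILL.md.
name: fix-ci-tests
description: >-
  Diagnose and fix CI failures on a GitHub PR by analyzing failing checks,
  reading logs, and applying fixes. Use when a GitHub PR has failing checks,
  CI pipeline errors, or the user mentions build failures such as a broken
  build, failed pipeline, failing tests, red checks, or a failed GitHub
  Actions workflow.
---
```

The folded scalar (`>-`) keeps the description on one logical line while staying readable in the file, and the 'Use when' sentence carries the extra trigger terms without changing what the skill claims to do.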

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'diagnose and fix CI failures', 'analyzing failing checks', 'reading logs', and 'applying fixes'. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers 'what does this do' (diagnose and fix CI failures by analyzing checks, reading logs, applying fixes), but lacks an explicit 'Use when...' clause specifying when Claude should select this skill. The rubric caps completeness at 2 without explicit trigger guidance. | 2 / 3 |
| Trigger Term Quality | Includes good terms like 'CI failures', 'GitHub PR', 'failing checks', and 'logs', but misses common user variations like 'pipeline failed', 'build broken', 'tests failing', 'CI/CD', 'GitHub Actions', or 'red checks'. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'CI failures', 'GitHub PR', 'failing checks', and 'logs' creates a clear niche that is unlikely to conflict with other skills. This is a well-defined, specific domain. | 3 / 3 |
| Total | | 10 / 12 (Passed) |

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |

Repository: DataDog/rshell (Reviewed)
