
run-e2e

Run E2E tests locally using the new-e2e framework with Pulumi-based infrastructure

49

Quality

55%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/run-e2e/SKILL.md

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with clear workflow steps, concrete examples, and good troubleshooting guidance. Its main weakness is minor redundancy in path resolution explanations and some content that could be tightened. The skill effectively teaches Claude how to resolve test targets, build commands, and handle edge cases like stuck Pulumi stacks.

Suggestions

Remove the redundant emphasis on the target path format: consolidate the 'do NOT include test/new-e2e/' instruction into a single, clear statement rather than repeating it in steps 2 and 3.

Merge the 'Examples' and 'Usage' sections since they overlap significantly, or differentiate them more clearly (e.g., Usage for slash-command syntax, Examples for raw command syntax).

Dimension | Reasoning | Score

Conciseness

The skill is mostly efficient but has some redundancy: the target path resolution rules are repeated (steps 2 and 3 both emphasize not including 'test/new-e2e/'), and the Examples section partially duplicates the Usage section. The prerequisites and troubleshooting sections are appropriately brief.

2 / 3

Actionability

The skill provides fully concrete, executable commands with specific flags, clear resolution logic for test targets, and multiple copy-paste ready examples. The step-by-step instructions for parsing arguments and building commands are specific and actionable.

3 / 3

Workflow Clarity

The 7-step workflow is clearly sequenced with important checkpoints: confirming the command before running (step 5), using background execution with timeout (step 6), summarizing results (step 7), and handling ambiguous inputs (step 2). The troubleshooting section for stuck stacks includes an explicit early-stop instruction, serving as a validation/feedback loop.

3 / 3

Progressive Disclosure

The content is well-structured with clear sections, but it's somewhat long for a single file with no bundle files to offload detail into. The flags reference, examples, and troubleshooting could be split into separate files. However, for a skill of this complexity, keeping it in one file is reasonable.

2 / 3

Total: 10 / 12

Passed

Description

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is a single clause that identifies the domain (E2E testing) and specific tooling (new-e2e framework, Pulumi) but lacks a 'Use when...' clause, lists only one action ('Run'), and misses common user-facing trigger terms. It reads more like a title than a description that would help Claude reliably select this skill from a large pool.

Suggestions

Add an explicit 'Use when...' clause with trigger scenarios, e.g., 'Use when the user wants to run end-to-end tests, set up Pulumi test infrastructure, or debug new-e2e test failures.'

Expand the capability list beyond just 'Run' to include specific actions like configuring test environments, debugging test failures, managing Pulumi stacks, or interpreting test results.

Include natural language variations of key terms such as 'end-to-end tests', 'e2e testing', 'integration tests', 'test infrastructure', and 'local test execution' to improve trigger term coverage.

Dimension | Reasoning | Score

Specificity

Names the domain (E2E tests) and mentions specific tools (new-e2e framework, Pulumi-based infrastructure), but only describes one action ('Run') without listing multiple concrete capabilities like debugging, configuring, or analyzing test results.

2 / 3

Completeness

Describes what it does (run E2E tests with Pulumi infrastructure) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric guidelines, a missing 'Use when...' clause caps completeness at 2; because the 'what' is also thin, the score drops to 1.

1 / 3

Trigger Term Quality

Includes relevant terms like 'E2E tests', 'Pulumi', and 'new-e2e framework', but misses common variations users might say such as 'end-to-end tests', 'integration tests', 'test locally', or 'infrastructure testing'. The term 'new-e2e' is quite specific/internal jargon.

2 / 3

Distinctiveness Conflict Risk

The mention of 'new-e2e framework' and 'Pulumi-based infrastructure' provides some distinctiveness, but 'E2E tests' is broad enough to potentially overlap with other testing-related skills. The specific framework name helps but isn't sufficient for clear disambiguation.

2 / 3

Total: 7 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

Criteria | Description | Result

allowed_tools_field

'allowed-tools' contains unusual tool name(s)

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11

Passed

Repository: DataDog/datadog-agent
Reviewed
