
run-e2e

Run E2E tests locally using the new-e2e framework with Pulumi-based infrastructure


Quality: 55% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/run-e2e/SKILL.md

Quality

Discovery: 32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a specific technical domain (E2E testing with a particular framework and infrastructure tool) but is too terse and lacks explicit trigger guidance. It reads more like a task label than a skill description, missing both the breadth of capabilities and the 'when to use' clause needed for effective skill selection.

Suggestions

- Add a 'Use when...' clause specifying trigger scenarios, e.g., 'Use when the user wants to run end-to-end tests, set up Pulumi test infrastructure, or debug new-e2e test failures.'
- Expand the capability list beyond just 'Run' to include related actions like configuring test environments, debugging test failures, writing new test cases, or managing Pulumi stacks.
- Include natural language variations of key terms, such as 'end-to-end tests', 'e2e testing', 'local test execution', and 'infrastructure provisioning', to improve trigger term coverage.
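One way these suggestions could come together is in the skill's frontmatter description. The sketch below is illustrative only; it assumes the standard SKILL.md frontmatter fields and is not an official revision proposed by the skill's maintainers:

```yaml
# Hypothetical revised frontmatter for .claude/skills/run-e2e/SKILL.md
name: run-e2e
description: >
  Run, configure, and debug end-to-end (E2E) tests locally using the
  new-e2e framework with Pulumi-provisioned infrastructure. Use when
  the user wants to run e2e or end-to-end tests locally, set up or
  tear down Pulumi test infrastructure, or troubleshoot new-e2e test
  failures and stuck Pulumi stacks.
```

The folded description keeps the original technical anchors (new-e2e, Pulumi) while adding the 'Use when...' clause and the trigger-term variations the review asks for.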

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Names the domain (E2E tests) and mentions specific technologies (new-e2e framework, Pulumi-based infrastructure), but only describes one action ('Run') without listing multiple concrete capabilities like debugging, configuring, or analyzing test results. | 2 / 3 |
| Completeness | Describes what it does (run E2E tests locally using the new-e2e framework with Pulumi) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, a missing 'Use when' caps completeness at 2, and since the 'what' is also only partially described, this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant keywords like 'E2E tests', 'new-e2e framework', and 'Pulumi', which are useful trigger terms. However, it misses common variations users might say, such as 'end-to-end tests', 'integration tests', 'test locally', or 'infrastructure provisioning'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The mention of 'new-e2e framework' and 'Pulumi-based infrastructure' provides some distinctiveness, but without clearer scoping it could overlap with other testing or infrastructure skills. The specific framework name helps, but the description is still somewhat ambiguous about its exact niche. | 2 / 3 |

Total: 7 / 12 (Passed)

Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with a clear multi-step workflow, concrete examples, and good troubleshooting guidance for Pulumi stack issues. Its main weaknesses are minor redundancy in path resolution instructions and the fact that all content lives in a single file without progressive disclosure to supporting references. The flag documentation and examples are comprehensive and immediately usable.

Suggestions

- Remove the redundant emphasis on relative paths: state the rule once clearly in step 2 and reference it briefly in step 3 rather than restating it.
- Consider extracting the full flags reference into a separate REFERENCE.md to keep the main skill leaner and improve progressive disclosure.
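A minimal sketch of the layout the second suggestion implies. The file split and the comments are illustrative; they are inferred from the review, not taken from the skill's actual contents:

```
.claude/skills/run-e2e/
├── SKILL.md       # frontmatter, 7-step workflow, prerequisites, common examples
└── REFERENCE.md   # full flag reference and less common invocation scenarios
```

SKILL.md would then point to the reference where needed (e.g. 'See REFERENCE.md for the full list of flags'), so the agent loads the long flag documentation only when it is actually relevant.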

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient but has some redundancy: the target path resolution rules are repeated (steps 2 and 3 both emphasize the relative path convention), and the 'Available Test Suites' section adds little value beyond 'run ls'. The prerequisites and troubleshooting sections are appropriately brief. | 2 / 3 |
| Actionability | The skill provides fully concrete, executable commands with specific flags, clear examples covering multiple scenarios, and explicit instructions for resolving test targets. The examples are copy-paste ready and the flag reference is comprehensive. | 3 / 3 |
| Workflow Clarity | The 7-step workflow is clearly sequenced with validation checkpoints: confirming the command with the user before running (step 5), using background execution with a timeout (step 6), summarizing results after completion (step 7), and a troubleshooting section for stuck Pulumi stacks with early termination guidance. The ambiguity resolution step (ask the user to pick) is a good feedback loop. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections (Instructions, Prerequisites, Examples, Usage, Output, Troubleshooting), but everything is in a single file that's moderately long. The flag reference and examples could potentially be split out, though for a skill of this size it's borderline acceptable. No bundle files are referenced or provided. | 2 / 3 |

Total: 10 / 12 (Passed)

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |

Total: 9 / 11 (Passed)
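Both warnings concern frontmatter hygiene. A hedged sketch of the kind of fix they point toward; the tool list and the `owner` key are invented for illustration, and only `allowed-tools` and `metadata` come from the warnings themselves:

```yaml
name: run-e2e
description: Run E2E tests locally using the new-e2e framework with Pulumi-based infrastructure
allowed-tools: Bash, Read, Glob   # restrict to standard tool names
metadata:
  owner: agent-e2e   # example: a formerly unknown top-level key moved under metadata
```

Keeping nonstandard keys under `metadata` rather than at the top level is what clears the `frontmatter_unknown_keys` warning; the `allowed_tools_field` warning clears once every listed tool name matches one the agent actually recognizes.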

Repository: DataDog/datadog-agent (Reviewed)

