run-e2e

Run E2E tests locally using the new-e2e framework with Pulumi-based infrastructure

Quality

55%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/run-e2e/SKILL.md

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with clear workflow steps, concrete examples, and good troubleshooting guidance. Its main weakness is minor redundancy in path resolution explanations and some content that could be tightened. The skill effectively teaches Claude how to resolve test targets, build commands, and handle edge cases like stuck Pulumi stacks.

Suggestions

Remove the redundant emphasis on target path format—consolidate the 'do NOT include test/new-e2e/' instruction into a single clear statement rather than repeating it in steps 2 and 3.

Merge the 'Examples' and 'Usage' sections since they overlap significantly, or differentiate them more clearly (e.g., Usage for slash-command syntax, Examples for raw command syntax).

Dimension	Reasoning	Score
Conciseness	The skill is mostly efficient but has some redundancy—the target path resolution rules are repeated (step 2 and step 3 both emphasize not including 'test/new-e2e/'), and the examples section partially duplicates the usage section. The prerequisites and troubleshooting sections are appropriately brief.	2 / 3
Actionability	The skill provides fully concrete, executable commands with specific flags, clear resolution logic for test targets, and multiple copy-paste ready examples. The step-by-step instructions for parsing arguments and building commands are specific and actionable.	3 / 3
Workflow Clarity	The 7-step workflow is clearly sequenced with important checkpoints: confirming the command before running (step 5), using background execution with timeout (step 6), summarizing results (step 7), and handling ambiguous inputs (step 2). The troubleshooting section for stuck stacks includes an explicit early-stop instruction, serving as a validation/feedback loop.	3 / 3
Progressive Disclosure	The content is well-structured with clear sections, but it's somewhat long for a single file with no bundle files to offload detail into. The flags reference, examples, and troubleshooting could be split into separate files. However, for a skill of this complexity, keeping it in one file is reasonable.	2 / 3
	Total	10 / 12 Passed

Description

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is a single clause that identifies the domain (E2E testing) and specific tooling (new-e2e framework, Pulumi) but lacks a 'Use when...' clause, lists only one action ('Run'), and misses common user-facing trigger terms. It reads more like a title than a description that would help Claude reliably select this skill from a large pool.

Suggestions

Add an explicit 'Use when...' clause with trigger scenarios, e.g., 'Use when the user wants to run end-to-end tests, set up Pulumi test infrastructure, or debug new-e2e test failures.'

Expand the capability list beyond just 'Run' to include specific actions like configuring test environments, debugging test failures, managing Pulumi stacks, or interpreting test results.

Include natural language variations of key terms such as 'end-to-end tests', 'e2e testing', 'integration tests', 'test infrastructure', and 'local test execution' to improve trigger term coverage.
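As a rough illustration of the suggestions above, a revised description could look like the sketch below. It assumes the skill uses the usual `name` and `description` frontmatter keys in SKILL.md. The first sentence is the current description and the 'Use when...' sentence is the wording proposed in the first suggestion. The maintainers should adjust the listed scenarios to what the skill actually supports.

```yaml
---
# Sketch only: assumes the standard `name` and `description` frontmatter keys.
name: run-e2e
# First sentence: the existing description, unchanged.
# Second sentence: the 'Use when...' clause suggested in this review.
description: >-
  Run E2E tests locally using the new-e2e framework with Pulumi-based
  infrastructure. Use when the user wants to run end-to-end tests, set up
  Pulumi test infrastructure, or debug new-e2e test failures.
---
```

Spelling out 'end-to-end tests' next to 'E2E tests' also covers one of the natural-language variations that the Trigger Term Quality dimension asks for.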

Dimension	Reasoning	Score
Specificity	Names the domain (E2E tests) and mentions specific tools (new-e2e framework, Pulumi-based infrastructure), but only describes one action ('Run') without listing multiple concrete capabilities like debugging, configuring, or analyzing test results.	2 / 3
Completeness	Describes what it does (run E2E tests with Pulumi infrastructure) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, missing 'Use when' caps completeness at 2, and the 'what' is also thin, warranting a 1.	1 / 3
Trigger Term Quality	Includes relevant terms like 'E2E tests', 'Pulumi', and 'new-e2e framework', but misses common variations users might say such as 'end-to-end tests', 'integration tests', 'test locally', or 'infrastructure testing'. The term 'new-e2e' is quite specific/internal jargon.	2 / 3
Distinctiveness Conflict Risk	The mention of 'new-e2e framework' and 'Pulumi-based infrastructure' provides some distinctiveness, but 'E2E tests' is broad enough to potentially overlap with other testing-related skills. The specific framework name helps but isn't sufficient for clear disambiguation.	2 / 3
	Total	7 / 12 Passed

Validation

81%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 9 / 11 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	9 / 11 Passed
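The `frontmatter_unknown_keys` warning recommends removing unrecognized keys or moving them to metadata. The review does not say which keys triggered it, so the fragment below uses a purely hypothetical key (`team`) to show what such a move might look like, assuming the skill spec accepts a `metadata` mapping as the warning text implies.

```yaml
---
name: run-e2e
description: Run E2E tests locally using the new-e2e framework with Pulumi-based infrastructure
# Hypothetical example: `team` stands in for whatever unknown top-level key
# the validator flagged. It is nested under `metadata` instead of sitting at
# the top level of the frontmatter.
metadata:
  team: example-team
---
```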

Repository: DataDog/datadog-agent
Commit: 409dcc9

Reviewed: about 13 hours ago
