Expert performance engineer specializing in modern observability,

Impact: 96% · 1.03x average score across 3 eval scenarios · Passed · No known issues

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/performance-engineer/SKILL.md`

## Quality: 6%

Does it follow best practices?
### Discovery: 0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is critically weak across all dimensions. It reads as a persona label ('Expert performance engineer') rather than a functional skill description, provides no concrete actions, no trigger terms, and no guidance on when to use it. The description also appears truncated (ends with a comma), suggesting it may be incomplete.
**Suggestions:**

- Replace the persona framing with concrete actions, e.g., 'Analyzes application performance metrics, configures monitoring dashboards, sets up distributed tracing, and diagnoses latency issues.'
- Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks about observability, monitoring, APM, tracing, metrics, logging, latency, or performance bottlenecks.'
- Narrow the scope to a distinct niche to reduce conflict risk, e.g., specify which observability tools or platforms (Datadog, Prometheus, Grafana, OpenTelemetry) or which types of performance analysis this skill covers.
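A description that applies these suggestions might read as follows. The wording is illustrative only, not taken from the skill itself:

```yaml
# Hypothetical rewritten frontmatter description for this skill.
description: >
  Analyzes application performance metrics, configures monitoring
  dashboards, sets up distributed tracing with OpenTelemetry, and
  diagnoses latency bottlenecks. Use when the user asks about
  observability, monitoring, APM, tracing, metrics, logging,
  latency, or performance regressions.
```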
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language ('expert performance engineer') without listing any concrete actions. It describes a persona rather than specific capabilities like 'analyze traces', 'set up dashboards', or 'configure alerts'. | 1 / 3 |
| Completeness | The description barely addresses 'what does this do' (vague reference to observability) and completely lacks any 'when should Claude use it' guidance. There is no 'Use when...' clause or equivalent trigger guidance. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant keywords are 'performance' and 'observability', which are broad technical jargon. It lacks natural user terms like 'monitoring', 'metrics', 'tracing', 'logging', 'APM', 'latency', or 'dashboards'. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description is extremely generic: 'performance engineer' and 'modern observability' could overlap with numerous skills related to DevOps, monitoring, infrastructure, debugging, or profiling. There are no distinct triggers to differentiate it. | 1 / 3 |
| Total | | 4 / 12 Passed |
### Implementation: 12%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is essentially a persona description and technology catalog rather than actionable instructions. It lists hundreds of tools and concepts Claude already knows without providing any concrete code, configurations, or executable guidance. The massive enumeration of capabilities wastes token budget while adding minimal instructional value.
**Suggestions:**

- Replace the extensive 'Capabilities' bullet lists with concrete, executable examples for the most common tasks (e.g., a k6 load test script, an OpenTelemetry setup snippet, a Prometheus query for latency percentiles).
- Add specific validation checkpoints to the workflow, such as 'Verify baseline p99 latency is captured before making changes' and 'Run load test at 50% target before full load'.
- Remove the 'Knowledge Base', 'Behavioral Traits', and 'Example Interactions' sections; these describe what Claude already knows or can infer. Use that space for actual code examples and tool configurations.
- Split detailed tool-specific guidance (e.g., OpenTelemetry setup, k6 scripts, database query optimization patterns) into separate referenced files to improve progressive disclosure.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose with extensive lists of tools, platforms, and concepts that Claude already knows. The 'Capabilities' section is a massive enumeration of technologies and techniques that adds no actionable value; it reads like a resume rather than instructions. The 'Knowledge Base' section repeats what's already in 'Capabilities'. 'Behavioral Traits' describes general good practices Claude already understands. | 1 / 3 |
| Actionability | No concrete code, commands, or executable examples anywhere. The instructions are entirely abstract ('Collect traces, profiles, and load tests to isolate bottlenecks'). No specific tool configurations, code snippets, query examples, or copy-paste ready commands are provided despite the domain being highly technical. | 1 / 3 |
| Workflow Clarity | The 4-step Instructions section and 9-step Response Approach provide a reasonable high-level sequence, but they lack validation checkpoints, feedback loops, and specific criteria for moving between steps. For a skill involving potentially destructive operations like load testing production systems, the safety section is too brief. | 2 / 3 |
| Progressive Disclosure | Monolithic wall of text with no references to external files. All content is inline in one massive document with 12+ subsections of bullet-point lists. The content would benefit enormously from splitting detailed tool references, example configurations, and domain-specific guides into separate files. | 1 / 3 |
| Total | | 5 / 12 Passed |
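As an illustration of the kind of copy-paste-ready snippet the suggestions above call for, a PromQL query for p99 latency over a histogram metric. The metric name `http_request_duration_seconds` is the conventional default and may differ per instrumentation:

```promql
# p99 request latency over the last 5 minutes, aggregated across
# all series of the (assumed) http_request_duration_seconds histogram.
histogram_quantile(
  0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
```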
## Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

### Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
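Following the warning's own advice to move unrecognized keys to metadata, the fix typically looks like the following. The `author` and `tags` keys are hypothetical stand-ins for whatever unknown keys the validator flagged:

```yaml
---
name: performance-engineer
description: ...
metadata:
  # Hypothetical keys the validator did not recognize at the top level
  author: your-name
  tags: [observability, performance]
---
```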