
load-testing

Load test a Databricks App to find its maximum QPS. Use when: (1) User says 'load test', 'benchmark', 'QPS', 'throughput', or 'performance test', (2) User wants to find how many queries per second their app can handle, (3) User wants to set up load testing scripts for their agent, (4) User wants to view load test results/dashboard.

68

Quality

81%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Risky

Do not use without reviewing


Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines its purpose, lists concrete capabilities, and provides explicit trigger guidance with natural user terms. It follows the recommended pattern with a concise 'what' statement followed by a well-structured 'Use when' clause covering multiple scenarios. The description is specific to a clear niche (Databricks App load testing) making it highly distinguishable.

Dimension | Reasoning | Score

Specificity

Lists specific concrete actions: load testing a Databricks App, finding maximum QPS, setting up load testing scripts, and viewing load test results/dashboard. These are clear, actionable capabilities.

3 / 3

Completeness

Clearly answers both 'what' (load test a Databricks App to find its maximum QPS) and 'when' with an explicit 'Use when:' clause listing four specific trigger scenarios.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms: 'load test', 'benchmark', 'QPS', 'throughput', 'performance test', 'queries per second', 'load testing scripts', 'load test results/dashboard'. These are terms users would naturally use.

3 / 3

Distinctiveness Conflict Risk

Highly distinctive — targets a specific niche (load testing Databricks Apps for QPS) with domain-specific triggers like 'QPS', 'throughput', and 'Databricks App'. Unlikely to conflict with other skills.

3 / 3

Total: 12 / 12

Passed

Implementation

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, domain-specific skill with a clear 5-step workflow and good validation checkpoints. Its main weakness is that the three core Python scripts are described rather than provided as executable code, which undermines actionability — Claude must generate substantial code from prose specifications. The skill is also longer than necessary, with some sections (dashboard interpretation, mocking rationale) that could be trimmed or split out.

Suggestions

Provide the actual executable code for locustfile.py, run_load_test.py, and dashboard_template.py (or include them as bundle files and reference them), rather than describing what they should do in prose.

Split mocking guidance, dashboard interpretation, and troubleshooting into separate referenced files to reduce the main skill's length and improve progressive disclosure.

Trim explanatory prose that Claude can infer — e.g., remove the bullet list explaining why mocking is useful and the 'Interpreting Results' definitions of obvious metrics like 'Failure Rate'.

Dimension | Reasoning | Score

Conciseness

The skill is quite long (~250 lines) and includes some information Claude could infer (e.g., what mocking is useful for, what peak QPS means). However, most of the content is domain-specific configuration detail (DAB variables, Locust pinning, OAuth setup) that genuinely adds value. It could be tightened by about 30% without losing substance.

2 / 3

Actionability

The skill provides concrete CLI commands, YAML configs, and directory structures, which is strong. However, the three core scripts (locustfile.py, run_load_test.py, dashboard_template.py) are described in prose rather than provided as executable code — Claude is told what they should do but must generate them from descriptions. The mock example is a 3-line snippet referencing a file that isn't in the bundle.

2 / 3

Workflow Clarity

The 5-step workflow is clearly sequenced with explicit validation checkpoints: verify apps are ACTIVE before testing, healthcheck + warmup before load test, and a troubleshooting table for common failures. The ramp-to-saturation process includes clear interpretation guidance for identifying the saturation point.

3 / 3

Progressive Disclosure

The skill references `examples/mock_openai_client.py` but no bundle files are provided, making this reference unverifiable. The content is monolithic — all 250+ lines are in a single file. The dashboard interpretation, troubleshooting, and mocking sections could be split into separate reference files to keep the main skill leaner.

2 / 3

Total: 9 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
databricks/app-templates
Reviewed

