Load test a Databricks App to find its maximum QPS. Use when: (1) User says 'load test', 'benchmark', 'QPS', 'throughput', or 'performance test', (2) User wants to find how many queries per second their app can handle, (3) User wants to set up load testing scripts for their agent, (4) User wants to view load test results/dashboard.
Score: 85 (81%)
Does it follow best practices?

Impact: Pending (no eval scenarios have been run).
Risk: Risky; do not use without reviewing.
Quality
Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines its purpose, provides comprehensive trigger terms, and explicitly states both what the skill does and when it should be used. It follows the recommended pattern with a concise capability statement followed by a structured 'Use when' clause with multiple trigger scenarios. The description is well-scoped to a specific domain (Databricks App load testing) making it highly distinctive.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions: load testing a Databricks App, finding maximum QPS, setting up load testing scripts, and viewing load test results/dashboard. These are clear, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (load test a Databricks App to find its maximum QPS) and 'when', with an explicit 'Use when:' clause listing four specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of the terms users would naturally say: 'load test', 'benchmark', 'QPS', 'throughput', 'performance test', 'queries per second', 'load testing scripts', 'load test results'. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: load testing specifically for Databricks Apps. The combination of 'Databricks App', 'load test', 'QPS', and 'benchmark' creates a very specific domain unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
Implementation: 62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a comprehensive load testing guide with excellent workflow structure and clear sequencing, but it suffers from being overly long for a single SKILL.md and critically lacks the actual script implementations (locustfile.py, run_load_test.py, dashboard_template.py) that are the core deliverables. The skill describes what to build rather than providing copy-paste-ready code for the most important artifacts, which significantly limits actionability.
Suggestions

- Provide complete, executable implementations of locustfile.py, run_load_test.py, and dashboard_template.py, either inline or as bundle files; these are the core deliverables and currently have only prose descriptions (see the sketch after this list).
- Split the content: keep Steps 1-4 in SKILL.md as a concise overview, and move the dashboard interpretation guide, troubleshooting table, and mocking details into separate referenced files.
- Remove explanatory content Claude can infer (e.g., why mocking is useful, what Peak QPS means, what SSE is) to cut token usage by roughly 30%.
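To make the first suggestion concrete, a minimal locustfile.py could look like the sketch below. It assumes the app exposes an HTTP chat endpoint that accepts JSON and that an OAuth token is available in the environment; the `/api/chat` path, payload shape, and `DATABRICKS_TOKEN` name are illustrative placeholders, not details taken from the skill:

```python
# Hypothetical minimal locustfile.py. The endpoint path, payload shape,
# and DATABRICKS_TOKEN variable are assumptions for illustration only.
import os

from locust import HttpUser, between, task


class AgentUser(HttpUser):
    # Pause 0.5-2s between requests to approximate real user pacing.
    wait_time = between(0.5, 2)

    def on_start(self):
        # Attach one bearer token per simulated user; Locust reuses the
        # session headers on every subsequent request.
        token = os.environ.get("DATABRICKS_TOKEN", "")
        self.client.headers.update({"Authorization": f"Bearer {token}"})

    @task
    def query_agent(self):
        # name= groups all calls under a single row in Locust's stats,
        # which keeps the QPS readout clean.
        self.client.post(
            "/api/chat",
            json={"messages": [{"role": "user", "content": "ping"}]},
            name="agent query",
        )
```

A wrapper such as run_load_test.py could then drive Locust headlessly (for example, `locust -f locustfile.py --host <app-url> --headless -u 50 -r 5 -t 2m`) and sweep the user count to locate peak QPS.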
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is quite long (~300 lines) and includes some information Claude could infer (e.g., what mocking is useful for, what Peak QPS means). However, most content is domain-specific configuration detail (DAB variables, Locust setup, OAuth) that Claude wouldn't inherently know, so it is not egregiously verbose; it could simply be tightened in several places. | 2 / 3 |
| Actionability | The skill provides concrete CLI commands, YAML configs, and directory structures, which is good. However, the three core scripts (locustfile.py, run_load_test.py, dashboard_template.py) are only described in prose rather than provided as executable code. Claude is told to 'create' these files but given only behavioral specifications, not actual implementations. The mock example is a 3-line snippet referencing a file that isn't in the bundle. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints (verify apps are ACTIVE before proceeding, healthcheck before the load test, check logs on 0 QPS); a minimal healthcheck gate is sketched after this table. The 'What Happens During a Run' section explains the progression, and the troubleshooting table provides error-recovery guidance. The gather-parameters step upfront ensures prerequisites are met. | 3 / 3 |
| Progressive Disclosure | The skill is a monolithic document with all content inline; the dashboard interpretation, troubleshooting, mocking guide, deployment matrix, and parameter reference could be split into separate files. It references `examples/mock_openai_client.py`, but no bundle files are provided, making that reference unverifiable. The content would benefit from a concise overview with links to detailed sections. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
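As an illustration of the healthcheck checkpoint noted in the Workflow Clarity row, a pre-flight gate could look like the sketch below; the `/health` path and the timing constants are assumptions, since the skill describes this step only in prose:

```python
# Hypothetical pre-flight gate: block until the deployed app answers its
# health endpoint so the load test never starts against a cold or broken app.
import time

import requests


def wait_until_healthy(base_url: str, path: str = "/health",
                       timeout_s: int = 180, poll_s: int = 5) -> None:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(base_url.rstrip("/") + path,
                            timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass  # the app may still be deploying; retry until the deadline
        time.sleep(poll_s)
    raise RuntimeError(
        f"{base_url} not healthy after {timeout_s}s; confirm the app is "
        "ACTIVE before running the load test"
    )
```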
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Skill structure validation: 11 / 11 checks passed. No warnings or errors.