Generate realistic synthetic data using Spark + Faker (strongly recommended). Supports serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), and scales from thousands to millions of rows. For small datasets (<10K rows), can optionally generate locally and upload to volumes. Use when user mentions 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', or 'sample data'.
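The description's local-generation path for small datasets (<10K rows) can be sketched in plain Python. This is a minimal, dependency-free sketch, assuming hypothetical column names and substituting the stdlib `random` module for Faker so it runs anywhere:

```python
import csv
import datetime
import io
import random


def generate_rows(n, seed=42):
    """Locally generate n synthetic transaction rows (the <10K-row path)."""
    rng = random.Random(seed)
    names = ["Ana", "Ben", "Chloe", "Dev", "Eli"]  # stand-in for Faker's name provider
    start = datetime.date(2024, 1, 1)
    return [
        {
            "id": i,
            "customer": rng.choice(names),
            # log-normal amounts give a realistic right-skewed spend distribution
            "amount": round(rng.lognormvariate(3.0, 0.8), 2),
            "date": (start + datetime.timedelta(days=rng.randrange(365))).isoformat(),
        }
        for i in range(n)
    ]


def rows_to_csv(rows):
    """Serialize rows to CSV, one of the supported output formats."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

At larger scales the skill's Spark + Faker path (Pandas UDFs, serverless execution) would replace this local loop; only the shape of the generated rows carries over.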
Overall score: 94

Quality: 92% (Does it follow best practices?) Passed, no known issues.
Impact: Pending, no eval scenarios have been run.

## Quality

### Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the marks. It provides specific capabilities (formats, scale, execution modes), uses third-person voice throughout, and includes an explicit 'Use when...' clause with well-chosen natural trigger terms. The description is concise yet comprehensive, clearly distinguishing this skill from other data-related skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific, concrete actions and capabilities: generating synthetic data using Spark + Faker, serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), scaling from thousands to millions of rows, and local generation with upload to volumes for small datasets. | 3 / 3 |
| Completeness | Clearly answers both 'what does this do' (generate realistic synthetic data using Spark + Faker with multiple formats and scale options) and 'when should Claude use it', with an explicit 'Use when...' clause listing specific trigger terms. | 3 / 3 |
| Trigger Term Quality | Includes a strong set of natural trigger terms users would actually say: 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', 'sample data'. These cover common variations of how users would phrase their need. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive, with a clear niche: synthetic/test data generation specifically using Spark + Faker. The combination of technology stack (Spark, Faker), output formats, and specific trigger terms makes it unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 (Passed) |
### Implementation: 85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides clear actionable guidance for synthetic data generation on Databricks. Its main strengths are excellent progressive disclosure with well-signaled references, fully executable code examples, and a thorough planning workflow with validation checkpoints. The primary weakness is some redundancy across sections (repeated UDF code, repeated catalog/schema reminders) that could be tightened to improve token efficiency.
Suggestions:

- Remove the duplicate generate_amount UDF from the Quick Start section, since it already appears in Common Patterns, or consolidate it into a single location with a cross-reference.
- Consolidate the repeated 'ask for catalog/schema' instruction: it appears in Critical Rules, the planning workflow header, Step 1, and Best Practices. State it once prominently and reference it elsewhere.
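The first suggestion, defining generate_amount once and referencing it from both sections, can be illustrated with a short sketch. The log-normal parameters and the plain-Python form are assumptions; in the skill itself this logic would live inside a Pandas UDF:

```python
import random


# Single shared definition; both Quick Start and Common Patterns would
# cross-reference this instead of repeating the UDF body.
def generate_amount(n, mean=3.5, sigma=0.9, seed=None):
    """Return n right-skewed transaction amounts (parameters are illustrative)."""
    rng = random.Random(seed)
    return [round(rng.lognormvariate(mean, sigma), 2) for _ in range(n)]
```

A Pandas-UDF wrapper could delegate to a function like this, so the sampling logic lives in exactly one place.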
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly well organized but includes some redundancy: the Quick Start code repeats the generate_amount UDF that also appears in Common Patterns, the planning workflow is quite verbose with multiple checklists, and some instructions (such as 'ask for catalog/schema') are repeated across Critical Rules, Planning Workflow, and Best Practices. However, it mostly avoids explaining concepts Claude already knows. | 2 / 3 |
| Actionability | Provides fully executable code examples with complete Spark + Faker + Pandas UDF patterns, specific SQL commands for infrastructure creation, concrete distribution parameters (log-normal means/stds), and copy-paste-ready snippets for common patterns such as weighted distributions and date ranges. | 3 / 3 |
| Workflow Clarity | The generation planning workflow is clearly sequenced with explicit steps (gather requirements → present spec → confirm → generate → validate), includes pre-generation and post-generation checklists, and has clear validation checkpoints, including user approval gates before proceeding. The critical rules about .cache()/.persist() and the master-tables-first ordering provide important guardrails. | 3 / 3 |
| Progressive Disclosure | Excellent progressive disclosure, with a clear Quick Reference table linking to 6 separate reference files and a script, each with a 'When to Use' description. The main skill provides a concise overview with actionable quick-start content while deferring detailed setup, troubleshooting, and domain guidance to one-level-deep references. | 3 / 3 |
| Total | | 11 / 12 (Passed) |
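The "weighted distributions and date ranges" patterns credited under Actionability can be sketched with the stdlib alone. The status categories, weights, and date range below are illustrative assumptions, not the skill's actual values:

```python
import datetime
import random


def weighted_status(rng):
    """Pick an order status with skewed frequencies (weights are illustrative)."""
    return rng.choices(["completed", "pending", "failed"], weights=[70, 20, 10])[0]


def random_date(rng, start, end):
    """Uniform date within an inclusive [start, end] range."""
    return start + datetime.timedelta(days=rng.randrange((end - start).days + 1))


rng = random.Random(0)
start, end = datetime.date(2024, 1, 1), datetime.date(2024, 12, 31)
sample = [(weighted_status(rng), random_date(rng, start, end)) for _ in range(5)]
```

In the Spark path, the same sampling functions would be applied per-row inside a Pandas UDF rather than in a local list comprehension.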
### Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

All checks passed: 11 / 11. Validation of skill structure produced no warnings or errors.