Generate realistic synthetic data using Spark + Faker (strongly recommended). Supports serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), and scales from thousands to millions of rows. For small datasets (<10K rows), can optionally generate locally and upload to volumes. Use when user mentions 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', or 'sample data'.
72
88%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that hits all the key criteria. It provides specific capabilities, includes a comprehensive set of natural trigger terms in an explicit 'Use when...' clause, and occupies a clearly distinct niche. The description is concise yet thorough, covering the what, how, and when effectively.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete capabilities: generating synthetic data using Spark + Faker, serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), scaling from thousands to millions of rows, and local generation with upload to volumes for small datasets. | 3 / 3 |
| Completeness | Clearly answers both 'what does this do' (generate realistic synthetic data with Spark + Faker, multiple formats, scalable) AND 'when should Claude use it' with an explicit 'Use when...' clause listing specific trigger terms. | 3 / 3 |
| Trigger Term Quality | Includes a strong set of natural trigger terms users would actually say: 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', 'sample data'. Also includes technical terms like Parquet, JSON, CSV, Delta, and Spark that users working in this domain would mention. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: synthetic/test data generation using Spark + Faker with specific output formats. The combination of data generation, Faker, and Spark-based execution creates a unique profile unlikely to conflict with other skills like general data processing or analytics. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill with excellent workflow clarity and concrete executable patterns. Its main weakness is moderate verbosity — the catalog confirmation is repeated excessively, and the business story philosophy section could be more concise. The progressive disclosure is adequate but the main file carries a lot of inline content that could be offloaded to references.
Suggestions
Reduce redundancy around catalog confirmation — it's stated in the critical rules, the workflow steps, the pre-generation checklist, and the plan template. Consolidate to one authoritative location.
Consider moving the detailed plan example (Step 2) and Common Patterns section into reference files to keep the main SKILL.md leaner as an overview.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~250 lines) with some redundancy: the 'Critical Rules' section repeats points already covered in the workflow and code sections (e.g., catalog confirmation is stated 3+ times; no .cache() is mentioned in rules, code comments, and troubleshooting). The 'Data Must Tell a Business Story' section is somewhat verbose for Claude. However, most content is genuinely instructive and domain-specific, not explaining things Claude already knows. | 2 / 3 |
| Actionability | The skill provides fully executable code patterns (Spark + Faker + Pandas UDFs), specific partition sizing guidance, concrete anti-pattern tables with alternatives, copy-paste ready infrastructure setup commands, and detailed plan templates. The FK pattern, weighted categories, and log-normal amounts are all directly usable. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced (gather requirements → present plan → confirm → generate → validate) with explicit validation checkpoints including pre-generation and post-generation checklists. The plan template includes a concrete approval gate ('Do NOT proceed to code generation until user approves'), and the post-generation checklist uses a specific tool (get_volume_folder_details) for verification. | 3 / 3 |
| Progressive Disclosure | The skill references two external files (1-data-patterns.md and 2-troubleshooting.md) with clear signaling in a table, which is good. However, the main SKILL.md itself is quite long and could benefit from moving some content (e.g., the detailed plan example, common patterns section) into reference files. The bundle files weren't provided, so we can't verify the references exist, but the structure is reasonable for a skill of this complexity. | 2 / 3 |
| Total | | 10 / 12 Passed |
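
For readers unfamiliar with the three patterns named under Actionability (FK pattern, weighted categories, log-normal amounts), the following is a minimal sketch of what they generally mean. It is not the skill's own code: the skill reportedly uses Spark + Faker with Pandas UDFs, whereas this sketch uses only Python's standard library, and every name in it (`generate_customers`, `generate_orders`, the tier labels, the weights, and the log-normal parameters) is a hypothetical chosen for illustration.

```python
import random


def generate_customers(n, seed=42):
    """Parent table: each customer gets a tier drawn from weighted categories."""
    rng = random.Random(seed)
    tiers = ["free", "pro", "enterprise"]  # hypothetical categories
    weights = [0.70, 0.25, 0.05]           # hypothetical skew toward "free"
    return [
        {"customer_id": i, "tier": rng.choices(tiers, weights=weights, k=1)[0]}
        for i in range(1, n + 1)
    ]


def generate_orders(customers, n, seed=43):
    """Child table: foreign keys are sampled only from existing parent IDs."""
    rng = random.Random(seed)
    customer_ids = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": i,
            # FK pattern: never invent an ID that is absent from the parent table.
            "customer_id": rng.choice(customer_ids),
            # Log-normal amounts: many small orders with a long tail of large ones.
            "amount": round(rng.lognormvariate(3.5, 0.8), 2),
        }
        for i in range(1, n + 1)
    ]


customers = generate_customers(1000)
orders = generate_orders(customers, 5000)
```

Sampling child keys from the parent's ID list keeps joins between the two tables valid, and fixed seeds make the output reproducible across runs.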
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
93cb4e3