Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.
Quality: 52% (Does it follow best practices?)

Impact: 91% (1.16x average score across 3 eval scenarios)

Advisory: Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./scientific-skills/pytdc/SKILL.md

Quality
Discovery
54%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description effectively identifies a clear, distinctive niche in drug discovery ML and includes strong domain-specific trigger terms. However, it lacks concrete action verbs describing what the skill does (it reads like a feature tagline rather than a capability description) and critically omits any 'Use when...' guidance, significantly hurting completeness.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about drug discovery datasets, ADME/toxicity predictions, molecular property benchmarks, or Therapeutics Data Commons (TDC).'
Replace the noun-heavy feature list with concrete action verbs, e.g., 'Loads and processes drug discovery datasets from TDC, runs ADME and toxicity benchmarks, generates scaffold splits, and evaluates molecules using molecular oracles.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (drug discovery) and lists several specific data types (ADME, toxicity, DTI) and tools (scaffold splits, molecular oracles), but these read more like a feature list than concrete actions. It doesn't describe what actions the skill performs (e.g., 'downloads datasets', 'runs benchmarks', 'generates splits'). | 2 / 3 |
| Completeness | The description answers 'what' at a high level (AI-ready drug discovery datasets and benchmarks) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also weak on concrete actions, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | Strong coverage of natural keywords a user in this domain would use: 'drug discovery', 'ADME', 'toxicity', 'DTI', 'scaffold splits', 'molecular oracles', 'therapeutic ML', 'pharmacological prediction'. These are terms a researcher would naturally mention. | 3 / 3 |
| Distinctiveness / Conflict Risk | The description targets a very specific niche—Therapeutics Data Commons for drug discovery ML—with highly domain-specific terms like ADME, DTI, scaffold splits, and molecular oracles. This is unlikely to conflict with other skills. | 3 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides excellent actionable code examples covering the full TDC API surface, but is far too verbose for a skill file—it reads more like library documentation than a concise skill. Extensive enumeration of datasets, task categories, and column descriptions bloats the content without adding value Claude couldn't infer. The workflow sections are weak, delegating to external scripts without showing complete inline steps or validation checkpoints.
Suggestions
Cut the content by 50-60%: remove dataset enumerations, 'When to Use' section, data format descriptions, and task category explanations—move these to references/datasets.md instead.
Inline the actual workflow steps from the referenced scripts rather than just pointing to them, and add validation checkpoints (e.g., verify data loaded correctly, check split sizes).
Remove explanatory text about what ADME, toxicity, DTI etc. are—Claude already knows these domain concepts.
Consolidate the Quick Start and Common Workflows sections into a single concise section showing the 2-3 most common patterns end-to-end.
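One way to realize the validation-checkpoint suggestion is a small helper that fails fast before any training starts. This is a sketch: `split` is the dict returned by PyTDC's `get_split`, and the tolerance value is an arbitrary choice.

```python
def check_split(split, frac=(0.7, 0.1, 0.2), tol=0.05):
    """Validation checkpoint: fail fast if the dataset did not load or the
    split sizes deviate from the requested fractions."""
    parts = ("train", "valid", "test")
    sizes = {k: len(split[k]) for k in parts}
    total = sum(sizes.values())
    assert total > 0, "dataset appears empty: did the download succeed?"
    for key, expected in zip(parts, frac):
        observed = sizes[key] / total
        assert abs(observed - expected) <= tol, (
            f"{key} fraction {observed:.2f} far from requested {expected}"
        )
    return sizes

# Typical use, right after loading:
#   split = ADME(name="Caco2_Wang").get_split(method="scaffold")
#   check_split(split)
```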
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at 300+ lines, extensively listing dataset names, column descriptions, and task categories that Claude could easily look up or infer. Sections like 'When to Use This Skill' and explanations of what ADME/toxicity/DTI are add little value. The exhaustive enumeration of datasets and task types is catalog-like padding. | 1 / 3 |
| Actionability | The skill provides fully executable, copy-paste ready Python code examples throughout, including dataset loading, splitting, evaluation, oracle usage, and format conversion. The code patterns are concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | Workflows are listed but mostly delegate to external scripts ('See scripts/benchmark_evaluation.py') rather than showing the actual steps inline. The benchmark workflow mentions the 5-seed protocol but the inline code is incomplete, with commented-out model training. No validation checkpoints or error recovery steps are provided. | 2 / 3 |
| Progressive Disclosure | References to external files (references/oracles.md, scripts/*.py, references/utilities.md) are present and one level deep, which is good. However, the main file itself contains too much inline content that should be in reference files (e.g., exhaustive dataset listings, all task categories), undermining the progressive disclosure pattern. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
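The 5-seed protocol flagged under Workflow Clarity can be shown inline as one complete loop. In the sketch below, `group` is assumed to be a `tdc.benchmark_group.admet_group(path=...)` instance and `predict_fn` stands in for the model's train-and-predict routine; the dict-of-predictions shape follows PyTDC's `evaluate_many` convention.

```python
def run_benchmark(group, benchmark_name, predict_fn, seeds=(1, 2, 3, 4, 5)):
    """TDC multi-seed protocol: re-split, retrain, and predict once per seed,
    then aggregate. `group.evaluate_many` reports mean and std per metric."""
    predictions_list = []
    for seed in seeds:
        benchmark = group.get(benchmark_name)
        train, valid = group.get_train_valid_split(
            benchmark=benchmark["name"], split_type="default", seed=seed
        )
        test_predictions = predict_fn(train, valid, benchmark["test"])
        predictions_list.append({benchmark["name"]: test_predictions})
    return group.evaluate_many(predictions_list)
```

A hypothetical `predict_fn(train, valid, test)` would fit a model on `train`, tune on `valid`, and return one prediction per test row.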
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| Total | 10 / 11 (Passed) | |
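The warning above can typically be cleared by adding a version field under `metadata` in the SKILL.md frontmatter. This is a sketch only: the exact schema belongs to the skill spec, and the `name` and `description` values shown are illustrative.

```yaml
---
name: pytdc
description: Loads and processes drug discovery datasets from TDC...
metadata:
  version: "1.0.0"   # clears the metadata_version validation warning
---
```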
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.