Clean and standardize clinical trial data to CDISC SDTM standards for FDA/EMA regulatory submissions. Handles missing values, outlier detection, date standardization, and generates audit trails for DM, LB, and VS domains.
Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.
Key Capabilities:
This skill accepts: a CSV file containing clinical trial data, a domain identifier (DM, LB, or VS), and optional cleaning strategy parameters.
If the request does not involve cleaning or standardizing clinical trial data for CDISC SDTM compliance — for example, asking to analyze genomic data, perform statistical modeling, or interpret clinical results — do not proceed. Instead respond:
"Clinical Data Cleaner is designed to clean and standardize clinical trial data for CDISC SDTM regulatory submissions. Please provide a CSV input file and domain (DM, LB, or VS). For other data analysis tasks, use a more appropriate tool."
python -m py_compile scripts/main.py
python scripts/main.py --help
Fallback: If --input, --domain, or --output is missing, respond: "Required parameters missing. Please provide --input (CSV file), --domain (DM, LB, or VS), and --output (output path). Cannot clean without all three."
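The required-parameter check behind the fallback above could be sketched with `argparse` as follows. This is a minimal illustration and not the actual contents of scripts/main.py: the helper names `build_parser` and `missing_required` are hypothetical, and the `None` defaults are an assumption. The flag names and listed choices are taken from this document.

```python
from __future__ import annotations

import argparse

REQUIRED_FLAGS = ["--input", "--domain", "--output"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for scripts/main.py."""
    parser = argparse.ArgumentParser(description="Clean clinical trial data for SDTM")
    parser.add_argument("--input", required=True, help="Input CSV file path")
    parser.add_argument("--domain", required=True, choices=["DM", "LB", "VS"],
                        help="SDTM domain")
    parser.add_argument("--output", required=True, help="Output CSV file path")
    # Defaults are unknown for the real tool; None is an assumption.
    parser.add_argument("--missing-strategy", default=None,
                        choices=["mean", "median", "mode", "forward", "drop"])
    # Only 'iqr' and 'domain' appear in this document, so no choices are enforced.
    parser.add_argument("--outlier-method", default=None)
    parser.add_argument("--outlier-action", default=None,
                        choices=["flag", "remove", "cap"])
    return parser


def missing_required(argv: list[str]) -> list[str]:
    """Return the required flags absent from argv (exact flag spellings only)."""
    present = {arg.split("=", 1)[0] for arg in argv}
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


if __name__ == "__main__":
    print(missing_required(["--input", "dm_raw.csv"]))
```

Checking for missing flags before calling `parse_args` lets the caller return the fallback message instead of letting `argparse` exit with its own usage error.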
from scripts.main import ClinicalDataCleaner
cleaner = ClinicalDataCleaner(domain='DM')
is_valid, missing = cleaner.validate_domain(data)
Required Fields:
cleaner = ClinicalDataCleaner(domain='DM', missing_strategy='median')
cleaned = cleaner.handle_missing_values(data)
Strategies: mean, median, mode, forward, drop
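To make the five strategies concrete, here is a minimal standard-library sketch of what each one might do to a single numeric column. It is not the `ClinicalDataCleaner` implementation: the function `fill_missing` is hypothetical, missing values are assumed to be represented as `None`, and `drop` is assumed to discard the missing entries (the real tool may drop whole rows instead).

```python
from statistics import mean, median, mode
from typing import List, Optional


def fill_missing(values: List[Optional[float]],
                 strategy: str = "median") -> List[Optional[float]]:
    """Apply one documented strategy to a single numeric column (illustrative)."""
    observed = [v for v in values if v is not None]

    if strategy == "drop":
        # Assumption: drop removes the missing entries themselves.
        return list(observed)

    if strategy == "forward":
        filled: List[Optional[float]] = []
        last: Optional[float] = None
        for v in values:
            if v is not None:
                last = v
            filled.append(last)  # leading gaps stay None: nothing to carry forward
        return filled

    aggregators = {"mean": mean, "median": median, "mode": mode}
    if strategy not in aggregators:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    if not observed:
        return list(values)  # nothing to compute a fill value from

    fill_value = aggregators[strategy](observed)
    return [fill_value if v is None else v for v in values]


if __name__ == "__main__":
    ages = [34.0, None, 51.0, 47.0, None]
    print(fill_missing(ages, "median"))
    print(fill_missing(ages, "forward"))
```

Mean and median imputation change the distribution of a variable, so whichever strategy is used should be recorded in the audit trail.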
cleaner = ClinicalDataCleaner(domain='LB', outlier_method='domain', outlier_action='flag')
flagged = cleaner.detect_outliers(data)
Clinical Thresholds:
| Parameter | Range | Unit |
|---|---|---|
| Glucose | 50–500 | mg/dL |
| Hemoglobin | 5–20 | g/dL |
| Systolic BP | 70–220 | mmHg |
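A range check against the thresholds in the table above could look like the sketch below. The ranges are copied from the table. The dictionary `CLINICAL_RANGES`, the function `flag_out_of_range`, and the choice to treat both bounds as inclusive are assumptions made for illustration, and may differ from what `outlier_method='domain'` does internally.

```python
from typing import Dict, List, Tuple

# Ranges copied from the table; keys are illustrative labels, not SDTM test codes.
CLINICAL_RANGES: Dict[str, Tuple[float, float]] = {
    "Glucose": (50.0, 500.0),      # mg/dL
    "Hemoglobin": (5.0, 20.0),     # g/dL
    "Systolic BP": (70.0, 220.0),  # mmHg
}


def flag_out_of_range(parameter: str, values: List[float]) -> List[bool]:
    """Return True for each value outside the range (bounds assumed inclusive)."""
    low, high = CLINICAL_RANGES[parameter]
    return [not (low <= v <= high) for v in values]


if __name__ == "__main__":
    glucose = [45.0, 50.0, 98.0, 500.0, 612.0]
    print(flag_out_of_range("Glucose", glucose))
```

Flagging leaves the original values untouched, so a reviewer can decide whether an extreme reading is a data-entry error or a genuine clinical observation.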
standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
cleaner = ClinicalDataCleaner(
    domain='DM', missing_strategy='median',
    outlier_method='iqr', outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')
# Outputs: output.csv + output.report.json (audit trail)
# Clean demographics domain
python scripts/main.py \
--input dm_raw.csv \
--domain DM \
--output dm_clean.csv \
--missing-strategy median \
--outlier-method iqr \
--outlier-action flag
# Clean lab data with clinical thresholds
python scripts/main.py \
--input lb_raw.csv \
--domain LB \
--output lb_clean.csv \
--outlier-method domain
| Parameter | Type | Required | Description |
|---|---|---|---|
| --input | string | Yes | Input CSV file path |
| --domain | string | Yes | SDTM domain (DM, LB, VS) |
| --output | string | Yes | Output CSV file path |
| --missing-strategy | string | No | Missing value strategy (mean, median, mode, forward, drop) |
| --outlier-method | string | No | Outlier detection method (e.g. iqr, domain) |
| --outlier-action | string | No | Outlier action (flag, remove, cap) |
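The interaction between `--outlier-method iqr` and the three `--outlier-action` values can be illustrated with a short standard-library sketch. It assumes the common Tukey rule (fences at 1.5 × IQR beyond the quartiles) and inclusive quartile interpolation; the document does not state which multiplier or quantile method the tool uses, and the function names here are hypothetical.

```python
from statistics import quantiles
from typing import List, Tuple


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def apply_outlier_action(values: List[float], action: str = "flag") -> list:
    """Illustrate the three documented actions on a single numeric column."""
    low, high = iqr_bounds(values)
    if action == "flag":
        # Keep every value and pair it with an is-outlier marker.
        return [(v, not (low <= v <= high)) for v in values]
    if action == "remove":
        # Discard values outside the fences.
        return [v for v in values if low <= v <= high]
    if action == "cap":
        # Clamp values to the nearest fence.
        return [min(max(v, low), high) for v in values]
    raise ValueError(f"Unknown action: {action!r}")


if __name__ == "__main__":
    systolic = [118.0, 120.0, 122.0, 124.0, 126.0, 128.0, 130.0, 300.0]
    print(iqr_bounds(systolic))
    print(apply_outlier_action(systolic, "cap"))
```

Of the three actions, only `flag` preserves the submitted values exactly; `remove` and `cap` both alter the dataset and so call for an audit-trail entry.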
Every final response must make these explicit:
If --input, --domain, or --output is missing, state the missing parameters and request them.
If scripts/main.py fails, report the failure point and provide manual fallback guidance.
Pre-Cleaning:
Post-Cleaning:
references/sdtm_ig_guide.md: CDISC SDTM Implementation Guide
references/domain_specs.json: Domain-specific field requirements
references/outlier_thresholds.json: Clinical outlier thresholds
references/common-patterns.md: Detailed usage patterns
references/troubleshooting.md: Problem-solving guide