Mine unstructured clinical text from MIMIC-IV to extract diagnostic logic and treatment details
Mine the "text data" that has long been overlooked in MIMIC-IV, extracting unstructured diagnostic logic, order details, and progress notes.
The MIMIC-IV database contains large amounts of structured data (vital signs, laboratory results, and more), but much of its clinical value is hidden in unstructured text.
This Skill provides a complete text mining toolchain to transform raw medical text into analyzable structured insights.
```python
from skills.unstructured_medical_text_miner.scripts.main import MedicalTextMiner

# Initialize the miner
miner = MedicalTextMiner()

# Load MIMIC-IV note data
miner.load_notes(notes_path="path/to/noteevents.csv")

# Extract all text records for a specific patient
patient_texts = miner.get_patient_texts(subject_id=10000032)

# Run the complete information-extraction pipeline
insights = miner.extract_insights(
    text=patient_texts,
    extract_entities=True,
    extract_relations=True,
    extract_timeline=True,
)
```

Expected input fields:

| Field Name | Description | Required |
|---|---|---|
| subject_id | Patient unique identifier | Yes |
| hadm_id | Hospital admission record identifier | No |
| note_type | Note type (DS/RR/ECG, etc.) | Yes |
| note_text | Note text content | Yes |
| charttime | Record time | No |
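The field table implies a simple schema check before notes are handed to the miner. The sketch below uses only the standard library and assumes CSV input whose header uses the column names from the table; `validate_notes` is a hypothetical helper for illustration, not part of the skill's `MedicalTextMiner` API.

```python
import csv
import io

# Required columns from the input field table; hadm_id and charttime are optional.
REQUIRED_FIELDS = ("subject_id", "note_type", "note_text")


def validate_notes(csv_text: str) -> list[dict]:
    """Parse note records and check them against the expected input fields.

    Hypothetical helper for illustration only; it is not part of the
    skill's MedicalTextMiner API. Raises ValueError when a required column
    is missing or a record leaves a required field empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])  # reading fieldnames consumes the header
    missing = [f for f in REQUIRED_FIELDS if f not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    records = []
    for index, row in enumerate(reader, start=1):
        empty = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if empty:
            raise ValueError(f"record {index}: empty required fields {empty}")
        row["subject_id"] = int(row["subject_id"])  # MIMIC-IV subject IDs are integers
        records.append(row)
    return records


# Usage with a one-record sample (placeholder text, not real MIMIC-IV data)
sample = (
    "subject_id,hadm_id,note_type,note_text,charttime\n"
    "10000032,,DS,Patient presented with chest pain.,\n"
)
records = validate_notes(sample)
print(records[0]["subject_id"], records[0]["note_type"])  # 10000032 DS
```

Failing fast on a missing `note_type` or empty `note_text` keeps schema problems from surfacing later as silent gaps in the extracted entities.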
Entity extraction output:

```json
{
  "entities": [
    {
      "text": "acute myocardial infarction",
      "type": "DISEASE",
      "start": 156,
      "end": 183,
      "confidence": 0.94
    },
    {
      "text": "aspirin 81mg",
      "type": "MEDICATION",
      "start": 245,
      "end": 257,
      "attributes": {
        "dose": "81mg",
        "frequency": "daily"
      }
    }
  ]
}
```

Clinical reasoning output:

```json
{
  "clinical_logic": {
    "presenting_complaint": "chest pain",
    "differential_diagnoses": ["ACS", "PE", "aortic dissection"],
    "workup": ["ECG", "troponin", "CTA chest"],
    "final_diagnosis": "STEMI",
    "treatment_plan": ["PCI", "dual antiplatelet"]
  }
}
```

Timeline output:

```json
{
  "timeline": [
    {
      "time": "2020-03-15 08:30",
      "event": "admission",
      "description": "presented with chest pain"
    },
    {
      "time": "2020-03-15 09:15",
      "event": "ECG",
      "description": "ST elevation in V1-V4"
    }
  ]
}
```

Dependencies:

- pandas>=1.3.0
- spacy>=3.4.0
- scispacy>=0.5.1
- radlex (for radiology terminology)
- negspacy (for negation detection)

Configuration:

```yaml
# config.yaml
extraction:
  entity_types: ["DISEASE", "SYMPTOM", "MEDICATION", "PROCEDURE", "ANATOMY"]
  relation_types: ["TREATS", "CAUSES", "CONTRAINDICATED_WITH"]
  enable_negation_detection: true
models:
  ner_model: "en_core_sci_lg"  # or "en_core_sci_scibert"
  relation_model: "custom_relation_extractor"
output:
  format: "json"  # json/fhir/kg
  include_raw_text: false
```

Command-line usage:

```bash
# Process a single file
python -m skills.unstructured_medical_text_miner.scripts.main \
  --input notes.csv \
  --output extracted.json \
  --extract all

# Process a specific patient
python -m skills.unstructured_medical_text_miner.scripts.main \
  --subject-id 10000032 \
  --db-path mimic_iv.db \
  --output patient_insights.json
```

Skill ID: 213; Category: Medical Data Mining; Complexity: Advanced
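The entity offsets in the example output are consistent with end-exclusive character spans: 183 - 156 = 27, which is the length of "acute myocardial infarction". Assuming that convention holds (the skill does not state it explicitly), a quick sanity check on extracted entities, plus chronological ordering of timeline events, might look like the sketch below. Both helpers are hypothetical and work on plain dicts shaped like the JSON examples; because the sample config sets `include_raw_text: false`, the source note text is assumed to come from the input data rather than from the output file.

```python
from datetime import datetime

# Matches the timeline example format, e.g. "2020-03-15 08:30"
TIME_FORMAT = "%Y-%m-%d %H:%M"


def mismatched_entities(note_text: str, entities: list[dict]) -> list[dict]:
    """Return entities whose text is not exactly note_text[start:end].

    Hypothetical helper; assumes end-exclusive character offsets into the
    original note text.
    """
    return [e for e in entities if note_text[e["start"]:e["end"]] != e["text"]]


def sorted_timeline(events: list[dict]) -> list[dict]:
    """Return timeline events ordered by their parsed timestamps."""
    return sorted(events, key=lambda ev: datetime.strptime(ev["time"], TIME_FORMAT))


# Usage with a short made-up note
note = "Dx: acute myocardial infarction. Started aspirin 81mg daily."
entities = [
    {"text": "acute myocardial infarction", "type": "DISEASE", "start": 4, "end": 31},
    {"text": "aspirin 81mg", "type": "MEDICATION", "start": 41, "end": 53},
]
print(mismatched_entities(note, entities))  # []

events = [
    {"time": "2020-03-15 09:15", "event": "ECG"},
    {"time": "2020-03-15 08:30", "event": "admission"},
]
print([ev["event"] for ev in sorted_timeline(events)])  # ['admission', 'ECG']
```

Parsing timestamps instead of sorting the raw strings keeps the ordering correct even if a different, non-lexicographic time format is used later.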
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
```bash
# Python dependencies
pip install -r requirements.txt
```
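The dependency list names negspacy for negation detection, and the configuration exposes `enable_negation_detection`. To illustrate what that step decides, here is a simplified NegEx-style rule: an entity counts as negated if a trigger phrase appears within a few tokens before it, unless a scope terminator such as "but" intervenes. This is not negspacy's implementation or API; the trigger list, terminator list, and five-token window are illustrative assumptions.

```python
import re

# Illustrative subset of pre-negation triggers (assumption, not NegEx's full list)
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for", "no evidence of")
# Words that close a negation's scope (assumption)
SCOPE_TERMINATORS = ("but", "however", "although")
WINDOW = 5  # max tokens between trigger and entity (assumption)


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if `entity` is preceded by a negation trigger in scope."""
    text = sentence.lower()
    pos = text.find(entity.lower())
    if pos == -1:
        return False  # entity not mentioned in this sentence
    # Only the last WINDOW word tokens before the entity are considered.
    window = re.findall(r"[a-z]+", text[:pos])[-WINDOW:]
    # Drop everything up to and including the nearest scope terminator.
    for i in range(len(window) - 1, -1, -1):
        if window[i] in SCOPE_TERMINATORS:
            window = window[i + 1:]
            break
    joined = " ".join(window)
    return any(re.search(rf"\b{re.escape(t)}\b", joined) for t in NEGATION_TRIGGERS)


print(is_negated("Patient denies chest pain.", "chest pain"))  # True
print(is_negated("Chest pain but no fever.", "chest pain"))  # False
print(is_negated("Chest pain but no fever.", "fever"))  # True
```

A fixed token window is the simplest scope model; negspacy and full NegEx use larger curated trigger lists and handle post-entity triggers as well.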