# Data Loading and Management

PyTorch-compatible data loading utilities optimized for medical imaging workflows: a specialized loader for batching subjects, a queue for patch-based training, and helper functions for processing batches of medical image subjects.

## Capabilities

### Subjects Loader

A wrapper around PyTorch's `DataLoader` for batching `Subject` instances. It uses a custom collate function that understands the nested structure of a subject (several images plus arbitrary metadata), so each batch is returned as a dictionary keyed by image and attribute name.

```python { .api }
class SubjectsLoader(torch.utils.data.DataLoader):
    """
    PyTorch DataLoader wrapper optimized for medical image subjects.

    Provides a custom collate function that handles the nested structure
    of Subject instances containing multiple images and metadata.
    """
    def __init__(
        self,
        dataset: SubjectsDataset,
        batch_size: int = 1,
        shuffle: bool = False,
        num_workers: int = 0,
        **kwargs
    ): ...
```

Usage example:

```python
import torchio as tio

# Create the dataset
subjects = [...]  # List of tio.Subject instances
dataset = tio.SubjectsDataset(subjects)

# Create a loader that batches subjects
loader = tio.SubjectsLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    num_workers=4,
)

for batch in loader:
    # batch is a dict keyed by image and attribute names.
    # Stacking requires every subject's 't1' image to have the same shape,
    # and the last batch may be smaller unless drop_last=True is passed.
    t1_batch = batch['t1'][tio.DATA]  # Shape: (4, 1, D, H, W)
    ages = batch['age']  # Collated 'age' attribute, one entry per subject
```
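To make the collate step concrete, here is a minimal pure-Python sketch of how a list of subject-like dictionaries can be merged into a single batch dictionary. It is not TorchIO's implementation: `collate_subjects` is a hypothetical helper, subjects are plain `dict`s, and nested lists stand in for tensors (a real collate function would stack tensors along a new batch dimension).

```python
from typing import Any


def collate_subjects(subjects: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical collate: merge per-subject dicts into one batch dict.

    Image entries are dicts (e.g. {'data': ..., 'path': ...}); each of their
    fields is gathered into a list with one element per subject. Any other
    value (metadata such as an age) is gathered into a plain list.
    """
    if not subjects:
        raise ValueError('Cannot collate an empty batch')
    batch: dict[str, Any] = {}
    for key in subjects[0]:
        values = [subject[key] for subject in subjects]
        if isinstance(values[0], dict):
            # Image: collate each field (data, path, ...) separately
            batch[key] = {
                field: [value[field] for value in values]
                for field in values[0]
            }
        else:
            # Metadata: keep one entry per subject
            batch[key] = values
    return batch


subjects = [
    {'t1': {'data': [[0, 1], [2, 3]], 'path': 'a.nii.gz'}, 'age': 45},
    {'t1': {'data': [[4, 5], [6, 7]], 'path': 'b.nii.gz'}, 'age': 52},
]
batch = collate_subjects(subjects)
print(batch['t1']['path'])  # ['a.nii.gz', 'b.nii.gz']
print(batch['age'])  # [45, 52]
print(len(batch['t1']['data']))  # 2 (the batch dimension)
```

Handling images and metadata differently is the key design point: image fields can be stacked into one array, while free-form metadata can only be grouped per subject.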
### Queue for Patch-Based Training

A queue for patch-based training that keeps a buffer of patches extracted from subjects with a given sampling strategy. It is essential for training on large 3D medical volumes when batches of whole volumes do not fit in (GPU) memory: each training sample is a small patch, and extracting several patches from every loaded volume spreads the cost of loading and transforming that volume over multiple samples.

```python { .api }
class Queue(torch.utils.data.Dataset):
    """
    Queue for patch-based training with data augmentation.

    Maintains a buffer of patches extracted from subjects, enabling
    efficient training on large 3D volumes through patch-based sampling.

    Parameters:
    - subjects_dataset: SubjectsDataset containing the subjects
    - max_length: Maximum number of patches stored in the queue
    - samples_per_volume: Number of patches to extract per subject
    - sampler: PatchSampler defining the patch extraction strategy
    - num_workers: Number of workers used to load and transform subjects in parallel
    - shuffle_subjects: Whether to shuffle the subjects
    - shuffle_patches: Whether to shuffle the patches in the queue
    """
    def __init__(
        self,
        subjects_dataset: SubjectsDataset,
        max_length: int,
        samples_per_volume: int,
        sampler: 'PatchSampler',
        num_workers: int = 0,
        shuffle_subjects: bool = True,
        shuffle_patches: bool = True,
        **kwargs
    ): ...

    def __len__(self) -> int:
        """Return the number of patches per epoch (num_subjects * samples_per_volume)"""

    def __getitem__(self, index: int) -> dict:
        """Return the next patch in the queue, refilling the queue when it is empty"""

    def set_max_length(self, max_length: int):
        """Update the maximum queue length"""
```
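The interplay of `max_length`, `samples_per_volume` and the two shuffle flags can be illustrated with a simplified, single-process simulation. This is a sketch under stated assumptions, not TorchIO's `Queue`: `SimpleQueue` is hypothetical, subjects are plain strings, "patches" are `(subject, patch_index)` tuples, and refilling happens synchronously, whereas the real queue loads subjects with worker processes.

```python
import random
from collections import Counter


class SimpleQueue:
    """Hypothetical, simplified patch queue (single process, no real images)."""

    def __init__(self, subjects, max_length, samples_per_volume,
                 shuffle_subjects=True, shuffle_patches=True, seed=0):
        if max_length < samples_per_volume:
            raise ValueError('max_length must be >= samples_per_volume')
        self.subjects = list(subjects)
        self.max_length = max_length
        self.samples_per_volume = samples_per_volume
        self.shuffle_subjects = shuffle_subjects
        self.shuffle_patches = shuffle_patches
        self.rng = random.Random(seed)  # Seeded for reproducibility
        self.patches = []
        self._subject_iter = iter(())

    def __len__(self):
        # Number of patches served per epoch
        return len(self.subjects) * self.samples_per_volume

    def _next_subject(self):
        # Cycle through the subjects, reshuffling at the start of each pass
        try:
            return next(self._subject_iter)
        except StopIteration:
            order = self.subjects[:]
            if self.shuffle_subjects:
                self.rng.shuffle(order)
            self._subject_iter = iter(order)
            return next(self._subject_iter)

    def _fill(self):
        # Load whole subjects while their patches still fit in the buffer
        while len(self.patches) + self.samples_per_volume <= self.max_length:
            subject = self._next_subject()
            self.patches.extend(
                (subject, i) for i in range(self.samples_per_volume)
            )
        if self.shuffle_patches:
            self.rng.shuffle(self.patches)  # Mix patches from different subjects

    def next_patch(self):
        if not self.patches:
            self._fill()
        return self.patches.pop()


queue = SimpleQueue(['s1', 's2', 's3', 's4'], max_length=8, samples_per_volume=4)
epoch = [queue.next_patch() for _ in range(len(queue))]
print(len(queue))  # 16
print(sorted(Counter(subject for subject, _ in epoch).values()))  # [4, 4, 4, 4]
```

In this sketch the buffer holds patches from `max_length // samples_per_volume` subjects at a time, so a `max_length` that is a multiple of `samples_per_volume` leaves no capacity unused, and a longer queue mixes patches from more subjects at the cost of memory.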
Usage example:

```python
import torchio as tio

# Create the subjects dataset
subjects = [...]  # List of tio.Subject instances with 't1' and 'seg' images
preprocessing_transform = tio.Compose([  # For example
    tio.ZNormalization(),
    tio.RandomFlip(),
])
dataset = tio.SubjectsDataset(subjects, transform=preprocessing_transform)

# Define patch sampling (volumes must be at least as large as the patch)
patch_size = 64
sampler = tio.data.UniformSampler(patch_size)

# Create the queue for patch-based training
patches_queue = tio.Queue(
    subjects_dataset=dataset,
    max_length=300,  # Keep at most 300 patches in the queue
    samples_per_volume=10,  # Extract 10 patches per subject
    sampler=sampler,
    num_workers=4,
    shuffle_subjects=True,
    shuffle_patches=True,
)

# Wrap the queue in a SubjectsLoader (a PyTorch DataLoader subclass)
patches_loader = tio.SubjectsLoader(
    patches_queue,
    batch_size=16,
    num_workers=0,  # Must be 0: the queue already loads subjects with its own workers
)

# Training loop
for batch in patches_loader:
    inputs = batch['t1'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    targets = batch['seg'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    # Forward pass, loss computation and optimizer step go here
```
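A quick back-of-the-envelope calculation helps when choosing `max_length`, `patch_size` and `batch_size`. The helpers below are hypothetical and assume cubic, single-channel patches stored as `float32` (4 bytes per voxel); label maps stored with a different dtype would change the memory figure. The 100-subject dataset in the second call is likewise a made-up example.

```python
import math


def queue_memory_bytes(max_length: int, patch_size: int, num_images: int,
                       bytes_per_voxel: int = 4) -> int:
    """Approximate memory held by a full queue of cubic, single-channel patches."""
    voxels_per_patch = patch_size ** 3
    return max_length * num_images * voxels_per_patch * bytes_per_voxel


def batches_per_epoch(num_subjects: int, samples_per_volume: int,
                      batch_size: int, drop_last: bool = False) -> int:
    """Number of batches the loader yields in one epoch."""
    patches = num_subjects * samples_per_volume
    if drop_last:
        return patches // batch_size  # Incomplete final batch is discarded
    return math.ceil(patches / batch_size)


# Values from the usage example: 300 patches of 64^3 voxels, 't1' + 'seg'
size = queue_memory_bytes(max_length=300, patch_size=64, num_images=2)
print(size / 2**20)  # 600.0 (MiB)

# Hypothetical dataset: 100 subjects, 10 patches each, batches of 16
print(batches_per_epoch(100, 10, 16))  # 63
```

Each 64³ `float32` patch is exactly 1 MiB, so the queue's footprint grows linearly with both `max_length` and the number of images per subject.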
### Batch Processing Utilities

Utility functions for working with batched medical image data: a collate function that preserves the transform history, and helpers to inspect a batch and to split it back into individual subjects.

```python { .api }
def history_collate(batch: Sequence, collate_transforms=True) -> dict:
    """
    Custom collate function that preserves the transform history.

    Parameters:
    - batch: Sequence of Subject instances
    - collate_transforms: Whether to collate the transform histories

    Returns:
    Collated batch dictionary
    """

def get_first_item(data_loader: torch.utils.data.DataLoader):
    """Get the first item from a data loader for inspection"""

def get_batch_images_and_size(batch: dict) -> tuple[list[str], int]:
    """
    Extract the image names and the batch size from a batch.

    Returns:
    Tuple of (list of image names, batch size)
    """

def get_subjects_from_batch(batch: dict) -> list:
    """Split batched data back into a list of individual subjects"""

def add_images_from_batch(
    images_dict: dict,
    batch: dict,
    batch_idx: int
):
    """Add images from a batch to an images dictionary"""
```
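Conceptually, inspecting a batch and splitting it back into subjects are inverse steps of collation. The pure-Python sketch below illustrates the idea on a plain batch dictionary in which each image entry is a dict with a `'data'` field holding one item per subject. `batch_images_and_size` and `split_batch` are hypothetical helpers, not TorchIO's functions, which work with tensors and `Subject`/image objects rather than plain lists; the sketch also assumes every non-image value is a per-subject sequence.

```python
from typing import Any

DATA = 'data'  # Key holding image data (TorchIO's tio.DATA constant is also 'data')


def batch_images_and_size(batch: dict[str, Any]) -> tuple[list[str], int]:
    """Hypothetical: names of the image entries and the number of subjects."""
    names = [
        key for key, value in batch.items()
        if isinstance(value, dict) and DATA in value
    ]
    if not names:
        raise ValueError('The batch does not contain any images')
    return names, len(batch[names[0]][DATA])


def split_batch(batch: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical: rebuild one dict per subject from a collated batch."""
    image_names, size = batch_images_and_size(batch)
    subjects = []
    for i in range(size):
        subject = {}
        for key, value in batch.items():
            if key in image_names:
                # Take the i-th element of every image field (data, path, ...)
                subject[key] = {field: items[i] for field, items in value.items()}
            else:
                # Metadata is assumed to be a per-subject sequence
                subject[key] = value[i]
        subjects.append(subject)
    return subjects


batch = {
    't1': {'data': [[0, 1], [2, 3]], 'path': ['a.nii.gz', 'b.nii.gz']},
    'age': [45, 52],
}
print(batch_images_and_size(batch))  # (['t1'], 2)
print(split_batch(batch)[1])  # {'t1': {'data': [2, 3], 'path': 'b.nii.gz'}, 'age': 52}
```

Splitting a batch this way is typically needed at inference time, when per-subject predictions must be paired with their original metadata (such as file paths).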
### Memory Management

Helper functions for managing on-disk resources and test data when working with large medical image datasets: locating the TorchIO cache directory and generating small synthetic datasets for testing and development.

```python { .api }
def get_torchio_cache_dir() -> Path:
    """Get the TorchIO cache directory used for temporary files"""

def create_dummy_dataset(
    num_subjects: int,
    size_range: tuple[int, int] = (10, 20),
    directory: Path | None = None,
    **kwargs
) -> SubjectsDataset:
    """
    Create a dummy dataset for testing and development.

    Parameters:
    - num_subjects: Number of subjects to create
    - size_range: Range (minimum, maximum) of image sizes
    - directory: Directory in which to save the dummy images

    Returns:
    SubjectsDataset with dummy subjects
    """
```
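As a rough illustration of what generating a dummy dataset involves, the sketch below draws a random shape for each subject from `size_range` and fills it with random intensities and a thresholded binary label. `make_dummy_subjects` is hypothetical: it uses only the standard library, keeps everything in memory instead of writing files to `directory`, and assumes the `size_range` bounds are inclusive and apply to each spatial axis independently. TorchIO's actual `create_dummy_dataset` may behave differently.

```python
import random


def make_dummy_subjects(num_subjects: int, size_range: tuple[int, int] = (10, 20),
                        seed: int = 0) -> list[dict]:
    """Hypothetical in-memory stand-in for a dummy dataset generator."""
    low, high = size_range
    if not 0 < low <= high:
        raise ValueError('size_range must satisfy 0 < low <= high')
    rng = random.Random(seed)  # Seeded so results are reproducible
    subjects = []
    for index in range(num_subjects):
        # One size per spatial axis, drawn with inclusive bounds
        shape = tuple(rng.randint(low, high) for _ in range(3))
        num_voxels = shape[0] * shape[1] * shape[2]
        # Flat voxel lists keep the sketch dependency-free
        intensities = [rng.random() for _ in range(num_voxels)]
        labels = [int(value > 0.5) for value in intensities]
        subjects.append({
            'name': f'dummy_{index}',
            'shape': shape,
            'image': intensities,
            'label': labels,
        })
    return subjects


subjects = make_dummy_subjects(10, size_range=(20, 30))
print(len(subjects))  # 10
print(all(20 <= side <= 30 for s in subjects for side in s['shape']))  # True
```

Seeding the generator is the main design choice here: tests that rely on dummy data stay deterministic across runs.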
Usage example:

```python
from pathlib import Path

import torchio as tio

# Create a dummy dataset for testing
dummy_dataset = tio.utils.create_dummy_dataset(
    num_subjects=10,
    size_range=(20, 30),
    directory=Path('/tmp/dummy_medical_data'),
)

# Use the dummy dataset to test transforms or training loops
dummy_loader = tio.SubjectsLoader(dummy_dataset, batch_size=2)

for batch in dummy_loader:
    # Test your code with the dummy data
    pass
```