Tessl Tile for pypi/torchio@0.20.0

# Data Loading and Management

PyTorch-compatible data loading utilities for medical imaging workflows, including a specialized data loader for subjects, a queue for patch-based training, and helper functions for processing batches of medical image subjects.

## Capabilities

### Subjects Loader

A subclass of PyTorch's `DataLoader` for datasets of `Subject` instances. Its custom collate function batches each subject entry (images and metadata) separately, so that the batched tensor of an image can be accessed as `batch['t1'][tio.DATA]`.

```python { .api }
class SubjectsLoader(torch.utils.data.DataLoader):
    """
    PyTorch DataLoader wrapper optimized for medical image subjects.

    Provides a custom collate function that handles the nested structure
    of Subject instances containing multiple images and metadata.
    Additional keyword arguments are passed to torch.utils.data.DataLoader.
    """
    def __init__(
        self,
        dataset: SubjectsDataset,
        batch_size: int = 1,
        shuffle: bool = False,
        num_workers: int = 0,
        **kwargs
    ): ...
```

Usage example:

```python
import torchio as tio

# Create dataset
subjects = [...]  # List of tio.Subject instances, each with a 't1' image and an 'age' value
dataset = tio.SubjectsDataset(subjects)

# Create a loader that knows how to collate subjects
loader = tio.SubjectsLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    num_workers=4,
)

for batch in loader:
    # batch is a dict with one entry per subject key
    t1_batch = batch['t1'][tio.DATA]  # Shape: (4, 1, W, H, D); all images must share a shape
    ages = batch['age']  # Numeric values are typically collated into a tensor of shape (4,)
```
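To illustrate what collating subjects key by key means, here is a simplified, standard-library-only model. It is not TorchIO's implementation: subjects are modeled as plain dicts, image data as nested lists, and the hypothetical helper `collate_subjects` only groups values into lists, whereas the real loader stacks image tensors along a new batch dimension.

```python
from typing import Any


def collate_subjects(subjects: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group the values of each key across a list of subject-like dicts.

    Hypothetical helper: a simplified stand-in for a subject collate function.
    Every subject must have the same keys as the first one.
    """
    if not subjects:
        raise ValueError('Cannot collate an empty batch')
    keys = subjects[0].keys()
    for subject in subjects[1:]:
        if subject.keys() != keys:
            raise KeyError('All subjects in a batch must have the same keys')
    return {key: [subject[key] for subject in subjects] for key in keys}


# Two toy "subjects": a tiny 1x2x2 "image" plus a scalar metadata value
batch = collate_subjects([
    {'t1': [[[0, 1], [2, 3]]], 'age': 45},
    {'t1': [[[4, 5], [6, 7]]], 'age': 52},
])
print(batch['age'])      # [45, 52]
print(len(batch['t1']))  # 2, one entry per subject
```

Collating per key is what lets images and metadata with different types live side by side in the same batch.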

### Queue for Patch-Based Training

A queue for patch-based training that maintains a buffer of patches extracted from subjects using a given sampling strategy. Patch-based training is essential for large 3D medical volumes that, at full size, would not fit in GPU memory during training.

```python { .api }
class Queue(torch.utils.data.Dataset):
    """
    Queue for patch-based training with data augmentation.

    Maintains a buffer of patches extracted from subjects, enabling
    efficient training on large 3D volumes through patch-based sampling.

    Parameters:
    - subjects_dataset: SubjectsDataset containing subjects
    - max_length: Maximum number of patches to keep in queue
    - samples_per_volume: Number of patches to extract per subject
    - sampler: PatchSampler for patch extraction strategy
    - num_workers: Number of subprocesses used to load and transform subjects
    - shuffle_subjects: Whether to shuffle subjects
    - shuffle_patches: Whether to shuffle patches
    """
    def __init__(
        self,
        subjects_dataset: SubjectsDataset,
        max_length: int,
        samples_per_volume: int,
        sampler: 'PatchSampler',
        num_workers: int = 0,
        shuffle_subjects: bool = True,
        shuffle_patches: bool = True,
        **kwargs
    ): ...

    def __len__(self) -> int:
        """Return the number of patches per epoch (number of subjects times samples_per_volume)"""

    def __getitem__(self, index: int) -> dict:
        """Return the next patch in the queue, refilling it when empty (the index is ignored)"""

    def set_max_length(self, max_length: int):
        """Update maximum queue length"""
```
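Because each subject contributes `samples_per_volume` patches, the number of patches per epoch and the number of batches a loader yields follow from simple arithmetic. The sketch below uses only the standard library and hypothetical helper names; `subjects_per_fill` models how many subjects' patches fit in a buffer of `max_length` patches at once, which is an assumption about how the buffer is refilled rather than a documented TorchIO function.

```python
import math


def patches_per_epoch(num_subjects: int, samples_per_volume: int) -> int:
    """Total patches drawn in one pass over all subjects (hypothetical helper)."""
    return num_subjects * samples_per_volume


def batches_per_epoch(num_patches: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader-style loader yields for num_patches items."""
    if drop_last:
        return num_patches // batch_size
    return math.ceil(num_patches / batch_size)


def subjects_per_fill(max_length: int, samples_per_volume: int) -> int:
    """Subjects whose patches fit in the buffer at once (at least one, by assumption)."""
    return max(max_length // samples_per_volume, 1)


# Example: 100 subjects, 10 patches each, batches of 16, buffer of 300 patches
n_patches = patches_per_epoch(100, 10)
print(n_patches)                         # 1000
print(batches_per_epoch(n_patches, 16))  # 63 (the last batch holds 8 patches)
print(subjects_per_fill(300, 10))        # 30
```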

Usage example:

```python
import torchio as tio

# Create subjects dataset
subjects = [...]  # List of tio.Subject instances with 't1' and 'seg' images
# Example preprocessing; choose transforms suited to your data
preprocessing_transform = tio.Compose([
    tio.ToCanonical(),
    tio.RescaleIntensity(out_min_max=(0, 1)),
])
dataset = tio.SubjectsDataset(subjects, transform=preprocessing_transform)

# Define patch sampling
patch_size = 64
sampler = tio.data.UniformSampler(patch_size)

# Create queue for patch-based training
patches_queue = tio.Queue(
    subjects_dataset=dataset,
    max_length=300,  # Keep up to 300 patches in the queue
    samples_per_volume=10,  # Extract 10 patches per subject
    sampler=sampler,
    num_workers=4,  # Subprocesses that load and transform subjects
    shuffle_subjects=True,
    shuffle_patches=True,
)

# Batch the patches with a subjects loader
patches_loader = tio.SubjectsLoader(
    patches_queue,
    batch_size=16,
    num_workers=0,  # Must be 0: the queue already uses its own workers
)

# Training loop
for batch in patches_loader:
    inputs = batch['t1'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    targets = batch['seg'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    # Train model with patches
```
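The buffering behavior can be pictured with a small, standard-library-only simulation. This is a simplified model under stated assumptions, not TorchIO's code: the hypothetical `ToyPatchQueue` refills only when its buffer is empty, loads `max(max_length // samples_per_volume, 1)` subjects per refill, draws `samples_per_volume` patches from each, and shuffles the buffer. Patches are just `(subject_id, patch_index)` tuples.

```python
import random
from itertools import islice


class ToyPatchQueue:
    """Simplified model of a patch queue (hypothetical, for illustration only)."""

    def __init__(self, subject_ids, max_length, samples_per_volume, seed=0):
        self.subject_ids = list(subject_ids)
        self.max_length = max_length
        self.samples_per_volume = samples_per_volume
        self.rng = random.Random(seed)
        self.buffer = []

    def __len__(self):
        # Patches per epoch, not the number of patches currently buffered
        return len(self.subject_ids) * self.samples_per_volume

    def __iter__(self):
        self.buffer = []
        order = self.subject_ids[:]
        self.rng.shuffle(order)  # models shuffle_subjects=True
        subjects = iter(order)
        subjects_per_fill = max(self.max_length // self.samples_per_volume, 1)
        for _ in range(len(self)):
            if not self.buffer:
                # Refill: "extract" samples_per_volume patches from each next subject
                for subject_id in islice(subjects, subjects_per_fill):
                    self.buffer.extend(
                        (subject_id, i) for i in range(self.samples_per_volume)
                    )
                self.rng.shuffle(self.buffer)  # models shuffle_patches=True
            yield self.buffer.pop()


queue = ToyPatchQueue(subject_ids=range(5), max_length=6, samples_per_volume=3)
patches = list(queue)
print(len(patches))  # 15
```

In this model each batch mixes patches from several subjects, and the buffer never holds more than `max_length` patches; a larger `max_length` gives more diverse batches at the cost of memory.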

### Batch Processing Utilities

Utility functions for handling batched subject data, including a collate function that preserves transform history and helpers to inspect batches and split them back into subjects.

```python { .api }
def history_collate(batch: Sequence, collate_transforms=True) -> dict:
    """
    Custom collate function that preserves transform history.

    Parameters:
    - batch: Sequence of Subject instances
    - collate_transforms: Whether to collate transform histories

    Returns:
    Collated batch dictionary
    """

def get_first_item(data_loader: torch.utils.data.DataLoader):
    """Get first item from data loader for inspection"""

def get_batch_images_and_size(batch: dict) -> tuple[list[str], int]:
    """
    Extract image names and batch size from batch.

    Returns:
    Tuple of (list of image names, batch size)
    """

def get_subjects_from_batch(batch: dict) -> list:
    """Extract individual subjects from batched data"""

def add_images_from_batch(
    subjects: list,
    tensor: torch.Tensor,
    class_=None,
    name: str = 'prediction',
):
    """
    Add an image built from each tensor in a batch (e.g. a network
    prediction) to the corresponding subject, under the given name.
    If class_ is None, ScalarImage is used.
    """
```
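Conceptually, `get_batch_images_and_size` and `get_subjects_from_batch` invert collation. The standard-library-only sketch below models that round trip under simplifying assumptions: images are dicts holding their batched data under a `'data'` key (mirroring `tio.DATA`), metadata are plain lists, and the helper names `batch_images_and_size` and `split_batch` are hypothetical; the TorchIO helpers operate on tensors and image objects instead.

```python
from typing import Any

DATA = 'data'  # key under which a batched image stores its data (mirrors tio.DATA)


def batch_images_and_size(batch: dict[str, Any]) -> tuple[list[str], int]:
    """Return the names of image entries and the batch size (hypothetical helper).

    An entry counts as an image if it is a dict with a DATA key; the batch size
    is the length of that data, which must agree across all images.
    """
    names = [k for k, v in batch.items() if isinstance(v, dict) and DATA in v]
    if not names:
        raise ValueError('The batch contains no images')
    sizes = {len(batch[name][DATA]) for name in names}
    if len(sizes) != 1:
        raise ValueError(f'Inconsistent batch sizes: {sorted(sizes)}')
    return names, sizes.pop()


def split_batch(batch: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a batched dict back into one dict per subject (hypothetical helper)."""
    names, batch_size = batch_images_and_size(batch)
    subjects = []
    for i in range(batch_size):
        subject = {name: {DATA: batch[name][DATA][i]} for name in names}
        # Non-image entries (e.g. metadata lists) are indexed the same way
        for key, value in batch.items():
            if key not in names:
                subject[key] = value[i]
        subjects.append(subject)
    return subjects


batch = {
    't1': {DATA: [[1, 2], [3, 4]]},  # two "images" stacked along the first axis
    'age': [45, 52],
}
print(batch_images_and_size(batch))  # (['t1'], 2)
print(split_batch(batch)[1])         # {'t1': {'data': [3, 4]}, 'age': 52}
```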

### Cache and Test Data Utilities

Helper functions for locating the TorchIO cache directory and for generating dummy data to test pipelines during development.

```python { .api }
def get_torchio_cache_dir() -> Path:
    """Get TorchIO cache directory for temporary files"""

def create_dummy_dataset(
    num_images: int,
    size_range: tuple[int, int],
    directory: Optional[Path] = None,
    suffix: str = '.nii.gz',
    force: bool = False,
    verbose: bool = False,
) -> list[Subject]:
    """
    Create dummy subjects for testing and development.

    Parameters:
    - num_images: Number of subjects (image and label map pairs) to create
    - size_range: (min, max) range of the random image sizes
    - directory: Directory in which to save the dummy images
      (the system temporary directory if None)
    - suffix: File extension of the saved images
    - force: Whether to regenerate existing dummy images
    - verbose: Whether to show a progress bar

    Returns:
    List of Subject instances, each with a ScalarImage and a LabelMap
    """
```

Usage example:

```python
from pathlib import Path

import torchio as tio

# Create dummy subjects for testing
dummy_subjects = tio.utils.create_dummy_dataset(
    num_images=10,
    size_range=(20, 30),
    directory=Path('/tmp/dummy_medical_data'),
)

# Dummy images have random sizes; crop or pad them to a common shape so they can be batched
dummy_dataset = tio.SubjectsDataset(
    dummy_subjects,
    transform=tio.CropOrPad((20, 20, 20)),
)

# Use the dummy dataset to test transforms or training loops
dummy_loader = tio.SubjectsLoader(dummy_dataset, batch_size=2)

for batch in dummy_loader:
    # Test your code with dummy data
    pass
```
