# Data Loading and Management

PyTorch-compatible data loading utilities optimized for medical imaging workflows, including specialized data loaders, queues for patch-based training, and efficient batch processing of medical image subjects.

## Capabilities

### Subjects Loader

Specialized PyTorch DataLoader wrapper optimized for medical image subjects, providing efficient loading and batching of Subject instances with medical imaging-specific collate functions.

```python { .api }
class SubjectsLoader(torch.utils.data.DataLoader):
    """
    PyTorch DataLoader wrapper optimized for medical image subjects.

    Provides custom collate function that handles the complex nested structure
    of Subject instances containing multiple images and metadata.
    """
    def __init__(
        self,
        dataset: SubjectsDataset,
        batch_size: int = 1,
        shuffle: bool = False,
        num_workers: int = 0,
        **kwargs
    ): ...
```

Usage example:

```python
import torchio as tio
import torch

# Create dataset
subjects = [...]  # List of subjects
dataset = tio.SubjectsDataset(subjects)

# Create optimized loader for subjects
loader = tio.SubjectsLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    num_workers=4
)

for batch in loader:
    # batch is a dict containing batched subject data
    t1_batch = batch['t1'][tio.DATA]  # Shape: (4, 1, W, H, D)
    ages = batch['age']  # Collated age values, one per subject
```
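
To make the idea of a nested collate function concrete, here is a minimal pure-Python sketch. It is not TorchIO's implementation: the function name `collate_subjects`, the toy subjects, and the use of plain strings in place of tensors are all assumptions made for illustration.

```python
from typing import Any


def collate_subjects(subjects: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge a list of subject-like dicts into one batch dict.

    Nested dicts (standing in for images) are collated recursively;
    every other value is gathered into a list, one entry per subject.
    """
    first = subjects[0]
    batch: dict[str, Any] = {}
    for key, value in first.items():
        values = [subject[key] for subject in subjects]
        if isinstance(value, dict):
            batch[key] = collate_subjects(values)
        else:
            batch[key] = values
    return batch


# Two hypothetical subjects; 'data' holds a placeholder, not a real tensor.
toy_subjects = [
    {'t1': {'data': 'volume_a', 'path': 'a.nii.gz'}, 'age': 42},
    {'t1': {'data': 'volume_b', 'path': 'b.nii.gz'}, 'age': 57},
]
toy_batch = collate_subjects(toy_subjects)
print(toy_batch['t1']['data'])  # ['volume_a', 'volume_b']
print(toy_batch['age'])  # [42, 57]
```

The batch keeps the same keys as a single subject, which is why `batch['t1'][tio.DATA]` in the example above mirrors how an individual subject is indexed. A real collate function would stack tensors rather than build lists.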

### Queue for Patch-Based Training

Queue implementation for patch-based training that maintains a buffer of patches extracted from subjects using specified sampling strategies. Essential for training on large 3D medical volumes that don't fit in memory.
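
The memory argument can be checked with simple arithmetic. The sketch below compares a batch of whole volumes with a batch of patches; the helper name `tensor_mebibytes` and the 256-voxel volume size are hypothetical, and only the raw float32 input tensor is counted (activations and gradients would add considerably more).

```python
def tensor_mebibytes(shape: tuple[int, ...], bytes_per_voxel: int = 4) -> float:
    """Memory footprint in MiB of a dense array (float32 by default)."""
    num_voxels = 1
    for size in shape:
        num_voxels *= size
    return num_voxels * bytes_per_voxel / 2**20


# Hypothetical sizes: 4 whole single-channel volumes vs. 16 patches of 64^3.
whole_volumes = tensor_mebibytes((4, 1, 256, 256, 256))
patches = tensor_mebibytes((16, 1, 64, 64, 64))
print(f'{whole_volumes:.0f} MiB')  # 256 MiB
print(f'{patches:.0f} MiB')  # 16 MiB
```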

```python { .api }
class Queue(torch.utils.data.Dataset):
    """
    Queue for patch-based training with data augmentation.

    Maintains a buffer of patches extracted from subjects, enabling
    efficient training on large 3D volumes through patch-based sampling.

    Parameters:
    - subjects_dataset: SubjectsDataset containing subjects
    - max_length: Maximum number of patches to keep in queue
    - samples_per_volume: Number of patches to extract per subject
    - sampler: PatchSampler for patch extraction strategy
    - num_workers: Number of workers for parallel patch extraction
    - shuffle_subjects: Whether to shuffle subjects
    - shuffle_patches: Whether to shuffle patches
    """
    def __init__(
        self,
        subjects_dataset: SubjectsDataset,
        max_length: int,
        samples_per_volume: int,
        sampler: 'PatchSampler',
        num_workers: int = 0,
        shuffle_subjects: bool = True,
        shuffle_patches: bool = True,
        **kwargs
    ): ...

    def __len__(self) -> int:
        """Return current number of patches in queue"""

    def __getitem__(self, index: int) -> dict:
        """Get patch at index"""

    def set_max_length(self, max_length: int):
        """Update maximum queue length"""
```

Usage example:

```python
import torch
import torchio as tio

# Create subjects dataset
# (preprocessing_transform is assumed to be defined elsewhere)
subjects = [...]  # List of subjects
dataset = tio.SubjectsDataset(subjects, transform=preprocessing_transform)

# Define patch sampling
patch_size = 64
sampler = tio.data.UniformSampler(patch_size)

# Create queue for patch-based training
patches_queue = tio.Queue(
    subjects_dataset=dataset,
    max_length=300,  # Keep 300 patches in queue
    samples_per_volume=10,  # Extract 10 patches per subject
    sampler=sampler,
    num_workers=4,
    shuffle_subjects=True,
    shuffle_patches=True,
)

# Use with PyTorch DataLoader
patches_loader = torch.utils.data.DataLoader(
    patches_queue,
    batch_size=16,
    num_workers=0  # Keep at 0: the Queue already handles parallelization
)

# Training loop
for batch in patches_loader:
    inputs = batch['t1'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    targets = batch['seg'][tio.DATA]  # Shape: (16, 1, 64, 64, 64)
    # Train model with patches
```
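
The interplay between `max_length` and `samples_per_volume` is easier to see in a toy model. The following is a simplified sketch, not TorchIO's implementation: the class `ToyPatchQueue`, its `pop` method, and the `(subject, patch_index)` tuples standing in for patches are all hypothetical. It fills the buffer with whole subjects until another subject would exceed `max_length`, shuffles, and refills only when the buffer is empty.

```python
import random


class ToyPatchQueue:
    """Toy buffer illustrating fill-then-drain behaviour of a patch queue."""

    def __init__(self, subjects, max_length, samples_per_volume, seed=0):
        if not subjects:
            raise ValueError('at least one subject is required')
        if samples_per_volume > max_length:
            raise ValueError('samples_per_volume must not exceed max_length')
        self.subjects = list(subjects)
        self.max_length = max_length
        self.samples_per_volume = samples_per_volume
        self.rng = random.Random(seed)
        self.patches = []
        self._subjects_iter = iter(())  # Empty: first use starts a new pass

    def _next_subject(self):
        try:
            return next(self._subjects_iter)
        except StopIteration:
            # Start a new shuffled pass over all subjects
            order = self.subjects[:]
            self.rng.shuffle(order)
            self._subjects_iter = iter(order)
            return next(self._subjects_iter)

    def _fill(self):
        # Add whole subjects while their patches still fit in the buffer
        while len(self.patches) + self.samples_per_volume <= self.max_length:
            subject = self._next_subject()
            for patch_index in range(self.samples_per_volume):
                self.patches.append((subject, patch_index))
        self.rng.shuffle(self.patches)

    def pop(self):
        if not self.patches:
            self._fill()
        return self.patches.pop()


toy_queue = ToyPatchQueue(['s0', 's1', 's2', 's3'], max_length=6, samples_per_volume=2)
first_patch = toy_queue.pop()
subjects_in_first_fill = {name for name, _ in toy_queue.patches} | {first_patch[0]}
print(len(toy_queue.patches))  # 5
print(len(subjects_in_first_fill))  # 3
```

In this model a full buffer holds patches from `max_length // samples_per_volume` subjects, so the settings in the example above (300 and 10) would mix patches from 30 subjects at a time. A larger `max_length` gives more varied batches at the cost of more memory.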

### Batch Processing Utilities

Utility functions for handling batched medical image data, including custom collate functions and batch analysis tools.

```python { .api }
def history_collate(batch: Sequence, collate_transforms=True) -> dict:
    """
    Custom collate function that preserves transform history.

    Parameters:
    - batch: Sequence of Subject instances
    - collate_transforms: Whether to collate transform histories

    Returns:
    Collated batch dictionary
    """

def get_first_item(data_loader: torch.utils.data.DataLoader):
    """Get first item from data loader for inspection"""

def get_batch_images_and_size(batch: dict) -> tuple[list[str], int]:
    """
    Extract image names and batch size from batch.

    Returns:
    Tuple of (list of image names, batch size)
    """

def get_subjects_from_batch(batch: dict) -> list:
    """Extract individual subjects from batched data"""

def add_images_from_batch(
    images_dict: dict,
    batch: dict,
    batch_idx: int
):
    """Add images from batch to images dictionary"""
```
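
Conceptually, extracting subjects from a batch is the inverse of collation. The sketch below shows that idea on plain dictionaries; `split_batch` is a hypothetical helper written for illustration and makes no claim about how `get_subjects_from_batch` is implemented.

```python
from typing import Any


def split_batch(batch: dict[str, Any], batch_size: int) -> list[dict[str, Any]]:
    """Split a collated batch dict back into one dict per subject.

    Nested dicts (standing in for images) are indexed field by field;
    every other value is assumed to be a sequence with one entry per subject.
    """
    subjects = []
    for index in range(batch_size):
        subject: dict[str, Any] = {}
        for key, value in batch.items():
            if isinstance(value, dict):
                subject[key] = {name: items[index] for name, items in value.items()}
            else:
                subject[key] = value[index]
        subjects.append(subject)
    return subjects


# Hypothetical batch of two subjects, with strings in place of tensors.
example_batch = {
    't1': {'data': ['volume_a', 'volume_b'], 'path': ['a.nii.gz', 'b.nii.gz']},
    'age': [42, 57],
}
recovered = split_batch(example_batch, batch_size=2)
print(recovered[1]['t1']['path'])  # b.nii.gz
print(recovered[0]['age'])  # 42
```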

### Memory Management

Utility functions for locating the TorchIO cache directory and for creating small dummy datasets, which allow testing pipelines without loading large medical image datasets.

```python { .api }
def get_torchio_cache_dir() -> Path:
    """Get TorchIO cache directory for temporary files"""

def create_dummy_dataset(
    num_subjects: int,
    size_range: tuple[int, int] = (10, 20),
    directory: Path = None,
    **kwargs
) -> SubjectsDataset:
    """
    Create dummy dataset for testing and development.

    Parameters:
    - num_subjects: Number of subjects to create
    - size_range: Range of image sizes
    - directory: Directory to save dummy images

    Returns:
    SubjectsDataset with dummy subjects
    """
```

Usage example:

```python
from pathlib import Path

import torchio as tio

# Create dummy dataset for testing
dummy_dataset = tio.utils.create_dummy_dataset(
    num_subjects=10,
    size_range=(20, 30),
    directory=Path('/tmp/dummy_medical_data')
)

# Use dummy dataset for testing transforms or training loops
dummy_loader = tio.SubjectsLoader(dummy_dataset, batch_size=2)

for batch in dummy_loader:
    # Test your code with dummy data
    pass
```