# Sample Datasets

Built-in access to standard neuroimaging datasets for testing, tutorials, and benchmarking. MNE-Python provides access to over 20 datasets covering a range of experimental paradigms and recording modalities.

## Capabilities

### Core Datasets

Standard datasets used in tutorials and examples throughout the MNE documentation.

```python { .api }
def data_path(path: Optional[str] = None, force_update: bool = False, update_path: bool = True,
              download: bool = True, accept: bool = False,
              verbose: Optional[Union[bool, str, int]] = None) -> Path:
    """
    Generic dataset path function (pattern shared by most datasets).

    Parameters:
    - path: Custom download path
    - force_update: Force redownload of data
    - update_path: Update MNE config with path
    - download: Download if missing
    - accept: Accept license terms (only relevant for datasets that require it)
    - verbose: Verbosity level

    Returns:
    Path to dataset directory (a pathlib.Path, so it can be joined with `/`)
    """

# Sample Dataset - Auditory/Visual Paradigm
sample.data_path: Callable[..., Path]  # Download sample dataset
sample.get_version: Callable[[], str]  # Get dataset version

# Somatosensory Dataset
somato.data_path: Callable[..., Path]  # Somatosensory MEG data
somato.get_version: Callable[[], str]

# Multimodal Dataset
multimodal.data_path: Callable[..., Path]  # Auditory, visual, and somatosensory stimuli
multimodal.get_version: Callable[[], str]

# SPM Face Dataset
spm_face.data_path: Callable[..., Path]  # SPM face processing dataset
spm_face.get_version: Callable[[], str]
```

### Motor Imagery and BCI Datasets

Datasets for brain-computer interface research and motor imagery classification.

```python { .api }
# EEG Motor Movement/Imagery Dataset
eegbci.data_path: Callable[..., Path]
eegbci.get_version: Callable[[], str]

# eegbci.load_data
def load_data(subject: int, runs: Union[int, List[int]], path: Optional[str] = None,
              force_update: bool = False, update_path: bool = True,
              base_url: str = 'https://physionet.org/files/eegmmidb/',
              verbose: Optional[Union[bool, str, int]] = None) -> List[str]:
    """
    Load EEGBCI dataset files.

    Parameters:
    - subject: Subject number (1-109)
    - runs: Run number(s) to load
    - path: Download path
    - force_update: Force redownload
    - update_path: Update MNE config
    - base_url: Base download URL
    - verbose: Verbosity level

    Returns:
    List of paths to downloaded files
    """

# SSVEP Dataset
ssvep.data_path: Callable[..., Path]  # Steady-state visual evoked potentials
ssvep.get_version: Callable[[], str]

# ssvep.load_data
def load_data(path: Optional[str] = None, force_update: bool = False,
              update_path: bool = True,
              verbose: Optional[Union[bool, str, int]] = None) -> Dict:
    """
    Load SSVEP dataset.

    Returns:
    Dictionary with loaded epochs and metadata
    """
```
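
Since `subject` and `runs` are plain integers, it can help to validate a request before it triggers a download. The sketch below is a hypothetical helper (`check_eegbci_request` and `EEGBCI_TASK_RUNS` are not part of MNE). The 1-109 subject range comes from the docstring above; the 1-14 run range and the task grouping follow the PhysioNet description of the dataset and should be checked against it before use.

```python
from typing import Iterable, List, Union

# Run numbers per task, as described for the PhysioNet EEG Motor
# Movement/Imagery dataset (assumption: verify against the dataset page).
EEGBCI_TASK_RUNS = {
    "baseline_eyes_open": [1],
    "baseline_eyes_closed": [2],
    "motor_execution_left_right_fist": [3, 7, 11],
    "motor_imagery_left_right_fist": [4, 8, 12],
    "motor_execution_fists_feet": [5, 9, 13],
    "motor_imagery_fists_feet": [6, 10, 14],
}


def check_eegbci_request(subject: int, runs: Union[int, Iterable[int]]) -> List[int]:
    """Validate subject and run numbers; return the runs as a sorted list."""
    if not 1 <= subject <= 109:
        raise ValueError(f"subject must be in 1-109, got {subject}")
    run_list = [runs] if isinstance(runs, int) else list(runs)
    bad = [run for run in run_list if not 1 <= run <= 14]
    if bad:
        raise ValueError(f"runs must be in 1-14, got {bad}")
    # Drop duplicates so the same file is not requested twice.
    return sorted(set(run_list))
```

The validated list can then be passed on, for example `eegbci.load_data(1, check_eegbci_request(1, [6, 10, 14]))`.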

### Sleep and Physiology Datasets

Datasets for sleep research and physiological signal analysis.

```python { .api }
# Sleep Physiology Dataset
sleep_physionet.data_path: Callable[..., Path]
sleep_physionet.get_version: Callable[[], str]

def age_group_averages(path: Optional[str] = None,
                       verbose: Optional[Union[bool, str, int]] = None) -> List[str]:
    """
    Load age group average data.

    Parameters:
    - path: Dataset path
    - verbose: Verbosity level

    Returns:
    List of paths to age group files
    """

def temazepam_effects(path: Optional[str] = None,
                      verbose: Optional[Union[bool, str, int]] = None) -> List[str]:
    """
    Load temazepam effects data.

    Returns:
    List of paths to temazepam study files
    """
```

### Specialized Neuroimaging Datasets

Datasets for specific analysis methods and experimental paradigms.

```python { .api }
# High-Frequency SEF Dataset
hf_sef.data_path: Callable[..., Path]  # High-frequency somatosensory evoked fields
hf_sef.get_version: Callable[[], str]

# Epilepsy ECoG Dataset
epilepsy_ecog.data_path: Callable[..., Path]  # Intracranial EEG epilepsy data
epilepsy_ecog.get_version: Callable[[], str]

# fNIRS Motor Task Dataset
fnirs_motor.data_path: Callable[..., Path]  # Functional near-infrared spectroscopy
fnirs_motor.get_version: Callable[[], str]

# OPM Dataset
opm.data_path: Callable[..., Path]  # Optically pumped magnetometer data
opm.get_version: Callable[[], str]

# Visual Categorization Dataset
visual_92_categories.data_path: Callable[..., Path]  # Visual object categorization
visual_92_categories.get_version: Callable[[], str]

# visual_92_categories.load_data
def load_data(path: Optional[str] = None,
              verbose: Optional[Union[bool, str, int]] = None) -> Tuple[ArrayLike, ArrayLike]:
    """
    Load visual categorization data.

    Returns:
    Tuple of (data_array, labels)
    """

# Kiloword Dataset
kiloword.data_path: Callable[..., Path]  # Lexical decision task
kiloword.get_version: Callable[[], str]

# kiloword.load_data
def load_data(path: Optional[str] = None,
              verbose: Optional[Union[bool, str, int]] = None) -> Dict:
    """
    Load kiloword dataset.

    Returns:
    Dictionary with epochs and metadata
    """
```

### Connectivity and Network Datasets

Datasets for studying brain connectivity and network analysis.

```python { .api }
# FieldTrip CMC Dataset
fieldtrip_cmc.data_path: Callable[..., Path]  # Cortico-muscular coherence
fieldtrip_cmc.get_version: Callable[[], str]

# mTRF Dataset
mtrf.data_path: Callable[..., Path]  # Multivariate temporal response functions
mtrf.get_version: Callable[[], str]

def load_speech_envelope(path: Optional[str] = None,
                         verbose: Optional[Union[bool, str, int]] = None) -> Tuple[ArrayLike, float]:
    """
    Load speech envelope stimulus.

    Returns:
    Tuple of (envelope_data, sampling_rate)
    """
```

### Phantom and Calibration Datasets

Datasets with known ground truth for method validation and calibration.

```python { .api }
# 4D BTi Phantom Dataset
phantom_4dbti.data_path: Callable[..., Path]  # 4D Neuroimaging phantom
phantom_4dbti.get_version: Callable[[], str]

# KIT Phantom Dataset
phantom_kit.data_path: Callable[..., Path]  # KIT/Yokogawa phantom data
phantom_kit.get_version: Callable[[], str]

# Kernel Phantom Dataset
phantom_kernel.data_path: Callable[..., Path]  # Kernel Flow phantom
phantom_kernel.get_version: Callable[[], str]

# phantom_kernel.load_data
def load_data(subject: str = 'phantom', session: str = '20220927_114934',
              path: Optional[str] = None,
              verbose: Optional[Union[bool, str, int]] = None) -> Raw:
    """
    Load phantom data directly as Raw object.

    Parameters:
    - subject: Subject identifier
    - session: Session identifier
    - path: Dataset path
    - verbose: Verbosity level

    Returns:
    Raw object with phantom data
    """
```

### Standard Brain Templates and Atlases

Access to standard brain templates and parcellations.

```python { .api }
def fetch_fsaverage(subjects_dir: Optional[str] = None,
                    verbose: Optional[Union[bool, str, int]] = None) -> Path:
    """
    Fetch FreeSurfer average brain template.

    Parameters:
    - subjects_dir: FreeSurfer subjects directory
    - verbose: Verbosity level

    Returns:
    Path to the fsaverage directory itself (its parent is the subjects directory)
    """

def fetch_infant_template(age: str, subjects_dir: Optional[str] = None,
                          verbose: Optional[Union[bool, str, int]] = None) -> str:
    """
    Fetch infant brain template.

    Parameters:
    - age: Age group ('6mo', '12mo', etc.)
    - subjects_dir: FreeSurfer subjects directory
    - verbose: Verbosity level

    Returns:
    Subject name of the fetched infant template
    """

def fetch_hcp_mmp_parcellation(subjects_dir: Optional[str] = None,
                               verbose: Optional[Union[bool, str, int]] = None) -> None:
    """
    Fetch HCP multi-modal parcellation.

    Parameters:
    - subjects_dir: FreeSurfer subjects directory
    - verbose: Verbosity level

    Returns:
    None (the annotation files are written into the fsaverage label directory)
    """

def fetch_aparc_sub_parcellation(subjects_dir: Optional[str] = None,
                                 verbose: Optional[Union[bool, str, int]] = None) -> None:
    """
    Fetch aparc sub-parcellation.

    Parameters:
    - subjects_dir: FreeSurfer subjects directory
    - verbose: Verbosity level

    Returns:
    None (the annotation files are written into the fsaverage label directory)
    """
```

### Dataset Utilities

Utility functions for dataset management and discovery.

```python { .api }
def has_dataset(name: str, path: Optional[str] = None) -> bool:
    """
    Check if dataset is available locally.

    Parameters:
    - name: Dataset name
    - path: Custom path to check

    Returns:
    True if dataset is available
    """

def get_version(name: str) -> str:
    """
    Get version of specific dataset.

    Parameters:
    - name: Dataset name

    Returns:
    Version string
    """

def _download_all_example_data(path: Optional[str] = None,
                               verbose: Optional[Union[bool, str, int]] = None) -> None:
    """
    Download all example datasets (for CI/testing).

    Parameters:
    - path: Download path
    - verbose: Verbosity level
    """
```

## Usage Examples

### Loading Sample Dataset

```python
import mne

# Download sample dataset (if not already present)
sample_data_folder = mne.datasets.sample.data_path()
print(f"Sample data location: {sample_data_folder}")

# Build paths to the sample data files
sample_data_raw_file = sample_data_folder / 'MEG' / 'sample' / 'sample_audvis_filt-0-40_raw.fif'
sample_data_cov_file = sample_data_folder / 'MEG' / 'sample' / 'sample_audvis-cov.fif'
sample_data_trans_file = sample_data_folder / 'MEG' / 'sample' / 'sample_audvis_raw-trans.fif'

# Load the actual data
raw = mne.io.read_raw_fif(sample_data_raw_file, preload=True)
cov = mne.read_cov(sample_data_cov_file)

print(f"Raw data: {raw}")
print(f"Covariance: {cov}")
```
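
Because `data_path()` is joined with `/` above, the file locations can be assembled and checked before any reader is called. The sketch below is a hypothetical helper, not part of MNE: `sample_files` and `missing_files` are illustrative names, and the directory layout and file names are simply the ones used in the example above.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the example above; all live under MEG/sample.
SAMPLE_FILES = {
    "raw": "sample_audvis_filt-0-40_raw.fif",
    "cov": "sample_audvis-cov.fif",
    "trans": "sample_audvis_raw-trans.fif",
}


def sample_files(data_root) -> Dict[str, Path]:
    """Map a short key to the full path of each sample file."""
    meg_dir = Path(data_root) / "MEG" / "sample"
    return {key: meg_dir / name for key, name in SAMPLE_FILES.items()}


def missing_files(paths: Dict[str, Path]) -> List[str]:
    """Return the keys whose files are not present on disk, sorted by key."""
    return sorted(key for key, path in paths.items() if not path.is_file())
```

With MNE installed, `sample_files(mne.datasets.sample.data_path())` should yield the same three paths as the example, and an empty list from `missing_files` indicates that all three files are on disk.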

### Motor Imagery Classification Data

```python
import mne
from mne.datasets import eegbci

# Download (if needed) and locate the files for one subject.
# load_data fetches the requested runs and returns their local paths.
subject = 1
runs = [6, 10, 14]  # Motor imagery runs
raw_fnames = eegbci.load_data(subject, runs)
print(f"EEGBCI files: {raw_fnames}")

# Load and concatenate runs
raws = [mne.io.read_raw_edf(f, preload=True) for f in raw_fnames]
raw = mne.concatenate_raws(raws)

# Rename channels to standard names so they match the montage
eegbci.standardize(raw)

# Set montage
montage = mne.channels.make_standard_montage('standard_1005')
raw.set_montage(montage)

print(f"Motor imagery data: {raw}")
```

### Using Phantom Data for Validation

```python
import mne
from mne.datasets import phantom_kit

# Load phantom dataset
phantom_path = phantom_kit.data_path()
print(f"Phantom data location: {phantom_path}")

# Phantom data has known dipole locations - useful for validation
phantom_raw_file = phantom_path / 'phantom_100hz_20_sec_raw.fif'
phantom_raw = mne.io.read_raw_fif(phantom_raw_file, preload=True)

# Load dipole information
phantom_dipoles_file = phantom_path / 'phantom_dipoles.txt'
# dipoles = load_phantom_dipoles(phantom_dipoles_file)  # Custom function

print(f"Phantom raw data: {phantom_raw}")
```

### Sleep Dataset Analysis

```python
import mne
from mne.datasets import sleep_physionet

# Load sleep dataset
sleep_path = sleep_physionet.data_path()
print(f"Sleep data location: {sleep_path}")

# Load specific subject data
subjects = sleep_physionet.age_group_averages()
print(f"Available subjects: {len(subjects)}")

# Example loading one subject's data
# sleep_raw = mne.io.read_raw_edf(subjects[0], preload=True)
# print(f"Sleep recording: {sleep_raw}")
```

### Visual Categorization Dataset

```python
import numpy as np

import mne
from mne.datasets import visual_92_categories

# Load visual categorization data
visual_path = visual_92_categories.data_path()
print(f"Visual data location: {visual_path}")

# Load preprocessed data
data, labels = visual_92_categories.load_data()
print(f"Data shape: {data.shape}")
print(f"Labels shape: {labels.shape}")
print(f"Unique categories: {len(np.unique(labels))}")
```

### FreeSurfer Template

```python
from pathlib import Path

import mne

# Fetch FreeSurfer average brain; the return value is the fsaverage directory itself
fs_dir = Path(mne.datasets.fetch_fsaverage(verbose=True))
subjects_dir = fs_dir.parent  # subjects directory that contains fsaverage
print(f"fsaverage template: {fs_dir}")

# Fetch HCP multi-modal parcellation into the same subjects directory.
# The call may ask you to accept the license terms; it writes files rather than returning them.
mne.datasets.fetch_hcp_mmp_parcellation(subjects_dir=subjects_dir)
hcp_files = sorted((fs_dir / 'label').glob('*HCPMMP1*'))
print(f"HCP parcellation files: {len(hcp_files)}")

# Check if dataset is available
has_sample = mne.datasets.has_dataset('sample')
print(f"Sample dataset available: {has_sample}")
```

### Checking Dataset Availability

```python
import mne

# Datasets to check
datasets = [
    'sample', 'somato', 'spm_face', 'eegbci', 'hf_sef',
    'multimodal', 'opm', 'phantom_4dbti', 'visual_92_categories'
]

for dataset in datasets:
    available = mne.datasets.has_dataset(dataset)
    if hasattr(mne.datasets, dataset):
        version = getattr(mne.datasets, dataset).get_version()
        print(f"{dataset}: {'✓' if available else '✗'} (v{version})")
    else:
        print(f"{dataset}: {'✓' if available else '✗'}")
```
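
A loop like the one above stops at the first dataset name the checker does not recognize. One possible way to make the report more forgiving is to pass the checker in as a parameter and record failures per dataset. This is only a sketch: `availability_report` is a hypothetical helper, and the checker is assumed to be any callable that takes a dataset name and returns a boolean, such as `mne.datasets.has_dataset`.

```python
from typing import Callable, Dict, Iterable


def availability_report(names: Iterable[str],
                        checker: Callable[[str], bool]) -> Dict[str, str]:
    """Return 'available', 'missing', or 'error: <ExceptionName>' per dataset."""
    report: Dict[str, str] = {}
    for name in names:
        try:
            report[name] = "available" if checker(name) else "missing"
        except Exception as err:  # e.g. a name the checker does not know
            report[name] = f"error: {type(err).__name__}"
    return report
```

With MNE installed, `availability_report(['sample', 'somato'], mne.datasets.has_dataset)` would fill the report from the local configuration. Taking the checker as an argument also lets the helper be exercised without MNE or any downloaded data.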

## Dataset Categories

### By Recording Modality

- **MEG**: sample, somato, multimodal, hf_sef, opm, spm_face, visual_92_categories
- **EEG**: eegbci, kiloword
- **ECoG**: epilepsy_ecog
- **fNIRS**: fnirs_motor
- **Polysomnography (sleep EEG)**: sleep_physionet

### By Experimental Paradigm

- **Sensory**: sample (auditory/visual), somato (somatosensory), hf_sef (high-frequency somatosensory)
- **Motor**: eegbci (motor imagery), somato (motor responses)
- **Cognitive**: spm_face (face processing), visual_92_categories (object recognition)
- **Language**: kiloword (lexical decision)
- **Clinical**: epilepsy_ecog (seizure data), sleep_physionet (sleep disorders)

### By Use Case

- **Tutorials**: sample, somato, spm_face
- **Method validation**: phantom_4dbti, phantom_kit, phantom_kernel
- **BCI research**: eegbci, ssvep
- **Connectivity**: fieldtrip_cmc, mtrf
- **Templates**: fsaverage, infant templates, HCP parcellation
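
The use-case grouping above can also be kept as a small lookup table, which makes it easy to ask which use cases a given dataset belongs to. This is a minimal sketch under stated assumptions: `USE_CASES` and `use_cases_for` are hypothetical names, and the table only mirrors the list above. The templates entry is left out because those items are fetched through `fetch_*` functions rather than dataset modules.

```python
from typing import Dict, List

# Mirrors the "By Use Case" list above (dataset modules only).
USE_CASES: Dict[str, List[str]] = {
    "Tutorials": ["sample", "somato", "spm_face"],
    "Method validation": ["phantom_4dbti", "phantom_kit", "phantom_kernel"],
    "BCI research": ["eegbci", "ssvep"],
    "Connectivity": ["fieldtrip_cmc", "mtrf"],
}


def use_cases_for(dataset: str) -> List[str]:
    """Return every use case whose list contains the given dataset name."""
    return [case for case, names in USE_CASES.items() if dataset in names]
```

For example, `use_cases_for("ssvep")` looks the name up in each list and returns the matching use-case labels; an unknown name yields an empty list.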

## Types

```python { .api }
from pathlib import Path
from typing import Union, Optional, List, Dict, Tuple, Callable, Any
import numpy as np

ArrayLike = Union[np.ndarray, List, Tuple]
```