# File I/O and Format Support

MDAnalysis reads and writes molecular structure and trajectory data in many of the file formats commonly used in molecular dynamics simulations.

## Overview

The I/O system in MDAnalysis is built around three main components:

- **Readers**: Read trajectory data with support for sequential and random access
- **Writers**: Write coordinate data to various output formats
- **Parsers**: Extract topology information from structure files

All I/O operations share a unified interface. The file format is guessed from the file extension and can be overridden explicitly, for example with the `format` and `topology_format` arguments of `Universe`.
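Extension-based detection can be pictured with the sketch below. `guess_format_sketch` is a hypothetical helper written for illustration and is not MDAnalysis's own guessing code. It assumes that compression suffixes such as `.gz` and `.bz2` are stripped before the "real" extension is read, and that an explicit `format` always wins.

```python
import os

# Assumption: compression suffixes are skipped to reach the real extension
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def guess_format_sketch(filename, format=None):
    """Return an upper-case format name, preferring an explicit override (hypothetical)."""
    if format is not None:
        return format.upper()
    root, ext = os.path.splitext(filename)
    if ext.lower() in COMPRESSION_SUFFIXES:
        # e.g. "structure.pdb.gz" -> look at ".pdb"
        root, ext = os.path.splitext(root)
    if not ext:
        raise ValueError(f"cannot guess format of {filename!r}; pass format=...")
    return ext[1:].upper()


print(guess_format_sketch("trajectory.xtc"))    # XTC
print(guess_format_sketch("structure.pdb.gz"))  # PDB
```

A file without a usable extension (such as a bare `mdcrd`) cannot be identified this way, which is why an explicit format argument exists.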
## Core I/O Functions

### Reader Function

```python { .api }
def reader(filename, format=None, **kwargs):
    """
    Get a trajectory reader for the specified file.

    Parameters
    ----------
    filename : str or file-like
        Path to trajectory file or file-like object.
    format : str, optional
        File format override. If None, format is guessed from file extension.
    **kwargs
        Additional arguments passed to format-specific reader.

    Returns
    -------
    ReaderBase
        Trajectory reader object appropriate for the file format.

    Examples
    --------
    >>> from MDAnalysis.coordinates import reader
    >>> traj = reader("trajectory.xtc")
    >>> for ts in traj:
    ...     print(f"Frame {ts.frame}, Time: {ts.time}")
    """
```
### Writer Function

```python { .api }
def Writer(filename, n_atoms=None, format=None, multiframe=None, **kwargs):
    """
    Create a trajectory writer for the specified file format.

    Parameters
    ----------
    filename : str or file-like
        Output filename or file-like object.
    n_atoms : int, optional
        Number of atoms in the system (required for some formats).
    format : str, optional
        Output format. If None, guessed from filename extension.
    multiframe : bool, optional
        Whether the writer should accept multiple frames. If None,
        determined automatically.
    **kwargs
        Additional format-specific arguments, for example ``bonds`` for
        the PDB writer (``"conect"``, ``"all"`` or ``None``).

    Returns
    -------
    WriterBase
        Writer object for the specified format.

    Examples
    --------
    >>> W = Writer("output.xtc", n_atoms=u.atoms.n_atoms)
    >>> for ts in u.trajectory:
    ...     W.write(u.atoms)
    >>> W.close()

    >>> # Context manager usage
    >>> with Writer("output.dcd", n_atoms=u.atoms.n_atoms) as W:
    ...     for ts in u.trajectory:
    ...         W.write(u.atoms)
    """
```
## Supported File Formats

### Structure Formats (Topology)

MDAnalysis supports reading topology information from these formats:

#### CHARMM Formats

```python { .api }
import MDAnalysis as mda

# PSF (CHARMM/NAMD Topology)
u = mda.Universe("system.psf", "trajectory.dcd")

# CRD (CHARMM Coordinate)
u = mda.Universe("coordinates.crd")
```

**Capabilities:**
- **PSF**: Complete topology with bonds, angles, dihedrals, impropers
- **CRD**: Coordinate data, limited topology information
#### GROMACS Formats

```python { .api }
# TPR (GROMACS Binary Topology)
u = mda.Universe("topol.tpr", "trajectory.xtc")

# GRO (GROMACS Structure)
u = mda.Universe("system.gro")

# TOP/ITP (GROMACS Text Topology): topology only, no coordinates
u = mda.Universe("topol.top")
```

**Capabilities:**
- **TPR**: Binary topology (atom types, charges, masses, bonds, angles, dihedrals); coordinates are not read from it
- **GRO**: Coordinates, atom names, residue information
- **TOP/ITP**: Atoms with charges and masses plus bonded terms; no coordinates
#### AMBER Formats

```python { .api }
# PRMTOP (AMBER Topology)
u = mda.Universe("system.prmtop", "trajectory.nc")

# INPCRD (AMBER Coordinate): it stores no atom names or residues,
# so it is normally paired with a PRMTOP topology
u = mda.Universe("system.prmtop", "coordinates.inpcrd")
```

**Capabilities:**
- **PRMTOP**: Complete topology (atom types, charges, masses, bonds, angles, dihedrals, impropers)
- **INPCRD**: Coordinates, box information
#### Standard Formats

```python { .api }
# PDB (Protein Data Bank)
u = mda.Universe("structure.pdb")

# PQR (PDB with Charges and Radii)
u = mda.Universe("system.pqr")

# MOL2 (Tripos Molecular Structure)
u = mda.Universe("molecule.mol2")

# PDBQT (AutoDock format)
u = mda.Universe("protein.pdbqt")
```
### Trajectory Formats

#### Binary Trajectory Formats

```python { .api }
# DCD (CHARMM/NAMD/LAMMPS)
u = mda.Universe("topology.psf", "trajectory.dcd")

# XTC (GROMACS compressed, lossy)
u = mda.Universe("topol.tpr", "trajectory.xtc")

# TRR (GROMACS full precision)
u = mda.Universe("topol.tpr", "trajectory.trr")

# TNG (Trajectory Next Generation); requires the optional pytng package
u = mda.Universe("topol.tpr", "trajectory.tng")

# NetCDF (AMBER NetCDF)
u = mda.Universe("system.prmtop", "trajectory.nc")
```
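#### Text Trajectory Formats

```python { .api }
# XYZ (Generic Coordinate)
u = mda.Universe("trajectory.xyz")

# LAMMPS data file + dump trajectory; these extensions are not
# recognized automatically, so both formats are given explicitly
u = mda.Universe("data.lammps", "dump.lammpstrj",
                 topology_format="DATA", format="LAMMPSDUMP")

# AMBER ASCII trajectory; the .mdcrd extension identifies the format
u = mda.Universe("system.prmtop", "trajectory.mdcrd")
```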
## Reader Base Classes

### ReaderBase

```python { .api }
class ReaderBase:
    """
    Base class for trajectory readers supporting multiple frames.
    """

    @property
    def n_frames(self):
        """
        Total number of frames in trajectory.

        Returns
        -------
        int
            Number of trajectory frames.
        """

    @property
    def dt(self):
        """
        Time step between frames.

        Returns
        -------
        float
            Time step in picoseconds.
        """

    @property
    def totaltime(self):
        """
        Total simulation time span.

        Returns
        -------
        float
            Total time covered by trajectory in picoseconds,
            computed as ``(n_frames - 1) * dt``.
        """

    def __iter__(self):
        """
        Iterate through all frames in trajectory.

        Yields
        ------
        Timestep
            Timestep object for each frame.

        Examples
        --------
        >>> for ts in u.trajectory:
        ...     print(f"Time: {ts.time}, Frame: {ts.frame}")
        """

    def __getitem__(self, frame):
        """
        Access specific frame(s) by index.

        Parameters
        ----------
        frame : int, slice, or array of int
            Frame index, slice object, or sequence of frame indices.

        Returns
        -------
        Timestep or iterator
            The Timestep for an integer index; an iterator over the
            selected frames for a slice or a sequence of indices.

        Examples
        --------
        >>> ts = u.trajectory[0]   # First frame
        >>> ts = u.trajectory[-1]  # Last frame
        >>> for ts in u.trajectory[10:20:2]:  # Frames 10, 12, ..., 18
        ...     print(ts.frame)
        """

    def next(self):
        """
        Advance to next frame.

        Returns
        -------
        Timestep
            Timestep object for next frame.
        """

    def rewind(self):
        """
        Return to first frame of trajectory.

        Examples
        --------
        >>> u.trajectory.rewind()
        >>> assert u.trajectory.frame == 0
        """

    def close(self):
        """
        Close trajectory file and free resources.
        """
```
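The reader protocol (a frame counter, `n_frames`, `dt`, `totaltime`, iteration, integer and slice indexing, `rewind`) can be illustrated with a small in-memory stand-in. `ToyReader` is a hypothetical class, not part of MDAnalysis; it returns plain dictionaries instead of `Timestep` objects and follows the documented MDAnalysis convention that `totaltime` is `(n_frames - 1) * dt`.

```python
class ToyReader:
    """Hypothetical in-memory reader illustrating the ReaderBase protocol."""

    def __init__(self, frames, dt=1.0):
        self._frames = list(frames)  # one coordinate list per frame
        self.dt = dt                 # time between frames (ps)
        self.frame = 0               # index of the current frame

    @property
    def n_frames(self):
        return len(self._frames)

    @property
    def totaltime(self):
        # The first frame is taken to be at time 0
        return (self.n_frames - 1) * self.dt

    def _current(self):
        return {"frame": self.frame, "time": self.frame * self.dt,
                "positions": self._frames[self.frame]}

    def _iter_indices(self, indices):
        for i in indices:
            self.frame = i
            yield self._current()

    def __iter__(self):
        yield from self._iter_indices(range(self.n_frames))
        self.rewind()  # back to frame 0 after a complete pass

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields the selected frames lazily instead of one frame
            return self._iter_indices(range(*index.indices(self.n_frames)))
        if index < 0:
            index += self.n_frames
        if not 0 <= index < self.n_frames:
            raise IndexError(index)
        self.frame = index
        return self._current()

    def rewind(self):
        self.frame = 0
        return self._current()


reader = ToyReader([[0.0], [1.0], [2.0], [3.0]], dt=2.0)
print(reader.totaltime)                           # 6.0
print([ts["frame"] for ts in reader[0:4:2]])      # [0, 2]
```

The slice branch returns a generator rather than a single frame, which mirrors why slicing a real trajectory is used in a `for` loop.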
### SingleFrameReaderBase

```python { .api }
class SingleFrameReaderBase:
    """
    Base class for single-frame coordinate readers (e.g., PDB, GRO).
    """

    @property
    def n_frames(self):
        """
        Always returns 1 for single-frame readers.

        Returns
        -------
        int
            Always 1.
        """
```
## Writer Base Classes

### WriterBase

```python { .api }
class WriterBase:
    """
    Base class for coordinate writers.
    """

    def __init__(self, filename, n_atoms, **kwargs):
        """
        Initialize coordinate writer.

        Parameters
        ----------
        filename : str
            Output filename.
        n_atoms : int
            Number of atoms to write.
        **kwargs
            Format-specific arguments.
        """

    def write(self, obj):
        """
        Write the current frame's coordinates for the given atoms.

        Parameters
        ----------
        obj : AtomGroup or Universe
            Atoms to write; their coordinates are taken from the
            trajectory's current frame.

        Examples
        --------
        >>> protein = u.select_atoms("protein")
        >>> with mda.Writer("output.pdb", n_atoms=protein.n_atoms) as W:
        ...     for ts in u.trajectory:
        ...         W.write(protein)
        """

    def close(self):
        """
        Close output file and finalize writing.
        """

    def __enter__(self):
        """
        Context manager entry.

        Returns
        -------
        WriterBase
            Self for context manager usage.
        """

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        Context manager exit, automatically closes file.
        """
```
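The `__enter__`/`__exit__` pair is what lets a `with` block finalize the output even when an exception escapes the loop. The sketch below shows that pattern in isolation with a hypothetical `ToyXYZWriter` that writes to any text stream. It is not MDAnalysis's XYZ writer, and the layout (atom-count line, comment line, one `name x y z` line per atom) is an assumption chosen for brevity.

```python
import io


class ToyXYZWriter:
    """Hypothetical writer illustrating the WriterBase context-manager pattern."""

    def __init__(self, stream, n_atoms):
        self._stream = stream    # caller-owned text stream
        self.n_atoms = n_atoms   # every frame must contain this many atoms
        self.closed = False

    def write(self, names, positions):
        # Refuse frames whose size does not match the declared atom count
        if len(names) != self.n_atoms or len(positions) != self.n_atoms:
            raise ValueError("atom count does not match n_atoms")
        self._stream.write(f"{self.n_atoms}\n")
        self._stream.write("frame written by ToyXYZWriter\n")
        for name, (x, y, z) in zip(names, positions):
            self._stream.write(f"{name} {x:.3f} {y:.3f} {z:.3f}\n")

    def close(self):
        # Flush pending output; the stream itself belongs to the caller
        self._stream.flush()
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()   # runs on normal exit and on exceptions
        return False   # do not swallow exceptions


buffer = io.StringIO()
with ToyXYZWriter(buffer, n_atoms=2) as writer:
    writer.write(["C", "O"], [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)])
print(buffer.getvalue())
```

Returning `False` from `__exit__` lets the original error propagate after cleanup instead of hiding it.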
## Timestep Class

```python { .api }
class Timestep:
    """
    Container for coordinate data from a single trajectory frame.
    """

    def __init__(self, n_atoms, **kwargs):
        """
        Create timestep for specified number of atoms.

        Parameters
        ----------
        n_atoms : int
            Number of atoms in the system.
        positions : bool, optional
            Whether to allocate position array (default True).
        velocities : bool, optional
            Whether to allocate velocity array (default False).
        forces : bool, optional
            Whether to allocate force array (default False).
        """

    @property
    def positions(self):
        """
        Atomic coordinates for current frame.

        Returns
        -------
        numpy.ndarray
            Array of shape (n_atoms, 3) with atomic coordinates.
        """

    @property
    def velocities(self):
        """
        Atomic velocities for current frame.

        Returns
        -------
        numpy.ndarray or None
            Array of shape (n_atoms, 3) with velocities if available.
        """

    @property
    def forces(self):
        """
        Atomic forces for current frame.

        Returns
        -------
        numpy.ndarray or None
            Array of shape (n_atoms, 3) with forces if available.
        """

    @property
    def dimensions(self):
        """
        Unit cell dimensions.

        Returns
        -------
        numpy.ndarray or None
            Array [a, b, c, alpha, beta, gamma]: box lengths in Angstrom
            and angles in degrees; None if there is no unit cell.
        """

    @property
    def volume(self):
        """
        Unit cell volume.

        Returns
        -------
        float or None
            Volume in cubic Angstroms, None if no box information.
        """

    @property
    def time(self):
        """
        Simulation time for this frame.

        Returns
        -------
        float
            Time in picoseconds.
        """

    @property
    def frame(self):
        """
        Frame number in trajectory.

        Returns
        -------
        int
            Zero-based frame index.
        """

    def copy(self):
        """
        Create independent copy of timestep.

        Returns
        -------
        Timestep
            Deep copy of timestep with independent arrays.
        """
```
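The `volume` property follows from the six `dimensions` values through the standard triclinic cell formula, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The function below is a stand-alone sketch of that arithmetic (the name `box_volume` is hypothetical, not an MDAnalysis function). It returns `None` for a missing box, matching the convention documented for `volume`.

```python
import math


def box_volume(dimensions):
    """Volume of a cell given as [a, b, c, alpha, beta, gamma] (hypothetical helper).

    Lengths in Angstrom, angles in degrees; None when there is no box.
    """
    if dimensions is None:
        return None
    a, b, c, alpha, beta, gamma = dimensions
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    # General triclinic formula; reduces to a * b * c for a rectangular box
    factor = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(factor)


print(box_volume([10.0, 10.0, 10.0, 90.0, 90.0, 90.0]))  # ~1000.0
print(box_volume([10.0, 10.0, 10.0, 90.0, 60.0, 90.0]))  # ~866.03 (monoclinic)
```

For the monoclinic case only β differs from 90°, so the formula reduces to abc·sin β.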
## Format-Specific Features

### GROMACS XTC/TRR

```python { .api }
import MDAnalysis as mda

# XTC compressed trajectories
u = mda.Universe("topol.tpr", "trajectory.xtc")

# Precision information, if the reader exposes it (may vary by version)
precision = getattr(u.trajectory, "precision", None)
print(f"XTC precision: {precision}")

# TRR full precision, optionally with velocities/forces
u = mda.Universe("topol.tpr", "trajectory.trr")
if u.trajectory.ts.has_velocities:
    velocities = u.trajectory.ts.velocities
if u.trajectory.ts.has_forces:
    forces = u.trajectory.ts.forces
```
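XTC is lossy because it stores coordinates in nanometres as fixed-point integers. The common GROMACS precision factor of 1000 keeps three decimal places in nm, i.e. 0.01 Å. The sketch below reproduces only that rounding step with hypothetical helper names; the real XTC codec additionally packs the integers into a compressed bit stream, which is not modelled here.

```python
def xtc_round_trip(x_angstrom, decimals=3):
    """Round-trip one coordinate through XTC-style fixed-point rounding (sketch).

    decimals=3 corresponds to a precision factor of 1000.
    """
    scale = 10 ** decimals
    x_nm = x_angstrom / 10.0       # Angstrom -> nm (XTC stores nm)
    stored = round(x_nm * scale)   # integer value before bit packing
    return stored / scale * 10.0   # nm -> Angstrom, as read back


def max_rounding_error_angstrom(decimals=3):
    """Worst-case absolute error in Angstrom: half a quantization step."""
    return 0.5 / 10 ** decimals * 10.0


print(xtc_round_trip(1.23456))         # ~1.23
print(max_rounding_error_angstrom(3))  # 0.005
```

When coordinates must be reproduced exactly, TRR (full precision) is the safer choice.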
### CHARMM/NAMD DCD

```python { .api }
u = mda.Universe("system.psf", "trajectory.dcd")

# DCD supports fixed atoms
if hasattr(u.trajectory, 'fixed'):
    fixed_atoms = u.trajectory.fixed

# Periodic boundary information
dimensions = u.trajectory.ts.dimensions
```
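### AMBER NetCDF

```python { .api }
u = mda.Universe("system.prmtop", "trajectory.nc")

# NetCDF global attributes live on the underlying file handle
# (``trjfile`` in recent versions); values are bytes, and
# "application" is optional in the AMBER convention
trjfile = u.trajectory.trjfile
print(f"NetCDF conventions: {trjfile.Conventions.decode()}")
print(f"Application: {getattr(trjfile, 'application', b'n/a').decode()}")
```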
## I/O Usage Patterns

### Reading Multiple Trajectories

```python { .api }
import MDAnalysis as mda

# Concatenate multiple trajectory files
u = mda.Universe("topology.psf", "part1.dcd", "part2.dcd", "part3.dcd")

# All files are treated as one continuous trajectory
print(f"Total frames: {u.trajectory.n_frames}")

# Or attach them later: load_new() replaces the current trajectory,
# so pass all parts at once as a list to chain them
u = mda.Universe("topology.psf", "part1.dcd")
u.load_new(["part1.dcd", "part2.dcd", "part3.dcd"])
```
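Treating several files as one trajectory requires mapping a global frame index to a file and a frame within that file. The sketch below shows one way to do that with cumulative frame counts and a binary search. `locate_frame` is a hypothetical helper for illustration, not the code MDAnalysis uses internally for chained trajectories.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(frames_per_file, global_frame):
    """Map a frame of the concatenated trajectory to (file_index, local_frame)."""
    # Cumulative end positions, e.g. [100, 250, 300] for files of 100, 150, 50 frames
    ends = list(accumulate(frames_per_file))
    total = ends[-1] if ends else 0
    if global_frame < 0:
        global_frame += total  # negative indices count from the end
    if not 0 <= global_frame < total:
        raise IndexError(global_frame)
    file_index = bisect_right(ends, global_frame)
    start = ends[file_index - 1] if file_index > 0 else 0
    return file_index, global_frame - start


print(locate_frame([100, 150, 50], 100))  # (1, 0): first frame of the second file
print(locate_frame([100, 150, 50], -1))   # (2, 49): last frame overall
```

`bisect_right` places a boundary index such as 100 in the *next* file, which is what makes frame 100 resolve to the start of the second file.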
### Writing Trajectories

```python { .api }
import MDAnalysis as mda

# Write subset of atoms
protein = u.select_atoms("protein")

with mda.Writer("protein_only.xtc", n_atoms=protein.n_atoms) as W:
    for ts in u.trajectory:
        W.write(protein)

# Write specific frames
with mda.Writer("every_10th.dcd", n_atoms=u.atoms.n_atoms) as W:
    for ts in u.trajectory[::10]:  # Every 10th frame
        W.write(u.atoms)

# Single frame output: the trajectory is back at frame 0 after a full loop
u.atoms.write("current_frame.pdb")  # Current frame
u.trajectory[-1]  # Go to last frame
u.atoms.write("last_frame.gro")
```
### Memory-Efficient Processing

```python { .api }
import numpy as np

# Process large trajectories in chunks: only one chunk of
# coordinates is held in memory at a time
def process_in_chunks(universe, chunk_size=1000):
    n_frames = universe.trajectory.n_frames

    for start in range(0, n_frames, chunk_size):
        stop = min(start + chunk_size, n_frames)

        # Copy this chunk's coordinates into an array of shape
        # (frames_in_chunk, n_atoms, 3); the trajectory stays on disk.
        # (transfer_to_memory() is avoided: it replaces the Universe's
        # trajectory, so later chunks would no longer be reachable.)
        chunk = np.array([universe.atoms.positions
                          for ts in universe.trajectory[start:stop]])

        # Perform analysis on `chunk`
        ...
```
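The chunk-boundary arithmetic in `process_in_chunks` can be isolated and checked on its own. `chunk_ranges` is a hypothetical helper, not an MDAnalysis function; it yields half-open `(start, stop)` pairs that cover every frame exactly once, with a shorter final chunk when `chunk_size` does not divide the frame count.

```python
def chunk_ranges(n_frames, chunk_size):
    """Yield half-open (start, stop) pairs covering range(n_frames) in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_frames, chunk_size):
        # The last chunk is shorter when chunk_size does not divide n_frames
        yield start, min(start + chunk_size, n_frames)


print(list(chunk_ranges(2500, 1000)))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each pair can be passed straight to a trajectory slice, `universe.trajectory[start:stop]`.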
### Format Conversion

```python { .api }
import MDAnalysis as mda


def convert_trajectory(input_files, output_file, selection="all"):
    """
    Convert trajectory between formats.

    Parameters
    ----------
    input_files : tuple
        (topology, trajectory) file paths.
    output_file : str
        Output trajectory file.
    selection : str, optional
        Atom selection to write (default "all").
    """
    u = mda.Universe(*input_files)
    atoms = u.select_atoms(selection)

    with mda.Writer(output_file, n_atoms=atoms.n_atoms) as W:
        for ts in u.trajectory:
            W.write(atoms)


# Example: Convert AMBER to GROMACS
convert_trajectory(("system.prmtop", "trajectory.nc"), "output.xtc")

# Example: Extract protein only
convert_trajectory(("system.psf", "trajectory.dcd"), "protein.xtc", "protein")
```
### Handling File Streams

```python { .api }
from io import StringIO

import MDAnalysis as mda
from MDAnalysis.lib.util import NamedStream

# Compressed files are decompressed transparently for text-based
# formats such as PDB, GRO and XYZ (detected from the .gz/.bz2 suffix)
u = mda.Universe("structure.pdb.gz")
u = mda.Universe("structure.pdb", "trajectory.xyz.bz2")

# Binary trajectories (XTC, TRR, DCD, ...) are read from real files
# on disk, so decompress them before loading

# In-memory text data: wrap the stream in a NamedStream whose
# filename supplies the format hint
with open("structure.pdb") as fh:
    pdb_text = fh.read()
u = mda.Universe(NamedStream(StringIO(pdb_text), "structure.pdb"))
```
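Transparent decompression of text formats comes down to choosing an opener from the file suffix. The standard-library sketch below illustrates that idea; `open_maybe_compressed` is a hypothetical helper and not MDAnalysis's implementation.

```python
import bz2
import gzip


def open_maybe_compressed(path, mode="rt"):
    """Open a text file, decompressing based on its suffix (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)


# Usage: the caller reads plain text regardless of compression, e.g.
# with open_maybe_compressed("trajectory.xyz.bz2") as fh:
#     n_atoms = int(fh.readline())
```

Because every branch returns a file object with the same text interface, the parsing code downstream never needs to know whether the file was compressed.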
## Error Handling

```python { .api }
import MDAnalysis as mda
from MDAnalysis.exceptions import NoDataError

try:
    u = mda.Universe("topology.psf", "trajectory.dcd")
except FileNotFoundError as e:
    print(f"Topology or trajectory file not found: {e}")
    raise
except ValueError as e:
    # Raised, for example, when topology and trajectory atom counts differ
    print(f"Incompatible topology and trajectory: {e}")
    raise

# Check for optional data before accessing it
if u.trajectory.ts.has_velocities:
    velocities = u.atoms.velocities
else:
    print("No velocity data available")

# Accessing data the trajectory does not provide raises NoDataError
try:
    forces = u.atoms.forces
except NoDataError as e:
    print(f"Missing required data: {e}")
```
## Performance Considerations

### Memory Usage

```python { .api }
# Load trajectory into memory for repeated access
u.transfer_to_memory()  # Load all frames

# Alternatively, partial loading for large trajectories
u.transfer_to_memory(start=0, stop=1000, step=10)  # Every 10th of the first 1000 frames

# Memory-efficient single pass
for ts in u.trajectory:  # Streaming access
    # Process frame immediately
    pass
```
### Random Access Performance

```python { .api }
# Direct seeking: XTC/TRR (via a cached frame-offset index),
# DCD (fixed-size frames) and NetCDF
u.trajectory[1000]  # Direct access to frame 1000

# Some text formats are slower to seek; for heavy random access,
# consider loading into memory if the coordinates fit.
# Estimated size: float32 positions, 3 values per atom per frame
n_bytes = u.trajectory.n_frames * u.atoms.n_atoms * 3 * 4
if n_bytes < 2 * 1024**3:  # e.g. stay below ~2 GiB
    u.transfer_to_memory()

# Then random access is fast
u.trajectory[1000]
```
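"Index support" means the reader knows where each frame starts in the file, so it can seek directly instead of reading every earlier frame. MDAnalysis builds and caches such frame offsets for XTC and TRR files. The sketch below applies the same idea to an XYZ-style byte stream: one scan records each frame's byte offset, and later reads seek straight to it. Both functions are hypothetical illustrations, not MDAnalysis APIs.

```python
import io


def build_frame_offsets(stream):
    """Scan an XYZ-style byte stream once and return the offset of each frame."""
    offsets = []
    stream.seek(0)
    while True:
        position = stream.tell()
        header = stream.readline()
        if not header.strip():          # end of data (or trailing blank line)
            break
        n_atoms = int(header.decode())
        offsets.append(position)
        for _ in range(n_atoms + 1):    # skip the comment line and atom lines
            stream.readline()
    return offsets


def read_frame(stream, offsets, index):
    """Seek straight to frame `index` and parse it."""
    stream.seek(offsets[index])
    n_atoms = int(stream.readline().decode())
    comment = stream.readline().decode().strip()
    coords = []
    for _ in range(n_atoms):
        _name, x, y, z = stream.readline().decode().split()
        coords.append((float(x), float(y), float(z)))
    return comment, coords


data = io.BytesIO(
    b"2\nframe 0\nC 0.0 0.0 0.0\nO 1.2 0.0 0.0\n"
    b"2\nframe 1\nC 0.1 0.0 0.0\nO 1.3 0.0 0.0\n"
)
offsets = build_frame_offsets(data)
print(offsets)                       # [0, 38]
print(read_frame(data, offsets, 1))  # ('frame 1', [(0.1, 0.0, 0.0), (1.3, 0.0, 0.0)])
```

The one-off scan costs a full pass over the file. That is why readers cache the offsets: repeated random access then costs one seek per frame.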