
# File I/O and Format Support

MDAnalysis provides comprehensive support for reading and writing molecular structure and trajectory data across many file formats commonly used in molecular dynamics simulations.

## Overview

The I/O system in MDAnalysis is built around three main components:

- **Readers**: Read trajectory data with support for sequential and random access
- **Writers**: Write coordinate data to various output formats
- **Parsers**: Extract topology information from structure files

All I/O operations use a unified interface with automatic format detection based on file extensions. An explicit `format` argument (or, for topologies, `topology_format`) overrides the guess.
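
Extension-based detection amounts to a lookup on the lower-cased file suffix, with an explicit `format` taking precedence. The sketch below is illustrative only: `guess_format_sketch` and its abbreviated table are hypothetical, not MDAnalysis API (the library's own logic lives in helpers such as `MDAnalysis.lib.util.guess_format`), and it assumes that a `.gz`/`.bz2` suffix should be looked through to the inner extension.

```python
import os

# Hypothetical, abbreviated extension table for illustration only;
# MDAnalysis keeps its own, much larger registry of readers and parsers.
_EXT_TO_FORMAT = {
    "xtc": "XTC", "trr": "TRR", "dcd": "DCD", "nc": "NCDF",
    "pdb": "PDB", "gro": "GRO", "psf": "PSF", "xyz": "XYZ",
}
_COMPRESSION_SUFFIXES = {"gz", "bz2"}


def guess_format_sketch(filename, format=None):
    """Return an upper-case format name; an explicit `format` always wins."""
    if format is not None:
        return format.upper()
    root, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()
    # Look through a compression suffix to the real extension (x.pdb.gz -> pdb)
    if ext in _COMPRESSION_SUFFIXES:
        ext = os.path.splitext(root)[1].lstrip(".").lower()
    if ext not in _EXT_TO_FORMAT:
        raise ValueError(f"cannot guess format of {filename!r}; pass format=...")
    return _EXT_TO_FORMAT[ext]


print(guess_format_sketch("trajectory.xtc"))       # XTC
print(guess_format_sketch("structure.pdb.gz"))     # PDB
print(guess_format_sketch("mdcrd", format="trj"))  # TRJ
```

A file without a recognisable extension, such as a bare `mdcrd`, is exactly the case where passing `format=` is required.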

## Core I/O Functions

### Reader Function

```python { .api }
def reader(filename, format=None, **kwargs):
    """
    Get a trajectory reader for the specified file.

    Parameters
    ----------
    filename : str or file-like
        Path to the trajectory file, or a file-like object (only for
        formats that support reading from streams).
    format : str, optional
        File format override. If None, format is guessed from file extension.
    **kwargs
        Additional arguments passed to format-specific reader.

    Returns
    -------
    ReaderBase
        Trajectory reader object appropriate for the file format.

    Examples
    --------
    >>> from MDAnalysis.coordinates import reader
    >>> traj = reader("trajectory.xtc")
    >>> for ts in traj:
    ...     print(f"Frame {ts.frame}, Time: {ts.time}")
    """
```

### Writer Function

```python { .api }
def Writer(filename, n_atoms=None, format=None, multiframe=None, **kwargs):
    """
    Create a trajectory writer for the specified file format.

    Parameters
    ----------
    filename : str or file-like
        Output filename or file-like object.
    n_atoms : int, optional
        Number of atoms in the system (required for some formats).
    format : str, optional
        Output format. If None, guessed from filename extension.
    multiframe : bool, optional
        True requests a multi-frame writer, False a single-frame writer.
        If None, a multi-frame writer is tried first, falling back to a
        single-frame writer.
    **kwargs
        Additional format-specific arguments. For example, the PDB writer
        accepts ``bonds`` ("conect", "all", or None) to control which
        bonds are written as CONECT records.

    Returns
    -------
    WriterBase
        Writer object for the specified format.

    Examples
    --------
    >>> W = Writer("output.xtc", n_atoms=u.atoms.n_atoms)
    >>> for ts in u.trajectory:
    ...     W.write(u.atoms)
    >>> W.close()

    >>> # Context manager usage
    >>> with Writer("output.dcd", n_atoms=u.atoms.n_atoms) as W:
    ...     for ts in u.trajectory:
    ...         W.write(u.atoms)
    """
```

## Supported File Formats

### Structure Formats (Topology)

MDAnalysis supports reading topology information from these formats:

#### CHARMM Formats

```python { .api }
import MDAnalysis as mda

# PSF (CHARMM/NAMD Topology)
u = mda.Universe("system.psf", "trajectory.dcd")

# CRD (CHARMM Coordinate)
u = mda.Universe("coordinates.crd")
```

**Capabilities:**

- **PSF**: Complete topology with bonds, angles, dihedrals, impropers
- **CRD**: Coordinate data, limited topology information

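#### GROMACS Formats

```python { .api }
import MDAnalysis as mda

# TPR (GROMACS Binary Topology)
u = mda.Universe("topol.tpr", "trajectory.xtc")

# GRO (GROMACS Structure)
u = mda.Universe("system.gro")

# TOP/ITP (GROMACS Text Topology): read with the ITP parser, because a bare
# ".top" extension is otherwise handed to the AMBER topology parser.
# TOP/ITP files hold no coordinates, so pair them with a coordinate file.
u = mda.Universe("topol.top", "system.gro", topology_format="ITP")
```

**Capabilities:**

- **TPR**: Atoms, charges, masses, and bonded connectivity (bonds, angles, dihedrals, impropers); MDAnalysis reads only the topology from it, not coordinates
- **GRO**: Coordinates (and optional velocities), atom names, residue information, box
- **TOP/ITP**: Atoms, charges, masses, and bonded terms (bonds, angles, dihedrals, impropers)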

#### AMBER Formats

```python { .api }
import MDAnalysis as mda

# PRMTOP (AMBER Topology)
u = mda.Universe("system.prmtop", "trajectory.nc")

# INPCRD (AMBER Coordinate): contains no atom names, so it needs a topology
u = mda.Universe("system.prmtop", "coordinates.inpcrd")
```

**Capabilities:**

- **PRMTOP**: Atoms, charges, masses, types, and bonded connectivity (bonds, angles, dihedrals, impropers); force-field parameters such as force constants are not read
- **INPCRD**: Coordinates, box information

#### Standard Formats

```python { .api }
import MDAnalysis as mda

# PDB (Protein Data Bank)
u = mda.Universe("structure.pdb")

# PQR (PDB with Charges and Radii)
u = mda.Universe("system.pqr")

# MOL2 (Tripos Molecular Structure)
u = mda.Universe("molecule.mol2")

# PDBQT (AutoDock format)
u = mda.Universe("protein.pdbqt")
```

### Trajectory Formats

#### Binary Trajectory Formats

```python { .api }
import MDAnalysis as mda

# DCD (CHARMM/NAMD/LAMMPS)
u = mda.Universe("topology.psf", "trajectory.dcd")

# XTC (GROMACS Compressed)
u = mda.Universe("topol.tpr", "trajectory.xtc")

# TRR (GROMACS Full Precision)
u = mda.Universe("topol.tpr", "trajectory.trr")

# TNG (Trajectory Next Generation); requires the optional pytng package
u = mda.Universe("topol.tpr", "trajectory.tng")

# NetCDF (AMBER NetCDF)
u = mda.Universe("system.prmtop", "trajectory.nc")
```

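#### Text Trajectory Formats

```python { .api }
import MDAnalysis as mda

# XYZ (Generic Coordinate)
u = mda.Universe("trajectory.xyz")

# LAMMPS data file + dump trajectory; LAMMPS file extensions vary,
# so naming both formats explicitly is the safest option
u = mda.Universe("data.lammps", "dump.lammpstrj",
                 topology_format="DATA", format="LAMMPSDUMP")

# AMBER ASCII Trajectory (TRJ/mdcrd); a file without an extension
# needs an explicit format
u = mda.Universe("system.prmtop", "mdcrd", format="TRJ")
```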

## Reader Base Classes

### ReaderBase

```python { .api }
class ReaderBase:
    """
    Base class for trajectory readers supporting multiple frames.
    """

    @property
    def n_frames(self):
        """
        Total number of frames in trajectory.

        Returns
        -------
        int
            Number of trajectory frames.
        """

    @property
    def dt(self):
        """
        Time step between frames.

        Returns
        -------
        float
            Time step in picoseconds.
        """

    @property
    def totaltime(self):
        """
        Total simulation time span.

        Returns
        -------
        float
            Time covered by the trajectory in picoseconds,
            i.e. ``(n_frames - 1) * dt``.
        """

    def __iter__(self):
        """
        Iterate through all frames in trajectory.

        Yields
        ------
        Timestep
            Timestep object for each frame.

        Examples
        --------
        >>> for ts in u.trajectory:
        ...     print(f"Time: {ts.time}, Frame: {ts.frame}")
        """

    def __getitem__(self, frame):
        """
        Access specific frame(s) by index.

        Parameters
        ----------
        frame : int, slice, or sequence of int
            Frame index, slice, or list of frame indices.

        Returns
        -------
        Timestep or iterator
            For an integer, the Timestep of that frame (the reader moves
            to it). For a slice or index sequence, an iterator over the
            selected frames.

        Examples
        --------
        >>> ts = u.trajectory[0]  # First frame
        >>> ts = u.trajectory[-1]  # Last frame
        >>> for ts in u.trajectory[10:20:2]:  # Slice with step
        ...     print(ts.frame)
        """

    def next(self):
        """
        Advance to next frame.

        Returns
        -------
        Timestep
            Timestep object for next frame.
        """

    def rewind(self):
        """
        Return to first frame of trajectory.

        Examples
        --------
        >>> u.trajectory.rewind()
        >>> assert u.trajectory.frame == 0
        """

    def close(self):
        """
        Close trajectory file and free resources.
        """
```
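
The access patterns above (iteration, integer and slice indexing, rewinding) can be modelled with a small pure-Python toy. `ToyReader` and `ToyTimestep` are hypothetical names, not MDAnalysis classes; the sketch keeps every frame in a list and assumes evenly spaced frames starting at time 0, so `totaltime` is simply `(n_frames - 1) * dt`.

```python
from dataclasses import dataclass


@dataclass
class ToyTimestep:
    frame: int
    time: float
    positions: list  # one (x, y, z) tuple per atom


class ToyReader:
    """Hypothetical in-memory reader mimicking ReaderBase access patterns."""

    def __init__(self, frames, dt=1.0):
        self._frames = frames  # list of per-frame position lists
        self.dt = dt
        self._load(0)

    def _load(self, i):
        # Build the Timestep for frame i and make it the current frame
        self.ts = ToyTimestep(frame=i, time=i * self.dt, positions=self._frames[i])
        return self.ts

    @property
    def n_frames(self):
        return len(self._frames)

    @property
    def totaltime(self):
        # Span from the first to the last frame, assuming constant dt
        return (self.n_frames - 1) * self.dt

    @property
    def frame(self):
        return self.ts.frame

    def __iter__(self):
        for i in range(self.n_frames):
            yield self._load(i)
        # Like MDAnalysis, return to the first frame after a full pass
        self.rewind()

    def __getitem__(self, index):
        if isinstance(index, int):
            # range() resolves negative indices and raises IndexError
            return self._load(range(self.n_frames)[index])
        if isinstance(index, slice):
            # Slices yield the selected frames lazily
            return (self._load(i) for i in range(self.n_frames)[index])
        raise TypeError("index must be an int or a slice")

    def rewind(self):
        return self._load(0)


frames = [[(float(i), 0.0, 0.0)] for i in range(6)]
reader = ToyReader(frames, dt=2.0)
print(reader.n_frames, reader.totaltime)   # 6 10.0
print([ts.frame for ts in reader[1:5:2]])  # [1, 3]
```

Returning a lazy generator for slices mirrors the idea that slicing a trajectory does not read every selected frame up front.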

### SingleFrameReaderBase

```python { .api }
class SingleFrameReaderBase:
    """
    Base class for single-frame coordinate readers (e.g., GRO, CRD).
    """

    @property
    def n_frames(self):
        """
        Always returns 1 for single-frame readers.

        Returns
        -------
        int
            Always 1.
        """
```

## Writer Base Classes

### WriterBase

```python { .api }
class WriterBase:
    """
    Base class for coordinate writers.
    """

    def __init__(self, filename, n_atoms, **kwargs):
        """
        Initialize coordinate writer.

        Parameters
        ----------
        filename : str
            Output filename.
        n_atoms : int
            Number of atoms to write.
        **kwargs
            Format-specific arguments.
        """

    def write(self, obj):
        """
        Write the current frame for the given atoms.

        Parameters
        ----------
        obj : AtomGroup or Universe
            Atoms to write; their coordinates at the trajectory's current
            frame are used. (Passing a Timestep was removed in
            MDAnalysis 2.0.)

        Examples
        --------
        >>> protein = u.select_atoms("protein")
        >>> with Writer("output.pdb", n_atoms=protein.n_atoms) as W:
        ...     for ts in u.trajectory:
        ...         W.write(protein)
        """

    def close(self):
        """
        Close output file and finalize writing.
        """

    def __enter__(self):
        """
        Context manager entry.

        Returns
        -------
        WriterBase
            Self for context manager usage.
        """

    def __exit__(self, exc_type, exc_val, exc_tb):
        """
        Context manager exit, automatically closes file.
        """
```
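
The `__enter__`/`__exit__` pair is what lets `with Writer(...) as W:` close the output even when an exception interrupts the loop. Below is a minimal sketch of that contract for plain XYZ text, assuming a caller-supplied text stream. `ToyXYZWriter` is hypothetical and, unlike MDAnalysis writers, its `write()` takes atom names and positions directly instead of an AtomGroup.

```python
import io


class ToyXYZWriter:
    """Hypothetical multi-frame XYZ writer following the WriterBase contract."""

    def __init__(self, stream, n_atoms):
        self._stream = stream  # any text stream, e.g. open(path, "w") or io.StringIO()
        self.n_atoms = n_atoms
        self.closed = False

    def write(self, names, positions):
        # Every frame must match the atom count declared up front
        if len(names) != self.n_atoms or len(positions) != self.n_atoms:
            raise ValueError("frame does not match n_atoms")
        self._stream.write(f"{self.n_atoms}\nwritten by ToyXYZWriter\n")
        for name, (x, y, z) in zip(names, positions):
            self._stream.write(f"{name} {x:.3f} {y:.3f} {z:.3f}\n")

    def close(self):
        # A file-backed writer would flush and close its file here
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()  # runs on normal exit and when an exception escapes
        return False  # never swallow the exception


buf = io.StringIO()
with ToyXYZWriter(buf, n_atoms=2) as W:
    for step in range(3):
        W.write(["O", "H"], [(0.0, 0.0, float(step)), (1.0, 0.0, 0.0)])
print(len(buf.getvalue().splitlines()), W.closed)  # 12 True
```

Each XYZ frame here is four lines (atom count, comment, two atoms), so three frames give twelve lines.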

## Timestep Class

```python { .api }
class Timestep:
    """
    Container for coordinate data from a single trajectory frame.
    """

    def __init__(self, n_atoms, **kwargs):
        """
        Create timestep for specified number of atoms.

        Parameters
        ----------
        n_atoms : int
            Number of atoms in the system.
        positions : bool, optional
            Whether to allocate position array (default True).
        velocities : bool, optional
            Whether to allocate velocity array (default False).
        forces : bool, optional
            Whether to allocate force array (default False).
        """

    @property
    def positions(self):
        """
        Atomic coordinates for current frame.

        Returns
        -------
        numpy.ndarray
            Array of shape (n_atoms, 3) with atomic coordinates in Angstroms.
        """

    @property
    def velocities(self):
        """
        Atomic velocities for current frame.

        Returns
        -------
        numpy.ndarray
            Array of shape (n_atoms, 3) with velocities in Angstrom/ps.

        Raises
        ------
        NoDataError
            If the frame carries no velocities; check ``has_velocities``
            before accessing.
        """

    @property
    def forces(self):
        """
        Atomic forces for current frame.

        Returns
        -------
        numpy.ndarray
            Array of shape (n_atoms, 3) with forces in kJ/(mol Angstrom).

        Raises
        ------
        NoDataError
            If the frame carries no forces; check ``has_forces``
            before accessing.
        """

    @property
    def dimensions(self):
        """
        Unit cell dimensions.

        Returns
        -------
        numpy.ndarray or None
            Array [a, b, c, alpha, beta, gamma] with box lengths in
            Angstroms and angles in degrees; None if there is no box.
        """

    @property
    def volume(self):
        """
        Unit cell volume.

        Returns
        -------
        float or None
            Volume in cubic Angstroms, None if no box information.
        """

    @property
    def time(self):
        """
        Simulation time for this frame.

        Returns
        -------
        float
            Time in picoseconds.
        """

    @property
    def frame(self):
        """
        Frame number in trajectory.

        Returns
        -------
        int
            Zero-based frame index.
        """

    def copy(self):
        """
        Create independent copy of timestep.

        Returns
        -------
        Timestep
            Deep copy of timestep with independent arrays.
        """
```
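
The link between `dimensions` and `volume` is the standard triclinic cell formula, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). Below is a stdlib sketch of it; `triclinic_volume` is a hypothetical helper, and MDAnalysis ships its own implementation (for instance `MDAnalysis.lib.mdamath.box_volume`).

```python
import math


def triclinic_volume(dimensions):
    """Cell volume from [a, b, c, alpha, beta, gamma] (lengths in Å, angles in degrees)."""
    a, b, c, alpha, beta, gamma = dimensions
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


print(triclinic_volume([10, 10, 10, 90, 90, 90]))  # close to 1000 (cubic box)
print(triclinic_volume([10, 10, 10, 60, 60, 90]))  # about 707.1 (rhombic dodecahedron)
```

For a rectangular box all cosines vanish and the formula reduces to a·b·c; the rhombic dodecahedron case shows why non-orthogonal cells hold fewer cubic Ångströms for the same edge length.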

## Format-Specific Features

### GROMACS XTC/TRR

```python { .api }
import MDAnalysis as mda

# XTC compressed trajectories (lossy: coordinates are rounded to a fixed precision)
u = mda.Universe("topol.tpr", "trajectory.xtc")

# Access precision information; guarded in case the reader
# does not expose a `precision` attribute
print(f"XTC precision: {getattr(u.trajectory, 'precision', 'not exposed')}")

# TRR full precision with velocities/forces (present only in frames that store them)
u = mda.Universe("topol.tpr", "trajectory.trr")
if u.trajectory.ts.has_velocities:
    velocities = u.trajectory.ts.velocities
```
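
XTC's lossiness comes from storing each coordinate, in nm, as an integer scaled by a precision factor (GROMACS' default of 1000 gives a resolution of 0.001 nm). The sketch below models only that rounding step, not the subsequent integer compression; `xtc_roundtrip` is a hypothetical helper, and the factor of 10 assumes MDAnalysis' Ångström units.

```python
def xtc_roundtrip(x_angstrom, precision=1000.0):
    """Model the rounding step of XTC storage for a single coordinate.

    XTC keeps coordinates in nm as integers scaled by `precision`;
    MDAnalysis reports Å, hence the factors of 10. Real XTC then
    compresses these integers, which this sketch does not model.
    """
    x_nm = x_angstrom / 10.0
    stored = round(x_nm * precision)  # integer that would be written
    return stored / precision * 10.0  # value read back, in Å


original = 12.34567
restored = xtc_roundtrip(original)
# The error is at most half a quantisation step: 0.5 / precision nm = 0.005 Å here
print(f"{restored:.3f} Å, error {abs(restored - original):.4f} Å")
```

TRR stores full floating-point values, which is why it is the choice when exact coordinates, velocities, or forces matter.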

### CHARMM/NAMD DCD

```python { .api }
import MDAnalysis as mda

u = mda.Universe("system.psf", "trajectory.dcd")

# DCD files can flag fixed atoms; check before relying on it
if hasattr(u.trajectory, 'fixed'):
    fixed_atoms = u.trajectory.fixed

# Periodic boundary information (None if the file stores no unit cell)
dimensions = u.trajectory.ts.dimensions
```

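### AMBER NetCDF

```python { .api }
import MDAnalysis as mda

u = mda.Universe("system.prmtop", "trajectory.nc")

# NetCDF trajectories carry global metadata attributes. These live on the
# reader's underlying file object (`trjfile`, an internal attribute), not on
# the reader itself, and are typically stored as bytes.
ncfile = u.trajectory.trjfile
print(f"NetCDF conventions: {ncfile.Conventions!r}")
print(f"Application: {getattr(ncfile, 'application', 'not recorded')!r}")
```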

## I/O Usage Patterns

### Reading Multiple Trajectories

```python { .api }
import MDAnalysis as mda

# Concatenate multiple trajectory files
u = mda.Universe("topology.psf", "part1.dcd", "part2.dcd", "part3.dcd")

# All files are treated as one continuous trajectory
print(f"Total frames: {u.trajectory.n_frames}")

# Or attach the chain later: load_new() replaces the current trajectory,
# so pass all parts in one call rather than calling it once per file
u = mda.Universe("topology.psf", "part1.dcd")
u.load_new(["part1.dcd", "part2.dcd", "part3.dcd"])
```
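
Chaining several files means translating a global frame index into a file and a local frame within it. The bookkeeping can be sketched with `bisect` over cumulative frame counts; `locate_frame` is a hypothetical helper, and MDAnalysis's ChainReader maintains its own index rather than this exact code.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(global_index, frames_per_file):
    """Map a frame index of a concatenated trajectory to (file_index, local_frame)."""
    # Start index of each file, e.g. [10, 5, 8] -> [0, 10, 15]
    starts = [0, *accumulate(frames_per_file)][:-1]
    total = sum(frames_per_file)
    if not 0 <= global_index < total:
        raise IndexError(f"frame {global_index} out of range for {total} frames")
    # Last file whose start is <= global_index (skips empty files)
    file_index = bisect_right(starts, global_index) - 1
    return file_index, global_index - starts[file_index]


print(locate_frame(14, [10, 5, 8]))  # (1, 4)
```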

### Writing Trajectories

```python { .api }
import MDAnalysis as mda

u = mda.Universe("topology.psf", "trajectory.dcd")

# Write subset of atoms
protein = u.select_atoms("protein")

with mda.Writer("protein_only.xtc", n_atoms=protein.n_atoms) as W:
    for ts in u.trajectory:
        W.write(protein)

# Write specific frames
with mda.Writer("every_10th.dcd", n_atoms=u.atoms.n_atoms) as W:
    for ts in u.trajectory[::10]:  # Every 10th frame
        W.write(u.atoms)

# Single frame output
u.atoms.write("current_frame.pdb")  # Current frame
u.trajectory[-1]  # Go to last frame
u.atoms.write("last_frame.gro")
```

### Memory-Efficient Processing

```python { .api }
import numpy as np


# Process large trajectories in chunks
def process_in_chunks(universe, chunk_size=1000):
    n_frames = universe.trajectory.n_frames

    for start in range(0, n_frames, chunk_size):
        end = min(start + chunk_size, n_frames)

        # Load this chunk's coordinates into memory, shape (n, n_atoms, 3).
        # transfer_to_memory() is not suitable here: it would replace the
        # whole file-backed trajectory with only the requested frames.
        chunk = np.array([ts.positions.copy()
                          for ts in universe.trajectory[start:end]])

        # Process chunk: perform analysis on `chunk` here
```
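
Any code that collects per-frame coordinates has to copy them: readers commonly refill one buffer in place as they advance, and MDAnalysis's `Timestep.positions` behaves this way (whereas `AtomGroup.positions` already returns a fresh array). The stdlib sketch below uses a hypothetical `ReusingReader`, with lists standing in for arrays, to show what goes wrong without a copy.

```python
class ReusingReader:
    """Hypothetical reader that overwrites one buffer in place for every frame."""

    def __init__(self, n_frames, n_atoms):
        self.n_frames = n_frames
        self.positions = [0.0] * n_atoms  # single shared buffer

    def __iter__(self):
        for frame in range(self.n_frames):
            # Refill the same list object instead of allocating a new one
            self.positions[:] = [float(frame)] * len(self.positions)
            yield self.positions


reader = ReusingReader(n_frames=3, n_atoms=2)
aliased = [pos for pos in reader]       # three references to one list
copied = [list(pos) for pos in reader]  # independent snapshots
print(aliased)  # [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
print(copied)   # [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
```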

### Format Conversion

```python { .api }
import MDAnalysis as mda


def convert_trajectory(input_files, output_file, selection="all"):
    """
    Convert trajectory between formats.

    Parameters
    ----------
    input_files : tuple
        (topology, trajectory) file paths.
    output_file : str
        Output trajectory file; its extension selects the output format.
    selection : str, optional
        Atom selection to write (default "all").
    """
    u = mda.Universe(*input_files)
    atoms = u.select_atoms(selection)

    with mda.Writer(output_file, n_atoms=atoms.n_atoms) as W:
        for ts in u.trajectory:
            W.write(atoms)


# Example: Convert AMBER to GROMACS
convert_trajectory(("system.prmtop", "trajectory.nc"), "output.xtc")

# Example: Extract protein only
convert_trajectory(("system.psf", "trajectory.dcd"), "protein.xtc", "protein")
```

### Handling File Streams

```python { .api }
import io

import MDAnalysis as mda
from MDAnalysis.lib.util import NamedStream

# Compressed text formats (e.g. PDB, XYZ) are decompressed transparently
# when the file name ends in .gz or .bz2
u = mda.Universe("structure.pdb.gz")
u = mda.Universe("structure.pdb", "trajectory.xyz.bz2")

# Binary trajectories (XTC, TRR, DCD) are read by their own file
# libraries and are not decompressed on the fly; uncompress them first.

# In-memory text data: wrap it in a NamedStream so that a file name
# (and therefore the format) is available
with open("structure.pdb") as fh:
    pdb_text = fh.read()
u = mda.Universe(NamedStream(io.StringIO(pdb_text), "structure.pdb"))
```
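
Transparent decompression of text formats comes down to choosing a decompressor from the file suffix. MDAnalysis does this internally (for example in `MDAnalysis.lib.util.anyopen`); the stdlib sketch below, with a hypothetical `open_maybe_compressed` helper, illustrates the idea rather than reproducing the library code.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path, mode="rt"):
    """Open `path`, decompressing transparently if it ends in .gz or .bz2."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)


# Usage: write a tiny gzip-compressed XYZ frame, then read it back as text
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.xyz.gz")
    with gzip.open(path, "wt") as fh:
        fh.write("1\nsingle argon atom\nAr 0.0 0.0 0.0\n")
    with open_maybe_compressed(path) as fh:
        print(fh.read().splitlines()[2])  # Ar 0.0 0.0 0.0
```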

## Error Handling

```python { .api }
import MDAnalysis as mda
from MDAnalysis.exceptions import NoDataError

try:
    u = mda.Universe("topology.psf", "trajectory.dcd")
except OSError as exc:  # includes FileNotFoundError
    raise SystemExit(f"Could not open input file: {exc}")
except ValueError as exc:  # e.g. topology and trajectory atom counts differ
    raise SystemExit(f"Inconsistent input files: {exc}")

# Check for optional data
if u.trajectory.ts.has_velocities:
    velocities = u.atoms.velocities
else:
    print("No velocity data available")

# Equivalent EAFP style: accessing missing data raises NoDataError
try:
    forces = u.atoms.forces
except NoDataError:
    print("No force data available")
```

## Performance Considerations

### Memory Usage

```python { .api }
# Load trajectory into memory for repeated access; this replaces the
# file-backed reader with an in-memory MemoryReader
u.transfer_to_memory()  # Load all frames

# Alternatively, load only part of a large trajectory
u.transfer_to_memory(start=0, stop=1000, step=10)  # Every 10th of the first 1000 frames

# Memory-efficient single pass (no transfer_to_memory() needed)
for ts in u.trajectory:  # Streaming access
    # Process frame immediately
    pass
```
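
Before calling `transfer_to_memory()`, the size of the resulting coordinate array can be estimated as frames × atoms × 3 components × bytes per value. This sketch assumes float32 storage (4 bytes per value) and counts positions only; `in_memory_bytes` is a hypothetical helper whose `start`/`stop`/`step` arguments mirror those of `transfer_to_memory()`.

```python
def in_memory_bytes(n_frames, n_atoms, start=None, stop=None, step=None, itemsize=4):
    """Rough size of a (frames, n_atoms, 3) coordinate array.

    Assumes float32 coordinates (4 bytes each) and ignores velocities,
    forces, and container overhead.
    """
    frames = len(range(n_frames)[start:stop:step])
    return frames * n_atoms * 3 * itemsize


full = in_memory_bytes(10_000, 100_000)
partial = in_memory_bytes(10_000, 100_000, start=0, stop=1000, step=10)
print(f"{full / 2**30:.1f} GiB vs {partial / 2**20:.0f} MiB")  # 11.2 GiB vs 114 MiB
```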

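### Random Access Performance

```python { .api }
# Efficient for formats with an offset index (XTC and TRR build one on
# first open and cache it) and for NetCDF
u.trajectory[1000]  # Direct access to frame 1000

# DCD frames have a fixed size, so seeking is cheap as well. Text formats
# (e.g. XYZ, AMBER ASCII) must be parsed frame by frame and are usually
# slower; consider loading them into memory for repeated random access,
# judging by the size of the coordinate array rather than frame count alone
n_bytes = u.trajectory.n_frames * u.atoms.n_atoms * 3 * 4  # float32 positions
if n_bytes < 2 * 1024**3:  # arbitrary 2 GiB budget; adjust to your machine
    u.transfer_to_memory()

# Then random access is fast
u.trajectory[1000]
```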
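
Random access becomes cheap once each frame's byte offset is known: one scan builds the table, then any frame is a single seek away. The stdlib sketch below does this for multi-frame XYZ text; `build_xyz_offsets` and `read_frame` are hypothetical helpers, not MDAnalysis code (which builds and caches offsets for XTC and TRR on its own).

```python
import io


def build_xyz_offsets(stream):
    """Scan a multi-frame XYZ byte stream once, recording each frame's start offset."""
    offsets = []
    while True:
        start = stream.tell()
        header = stream.readline()
        if not header.strip():
            break  # end of data (or a trailing blank line)
        n_atoms = int(header.decode())
        offsets.append(start)
        for _ in range(n_atoms + 1):  # comment line plus one line per atom
            stream.readline()
    return offsets


def read_frame(stream, offsets, index):
    """Seek straight to frame `index` and parse its coordinates."""
    stream.seek(offsets[index])
    n_atoms = int(stream.readline().decode())
    stream.readline()  # skip the comment line
    coords = []
    for _ in range(n_atoms):
        fields = stream.readline().decode().split()
        coords.append(tuple(float(v) for v in fields[1:4]))
    return coords


data = b"".join(f"1\nframe {i}\nAr {i}.0 0.0 0.0\n".encode() for i in range(5))
stream = io.BytesIO(data)
offsets = build_xyz_offsets(stream)
print(len(offsets), read_frame(stream, offsets, 3))  # 5 [(3.0, 0.0, 0.0)]
```

The up-front scan costs one full pass over the file, which is why index-building readers can be slow to open but fast to seek afterwards.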