tessl/pypi-pysam

Package for reading, manipulating, and writing genomic data

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/pysam@0.23.x

To install, run

npx @tessl/cli install tessl/pypi-pysam@0.23.0

# Pysam

Pysam is a Python wrapper for the HTSlib C library that provides facilities for reading, manipulating, and writing genomic data sets in standard bioinformatics formats. It supports SAM/BAM/CRAM for sequence alignments, VCF/BCF for variant calls, and FASTA/FASTQ for sequences, along with tabix-indexed files and BGZF-compressed files.

## Package Information

- **Package Name**: pysam
- **Language**: Python
- **Installation**: `pip install pysam`

## Core Imports

```python
import pysam
```

Common specific imports:

```python
from pysam import AlignmentFile, VariantFile, FastaFile, TabixFile, BGZFile
from pysam import samtools, bcftools  # For command-line functions
```

## Basic Usage

```python
import pysam
from pysam import bcftools

# Reading SAM/BAM/CRAM alignment files (region queries need an index, e.g. example.bam.bai)
with pysam.AlignmentFile("example.bam", "rb") as samfile:
    for read in samfile.fetch("chr1", 1000, 2000):
        print(f"Read: {read.query_name}, Position: {read.reference_start}")

# Reading VCF/BCF variant files (region queries need a bgzip-compressed, indexed file)
with pysam.VariantFile("example.vcf.gz") as vcffile:
    for record in vcffile.fetch("chr1", 1000, 2000):
        print(f"Variant at {record.pos}: {record.ref} -> {record.alts}")

# Reading FASTA files (coordinates are 0-based, half-open)
with pysam.FastaFile("reference.fa") as fastafile:
    sequence = fastafile.fetch("chr1", 1000, 2000)
    print(f"Sequence: {sequence}")

# Command-line tool integration (samtools subcommands)
pysam.sort("-o", "sorted.bam", "input.bam")
pysam.index("sorted.bam")

# BCFtools for variant processing (bcftools subcommands live in pysam.bcftools)
bcftools.call("-mv", "-o", "calls.vcf", "pileup.bcf")
bcftools.filter("-i", "QUAL>=20", "-o", "filtered.vcf", "calls.vcf")
```

## Architecture

Pysam follows a modular architecture built around HTSlib's C API:

- **File Classes**: High-level interfaces (`AlignmentFile`, `VariantFile`, `FastaFile`, `TabixFile`) that provide Pythonic access to genomic file formats
- **Record Classes**: Data structures (`AlignedSegment`, `VariantRecord`, `FastxRecord`) representing individual entries with attribute access
- **Proxy Classes**: Efficient access to parsed data without copying (`GTFProxy`, `VCFProxy`, `BedProxy`)
- **Iterator Classes**: Different iteration patterns (row-wise, column-wise, pileup) for accessing data
- **Command Integration**: Direct access to samtools and bcftools command-line functionality

This design enables efficient processing of large genomic datasets while maintaining Python's ease of use.

## Capabilities

### SAM/BAM/CRAM Alignment Files

Read and write sequence alignment files with support for indexing, random access, and comprehensive metadata handling.

```python { .api }
class AlignmentFile:
    def __init__(self, filepath, mode, **kwargs): ...
    def fetch(self, contig=None, start=None, stop=None): ...
    def pileup(self, contig=None, start=None, stop=None): ...

class AlignedSegment:
    query_name: str
    reference_start: int
    reference_end: int
    query_sequence: str
    query_qualities: list
```

[SAM/BAM/CRAM Files](./alignment-files.md)

### VCF/BCF Variant Files

Handle variant call format files with full header support, sample data access, and filtering capabilities.

```python { .api }
class VariantFile:
    def __init__(self, filepath, mode="r", **kwargs): ...
    def fetch(self, contig=None, start=None, stop=None): ...

class VariantRecord:
    contig: str
    pos: int
    ref: str
    alts: tuple
    qual: float
```

[VCF/BCF Files](./variant-files.md)

### FASTA/FASTQ Sequence Files

Access sequence files with both random access (FASTA with index) and streaming capabilities (FASTA/FASTQ).

```python { .api }
class FastaFile:
    def __init__(self, filename): ...
    def fetch(self, reference, start=None, end=None): ...

class FastxFile:
    def __init__(self, filename, mode="r"): ...
    def __iter__(self): ...

class FastxRecord:
    name: str
    sequence: str
    comment: str
    quality: str
```

[FASTA/FASTQ Files](./sequence-files.md)

### Tabix-Indexed Files

Access compressed, indexed genomic files with support for multiple formats (BED, GFF, GTF, VCF).

```python { .api }
class TabixFile:
    def __init__(self, filename, parser=None): ...
    def fetch(self, reference, start=None, end=None, parser=None): ...

def tabix_index(filename, preset=None, **kwargs): ...
def tabix_compress(filename_in, filename_out, **kwargs): ...
```

[Tabix Files](./tabix-files.md)

### Compressed Files (BGZF)

Handle block gzip compressed files commonly used in genomics.

```python { .api }
class BGZFile:
    def __init__(self, filepath, mode): ...
    def read(self, size=-1): ...
    def write(self, data): ...
    def seek(self, offset, whence=0): ...
```

[BGZF Files](./bgzf-files.md)

### Command-Line Tools Integration

Access samtools and bcftools functionality directly from Python with all subcommands available as functions.

```python { .api }
# samtools subcommands, exposed at the top level of the pysam module
def view(*args, **kwargs): ...
def sort(*args, **kwargs): ...
def index(*args, **kwargs): ...
def stats(*args, **kwargs): ...
def merge(*args, **kwargs): ...

# bcftools subcommand, exposed in the pysam.bcftools submodule
def call(*args, **kwargs): ...
```

[Command-Line Tools](./command-tools.md)

### Utility Functions and Constants

Helper functions for quality score conversion, error handling, and genomic constants.

```python { .api }
def qualitystring_to_array(s): ...
def array_to_qualitystring(a): ...

class SamtoolsError(Exception): ...

# CIGAR operations
CMATCH: int
CINS: int
CDEL: int
# SAM flags
FPAIRED: int
FUNMAP: int
FREVERSE: int
```

[Utilities](./utilities.md)

## Error Handling

Pysam uses `SamtoolsError` for command-line tool errors and standard Python exceptions for file I/O and data access issues. Most file operations support context managers for proper resource cleanup.

## Performance Considerations

- Use indexed files and `fetch()` with coordinates for random access instead of scanning whole files
- Stream large datasets with iterators rather than loading all records into memory
- Use context managers to ensure file handles are closed
- Use proxy classes for memory-efficient access to parsed data