tessl/pypi-pyvcf

A VCFv4.0 and 4.1 parser for Python

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/pyvcf@0.6.x

To install, run

npx @tessl/cli install tessl/pypi-pyvcf@0.6.0

# PyVCF

A Python library for parsing and manipulating Variant Call Format (VCF) files, versions 4.0 and 4.1. PyVCF provides a CSV-like interface for reading genomic variant data, with automatic type conversion, structured access to record fields, and filtering capabilities for bioinformatics applications.

## Package Information

- **Package Name**: PyVCF
- **Language**: Python
- **Installation**: `pip install pyvcf`

## Core Imports

```python
import vcf
```

Common imports for VCF parsing:

```python
from vcf import Reader, Writer
```

Alternative imports:

```python
from vcf import VCFReader, VCFWriter  # Backwards compatibility aliases
```

Additional imports for filtering and utilities:

```python
from vcf import Filter  # Base filter class (alias of vcf.filters.Base)
from vcf.filters import SiteQuality, DepthPerSample, SnpOnly
from vcf.sample_filter import SampleFilter
from vcf.utils import walk_together, trim_common_suffix
from vcf import RESERVED_INFO, RESERVED_FORMAT  # Constants
```

## Basic Usage

```python
import vcf

# Read a VCF file
reader = vcf.Reader(filename='variants.vcf')

# Iterate through records
for record in reader:
    print(f"Chr: {record.CHROM}, Pos: {record.POS}")
    print(f"Ref: {record.REF}, Alt: {record.ALT}")

    # Access sample genotypes
    for sample_call in record.samples:
        print(f"Sample {sample_call.sample}: {sample_call.gt_bases}")

# Write a VCF file
input_reader = vcf.Reader(filename='input.vcf')
writer = vcf.Writer(open('output.vcf', 'w'), input_reader)

for record in input_reader:
    if record.QUAL and record.QUAL > 30:  # Filter by quality
        writer.write_record(record)

writer.close()
```

## Architecture

PyVCF uses a structured approach to VCF parsing:

- **Reader**: Streaming VCF parser that returns Record objects with lazy evaluation
- **Record**: Represents a single variant site with coordinate properties and genotype access
- **Call**: Individual sample genotype calls with classification and analysis methods
- **Filters**: Pluggable filter system for quality control and variant selection
- **Writer**: Output handler preserving VCF format integrity and metadata

Because the Reader streams records one at a time, large genomic datasets can be processed without loading a whole file into memory, while each record still exposes variant information, sample genotypes, and metadata for bioinformatics workflows.
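The streaming idea can be illustrated without PyVCF itself. The following is a minimal sketch, not PyVCF's implementation: `SketchRecord` and `iter_records` are hypothetical names, and only the first six fixed VCF columns are handled. It shows how a generator can yield one record per data line as the input is read.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a record; PyVCF's own record class is richer.
SketchRecord = namedtuple("SketchRecord", "CHROM POS ID REF ALT QUAL")


def iter_records(stream):
    """Yield one SketchRecord per data line, skipping header lines."""
    for line in stream:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # metadata and column-header lines
        chrom, pos, ident, ref, alt, qual = line.split("\t")[:6]
        yield SketchRecord(
            CHROM=chrom,
            POS=int(pos),
            ID=None if ident == "." else ident,
            REF=ref,
            ALT=alt.split(","),
            QUAL=None if qual == "." else float(qual),
        )


# A tiny in-memory VCF used only for illustration.
vcf_text = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14\n"
    "20\t17330\t.\tT\tA,C\t.\tq10\tDP=11\n"
)
records = list(iter_records(io.StringIO(vcf_text)))
```

Because `iter_records` is a generator, a caller that loops over it directly (rather than wrapping it in `list`) only ever holds one record at a time.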

## Capabilities

### VCF File Parsing

Core functionality for reading VCF files with comprehensive metadata support, automatic type conversion, and streaming iteration through variant records.

```python { .api }
class Reader:
    def __init__(self, fsock=None, filename=None, compressed=None,
                 prepend_chr=False, strict_whitespace=False, encoding='ascii'): ...
    def __iter__(self): ...
    def fetch(self, chrom, start=None, end=None): ...

class VCFReader:  # Alias for Reader
    pass
```

[VCF File Parsing](./vcf-parsing.md)
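To make "automatic type conversion" concrete, the sketch below parses an INFO column using declared types. It is an illustration only: `INFO_TYPES` and `parse_info` are hypothetical, and the type map is a hand-written subset rather than anything PyVCF builds from a file header.

```python
# Hypothetical subset of declared INFO types (normally taken from ##INFO header lines).
INFO_TYPES = {"DP": int, "AF": float, "DB": bool, "AA": str}


def parse_info(info_field):
    """Convert a raw INFO string such as 'DP=14;AF=0.5;DB' into typed values."""
    result = {}
    if info_field == ".":
        return result  # '.' means the INFO column is empty
    for entry in info_field.split(";"):
        key, _, raw = entry.partition("=")
        caster = INFO_TYPES.get(key, str)  # unknown keys stay as strings
        if caster is bool:
            result[key] = True  # flags carry no value
        elif "," in raw:
            result[key] = [caster(v) for v in raw.split(",")]
        else:
            result[key] = caster(raw)
    return result


info = parse_info("DP=14;AF=0.5,0.25;DB;AA=T")
```

After this runs, `info["DP"]` is an integer and `info["AF"]` is a list of floats, so downstream comparisons need no manual casting.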

### VCF File Writing

Functionality for writing VCF records to files while preserving metadata and format integrity.

```python { .api }
class Writer:
    def __init__(self, stream, template, lineterminator="\n"): ...
    def write_record(self, record): ...
    def flush(self): ...
    def close(self): ...

class VCFWriter:  # Alias for Writer
    pass
```

[VCF File Writing](./vcf-writing.md)

### Variant Record Analysis

Comprehensive variant record representation with coordinate properties, variant classification, and population genetics statistics.

```python { .api }
class _Record:
    # Standard VCF fields
    CHROM: str
    POS: int
    ID: str
    REF: str
    ALT: list
    QUAL: float
    FILTER: list
    INFO: dict
    FORMAT: str
    samples: list

    # Coordinate properties
    start: int
    end: int
    affected_start: int
    affected_end: int
    alleles: list

    # Variant classification
    is_snp: bool
    is_indel: bool
    is_sv: bool
    var_type: str
    var_subtype: str

    # Population statistics
    call_rate: float
    aaf: list
    heterozygosity: float

    def genotype(self, name: str): ...
    def get_hom_refs(self): ...
    def get_hom_alts(self): ...
    def get_hets(self): ...
```

[Variant Record Analysis](./variant-records.md)
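The population statistics listed above can be sketched from raw genotype strings. This is a rough illustration under common definitions, which are assumptions here: call rate as the called fraction of samples, alternate allele frequency over called alleles, and heterozygosity as one minus the sum of squared allele frequencies. The function name `site_stats` is hypothetical, and PyVCF's handling of edge cases may differ.

```python
import re


def site_stats(genotypes, num_alts=1):
    """Compute (call_rate, aaf, heterozygosity) from GT strings like '0/1'."""
    # Treat any genotype containing '.' as uncalled.
    called = [gt for gt in genotypes if "." not in gt]
    call_rate = len(called) / len(genotypes)
    # Flatten called genotypes into allele indices ('0|1' -> [0, 1]).
    alleles = [int(a) for gt in called for a in re.split(r"[/|]", gt)]
    if not alleles:
        return call_rate, [0.0] * num_alts, 0.0
    aaf = [alleles.count(i) / len(alleles) for i in range(1, num_alts + 1)]
    freqs = [1.0 - sum(aaf)] + aaf  # reference allele frequency first
    heterozygosity = 1.0 - sum(f * f for f in freqs)
    return call_rate, aaf, heterozygosity


call_rate, aaf, het = site_stats(["0/0", "0/1", "1|1", "./."])
```

With three of four samples called and an even split between reference and alternate alleles, the sketch gives a call rate of 0.75 and an alternate allele frequency of 0.5.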

### Sample Genotype Analysis

Individual sample genotype calls with classification, phase information, and variant analysis methods.

```python { .api }
class _Call:
    site: '_Record'
    sample: str
    data: object
    called: bool
    gt_nums: str
    gt_alleles: list
    ploidity: int
    gt_bases: str
    gt_type: int  # 0=hom_ref, 1=het, 2=hom_alt, None=uncalled
    phased: bool
    is_variant: bool
    is_het: bool
```

[Sample Genotype Analysis](./genotype-analysis.md)
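The `gt_type` encoding noted above (0, 1, 2, or `None`) can be reproduced from a raw GT string. The sketch below is an assumption-laden illustration, not PyVCF's code: `classify_gt` and `is_phased` are hypothetical helpers that work on strings such as `0/1` or `1|1`.

```python
import re


def classify_gt(gt):
    """Return 0 (hom_ref), 1 (het), 2 (hom_alt), or None (uncalled)."""
    alleles = re.split(r"[/|]", gt)
    if any(a == "." for a in alleles):
        return None  # a missing allele means the genotype is uncalled
    if all(a == alleles[0] for a in alleles):
        return 0 if alleles[0] == "0" else 2
    return 1


def is_phased(gt):
    """A '|' separator marks a phased genotype; '/' marks an unphased one."""
    return "|" in gt


types = [classify_gt(gt) for gt in ["0/0", "0|1", "1/1", "./.", "1/2"]]
```

Note that a genotype with two different alternate alleles, such as `1/2`, is classified as heterozygous in this sketch.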

### VCF Filtering

Extensible filtering system with built-in filters for quality control and custom filter development.

```python { .api }
class Base:  # Base filter class (imported as Filter)
    name: str
    def __call__(self, record): ...
    def filter_name(self): ...

class SiteQuality(Base): ...
class VariantGenotypeQuality(Base): ...
class DepthPerSample(Base): ...
class SnpOnly(Base): ...
```

[VCF Filtering](./vcf-filtering.md)
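A callable-filter design like the one outlined above can be sketched in plain Python. This example assumes a convention in which a filter returns `None` when a record passes and any other value when it fails; that convention, along with the names `SketchFilter`, `MinQuality`, and `apply_filters`, is an assumption for illustration and should be checked against the PyVCF source before relying on it.

```python
from types import SimpleNamespace


class SketchFilter:
    """Hypothetical base class: subclasses implement __call__(record)."""
    name = "f"

    def __call__(self, record):
        raise NotImplementedError

    def filter_name(self):
        return self.name


class MinQuality(SketchFilter):
    """Fail records whose QUAL is below a threshold."""
    name = "minq"

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, record):
        if record.QUAL is not None and record.QUAL < self.threshold:
            return record.QUAL  # a non-None result marks failure
        return None

    def filter_name(self):
        return f"{self.name}{self.threshold}"


def apply_filters(record, filters):
    """Return the names of all filters that the record fails."""
    return [f.filter_name() for f in filters if f(record) is not None]


# Stand-in records with only the attributes the filter reads.
records = [SimpleNamespace(POS=100, QUAL=12.0), SimpleNamespace(POS=200, QUAL=55.0)]
failed = {r.POS: apply_filters(r, [MinQuality(30)]) for r in records}
```

Keeping the threshold in `filter_name` means the label written to a FILTER column would record which cutoff was applied.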

### Sample-Based Filtering

Filter VCF files by sample during parsing to create subset files with specific samples.

```python { .api }
class SampleFilter:
    def __init__(self, infile, outfile=None, filters=None, invert=False): ...
    def set_filters(self, filters=None, invert=False): ...
    def write(self, outfile=None): ...
```

[Sample-Based Filtering](./sample-filtering.md)
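Subsetting a VCF by sample amounts to keeping the nine fixed columns and a chosen set of sample columns. The sketch below shows that idea on raw tab-separated lines; `select_samples` is a hypothetical helper and does not mirror the `SampleFilter` parameters.

```python
def select_samples(header_line, data_lines, keep):
    """Return header and data lines restricted to the named sample columns."""
    columns = header_line.rstrip("\n").split("\t")
    fixed, samples = columns[:9], columns[9:]  # 9 fixed columns precede samples
    wanted = [i for i, name in enumerate(samples) if name in keep]
    out = ["\t".join(fixed + [samples[i] for i in wanted])]
    for line in data_lines:
        fields = line.rstrip("\n").split("\t")
        calls = fields[9:]
        out.append("\t".join(fields[:9] + [calls[i] for i in wanted]))
    return out


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA1\tNA2\tNA3"
rows = ["20\t14370\t.\tG\tA\t29\tPASS\tDP=14\tGT\t0|0\t1|0\t1/1"]
subset = select_samples(header, rows, keep={"NA1", "NA3"})
```

The column order of the original file is preserved, so the output remains aligned with its header.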

### VCF Utilities

Utility functions for advanced VCF operations including multi-file synchronization and sequence manipulation.

```python { .api }
def walk_together(*readers, **kwargs): ...  # Synchronize multiple VCF files
def trim_common_suffix(*sequences): ...  # Sequence manipulation utilities
```

[VCF Utilities](./utils.md)
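As an illustration of the sequence manipulation mentioned above, the sketch below removes the longest trailing run of bases shared by all sequences while leaving each sequence at least one base long. It is a plain-Python approximation of the idea; `trim_suffix_sketch` is a hypothetical name, and whether `trim_common_suffix` behaves identically in every edge case is an assumption.

```python
def trim_suffix_sketch(*sequences):
    """Strip the longest common suffix, keeping every sequence non-empty."""
    if not sequences:
        return []
    shortest = min(len(s) for s in sequences)
    trim = 0
    # Advance while the next base from the right is identical in all sequences,
    # stopping before the shortest sequence would become empty.
    while trim < shortest - 1 and len({s[-1 - trim] for s in sequences}) == 1:
        trim += 1
    return [s[:len(s) - trim] for s in sequences]


alleles = trim_suffix_sketch("ACCCCC", "ACCCCCCCC", "ACCCCCCC", "ACCCCCCCCC")
```

Trimming of this kind is useful when comparing indel alleles that some tools pad with extra context on the right.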

### Constants and Reserved Fields

VCF specification constants, reserved field definitions, and metadata handling utilities.

```python { .api }
VERSION: str  # PyVCF version

RESERVED_INFO: dict  # Reserved INFO field definitions from VCF spec
RESERVED_FORMAT: dict  # Reserved FORMAT field definitions from VCF spec
field_counts: dict  # Field number interpretation constants
```

[Constants and Reserved Fields](./constants.md)