A VCFv4.0 and 4.1 parser for Python
npx @tessl/cli install tessl/pypi-pyvcf@0.6.00
# PyVCF

A Python library for parsing and manipulating Variant Call Format (VCF) files, versions 4.0 and 4.1. PyVCF provides a CSV-like interface for reading genomic variant data, with automatic type conversion, structured access to records and sample calls, and filtering tools for bioinformatics applications.

## Package Information

- **Package Name**: PyVCF
- **Language**: Python
- **Installation**: `pip install pyvcf`

## Core Imports

```python
import vcf
```

Common imports for VCF parsing:

```python
from vcf import Reader, Writer
```

Alternative imports:

```python
from vcf import VCFReader, VCFWriter  # Backwards compatibility aliases
```

Additional imports for filtering and utilities:

```python
from vcf import Filter  # Base filter class (actually vcf.filters.Base)
from vcf.filters import SiteQuality, DepthPerSample, SnpOnly
from vcf.sample_filter import SampleFilter
from vcf.utils import walk_together, trim_common_suffix
from vcf import RESERVED_INFO, RESERVED_FORMAT  # Constants
```

## Basic Usage

```python
import vcf

# Read a VCF file
reader = vcf.Reader(filename='variants.vcf')

# Iterate through records
for record in reader:
    print(f"Chr: {record.CHROM}, Pos: {record.POS}")
    print(f"Ref: {record.REF}, Alt: {record.ALT}")

    # Access sample genotypes
    for sample_call in record.samples:
        print(f"Sample {sample_call.sample}: {sample_call.gt_bases}")

# Write a VCF file
input_reader = vcf.Reader(filename='input.vcf')
writer = vcf.Writer(open('output.vcf', 'w'), input_reader)

for record in input_reader:
    if record.QUAL and record.QUAL > 30:  # Filter by quality
        writer.write_record(record)

writer.close()
```

## Architecture

PyVCF uses a structured approach to VCF parsing:

- **Reader**: Streaming VCF parser that returns Record objects with lazy evaluation
- **Record**: Represents a single variant site with coordinate properties and genotype access
- **Call**: Individual sample genotype calls with classification and analysis methods
- **Filters**: Pluggable filter system for quality control and variant selection
- **Writer**: Output handler preserving VCF format integrity and metadata

Because the reader streams records rather than loading a whole file, this design suits large genomic datasets while still exposing variant information, sample genotypes, and metadata for bioinformatics workflows.

## Capabilities

### VCF File Parsing

Core functionality for reading VCF files with comprehensive metadata support, automatic type conversion, and streaming iteration through variant records.

```python { .api }
class Reader:
    def __init__(self, fsock=None, filename=None, compressed=None,
                 prepend_chr=False, strict_whitespace=False, encoding='ascii'): ...
    def __iter__(self): ...
    def fetch(self, chrom, start=None, end=None): ...

class VCFReader:  # Alias for Reader
    pass
```

[VCF File Parsing](./vcf-parsing.md)

### VCF File Writing

Functionality for writing VCF records to files while preserving metadata and format integrity.

```python { .api }
class Writer:
    def __init__(self, stream, template, lineterminator="\n"): ...
    def write_record(self, record): ...
    def flush(self): ...
    def close(self): ...

class VCFWriter:  # Alias for Writer
    pass
```

[VCF File Writing](./vcf-writing.md)

### Variant Record Analysis

Comprehensive variant record representation with coordinate properties, variant classification, and population genetics statistics.

```python { .api }
class _Record:
    # Standard VCF fields
    CHROM: str
    POS: int
    ID: str
    REF: str
    ALT: list
    QUAL: float
    FILTER: list
    INFO: dict
    FORMAT: str
    samples: list

    # Coordinate properties
    start: int
    end: int
    affected_start: int
    affected_end: int
    alleles: list

    # Variant classification
    is_snp: bool
    is_indel: bool
    is_sv: bool
    var_type: str
    var_subtype: str

    # Population statistics
    call_rate: float
    aaf: list
    heterozygosity: float

    def genotype(self, name: str): ...
    def get_hom_refs(self): ...
    def get_hom_alts(self): ...
    def get_hets(self): ...
```

[Variant Record Analysis](./variant-records.md)

### Sample Genotype Analysis

Individual sample genotype calls with classification, phase information, and variant analysis methods.

```python { .api }
class _Call:
    site: '_Record'
    sample: str
    data: object
    called: bool
    gt_nums: str
    gt_alleles: list
    ploidity: int
    gt_bases: str
    gt_type: int  # 0=hom_ref, 1=het, 2=hom_alt, None=uncalled
    phased: bool
    is_variant: bool
    is_het: bool
```

[Sample Genotype Analysis](./genotype-analysis.md)

### VCF Filtering

Extensible filtering system with built-in filters for quality control and custom filter development.

```python { .api }
class Base:  # Base filter class (imported as Filter)
    name: str
    def __call__(self, record): ...
    def filter_name(self): ...

class SiteQuality(Base): ...
class VariantGenotypeQuality(Base): ...
class DepthPerSample(Base): ...
class SnpOnly(Base): ...
```

[VCF Filtering](./vcf-filtering.md)

### Sample-Based Filtering

Filter VCF files by sample during parsing to create subset files with specific samples.

```python { .api }
class SampleFilter:
    def __init__(self, infile, outfile=None, filters=None, invert=False): ...
    def set_filters(self, filters=None, invert=False): ...
    def write(self, outfile=None): ...
```

[Sample-Based Filtering](./sample-filtering.md)

### VCF Utilities

Utility functions for advanced VCF operations including multi-file synchronization and sequence manipulation.

```python { .api }
def walk_together(*readers, **kwargs): ...  # Synchronize multiple VCF files
def trim_common_suffix(*sequences): ...  # Sequence manipulation utilities
```

[VCF Utilities](./utils.md)

### Constants and Reserved Fields

VCF specification constants, reserved field definitions, and metadata handling utilities.

```python { .api }
VERSION: str  # PyVCF version

RESERVED_INFO: dict  # Reserved INFO field definitions from VCF spec
RESERVED_FORMAT: dict  # Reserved FORMAT field definitions from VCF spec
field_counts: dict  # Field number interpretation constants
```

[Constants and Reserved Fields](./constants.md)