# VCF File Parsing

Read VCF files with streaming iteration over variant records, header metadata extraction, and tabix-based region queries for efficient genomic data processing.

## Capabilities

### VCF Reader

The main VCF parser. It streams variant records one at a time and exposes the parsed header metadata.

```python { .api }
class Reader:
    def __init__(self, fsock=None, filename=None, compressed=None,
                 prepend_chr=False, strict_whitespace=False, encoding='ascii'):
        """
        Initialize VCF reader from file or stream.

        Parameters:
        - fsock: file-like object, open file handle
        - filename: str, path to VCF file
        - compressed: bool, whether file is gzip compressed (auto-detected)
        - prepend_chr: bool, add 'chr' prefix to chromosome names
        - strict_whitespace: bool, strict whitespace parsing
        - encoding: str, file encoding (default 'ascii')
        """

    def __iter__(self):
        """Iterator interface returning _Record objects."""

    def __next__(self):
        """Get next variant record (Python 3.x iterator protocol)."""

    def next(self):
        """Get next variant record (Python 2.x compatibility)."""

    def fetch(self, chrom, start=None, end=None):
        """
        Tabix-based region queries (requires pysam and indexed file).

        Parameters:
        - chrom: str, chromosome name
        - start: int, start position (0-based, optional)
        - end: int, end position (0-based, optional)

        Returns:
        Iterator of _Record objects in region
        """
```
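
Because `fetch` takes 0-based coordinates, while VCF `POS` values and region strings such as `chr1:1000001-2000000` are 1-based and inclusive, a small conversion step is often needed before querying. The following is a minimal sketch, assuming `fetch` treats the interval as half-open (`start` included, `end` excluded). The helper name `region_to_fetch_args` is hypothetical and not part of the library.

```python
def region_to_fetch_args(region):
    """Convert a 1-based inclusive region string to (chrom, start, end).

    Assumption: the target API expects 0-based, half-open coordinates.
    """
    chrom, _, span = region.partition(':')
    if not span:
        # Whole-chromosome query: no start/end restriction.
        return chrom, None, None
    # Drop thousands separators, then split "start-end".
    start_text, _, end_text = span.replace(',', '').partition('-')
    start = int(start_text) - 1  # 1-based inclusive -> 0-based
    # A 1-based inclusive end equals the 0-based exclusive end.
    end = int(end_text) if end_text else None
    return chrom, start, end


args = region_to_fetch_args('chr1:1,000,001-2,000,000')
```

The resulting tuple can be unpacked into the query, for example `reader.fetch(*args)`.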

### Reader Properties

Access to parsed VCF header metadata and file information.

```python { .api }
# Reader properties
metadata: dict   # Complete header metadata (OrderedDict)
infos: dict      # INFO field definitions (OrderedDict of _Info objects)
filters: dict    # FILTER field definitions (OrderedDict of _Filter objects)
formats: dict    # FORMAT field definitions (OrderedDict of _Format objects)
alts: dict       # ALT field definitions (OrderedDict of _Alt objects)
contigs: dict    # Contig information (OrderedDict of _Contig objects)
samples: list    # Sample names from header
filename: str    # Input filename if provided
encoding: str    # File encoding used
```
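
Since the definition properties are dictionaries keyed by field ID, a header can be summarized without reading any records. The sketch below is illustrative only: `summarize_header` is a hypothetical helper, and `fake_reader` is a stand-in object with made-up values so the example runs without a VCF file. It relies only on the `samples`, `infos`, `formats`, and `filters` attributes listed above.

```python
from types import SimpleNamespace


def summarize_header(reader):
    """Return a small dict describing a reader's header metadata."""
    return {
        'n_samples': len(reader.samples),
        'info_ids': list(reader.infos),      # dict keys are the field IDs
        'format_ids': list(reader.formats),
        'filter_ids': list(reader.filters),
    }


# Hypothetical stand-in for a Reader; values are invented for illustration.
fake_reader = SimpleNamespace(
    samples=['NA00001', 'NA00002'],
    infos={'DP': None, 'AF': None},
    formats={'GT': None},
    filters={'q10': None},
)
summary = summarize_header(fake_reader)
```

With a real file, the same function would be called as `summarize_header(vcf.Reader(filename='variants.vcf'))`.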

### Backwards Compatibility

```python { .api }
class VCFReader:
    """Alias for Reader class for backwards compatibility."""
    pass
```

### Usage Examples

```python
import vcf

# Basic file reading
reader = vcf.Reader(filename='variants.vcf')
for record in reader:
    print(f"Variant at {record.CHROM}:{record.POS}")

# Reading from compressed file
reader = vcf.Reader(filename='variants.vcf.gz')

# Reading from file handle
with open('variants.vcf', 'r') as f:
    reader = vcf.Reader(fsock=f)
    # Iterate inside the with block, while the handle is still open
    for record in reader:
        print(f"Variant at {record.CHROM}:{record.POS}")

# Access header information
reader = vcf.Reader(filename='variants.vcf')
print("Samples:", reader.samples)
print("INFO fields:", list(reader.infos.keys()))

# Tabix region queries (requires pysam and indexed file)
reader = vcf.Reader(filename='variants.vcf.gz')
for record in reader.fetch('chr1', 1000000, 2000000):
    print(f"Variant in region: {record.CHROM}:{record.POS}")
```
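
Because the reader is an iterator, aggregate statistics can be computed in one pass without holding every record in memory. The following sketch shows the pattern with a hypothetical helper, `count_by_chrom`, that accepts any iterable of objects exposing a `CHROM` attribute. The demo records are invented stand-ins so the block runs without a VCF file.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_chrom(records):
    """Count records per chromosome in a single streaming pass."""
    counts = Counter()
    for record in records:
        counts[record.CHROM] += 1
    return counts


# Stand-in records (hypothetical data); a real call would pass a Reader.
demo_records = [
    SimpleNamespace(CHROM='chr1', POS=100),
    SimpleNamespace(CHROM='chr1', POS=250),
    SimpleNamespace(CHROM='chr2', POS=75),
]
counts = count_by_chrom(iter(demo_records))
```

With a real file this would be `count_by_chrom(vcf.Reader(filename='variants.vcf'))`.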

## Types

### Metadata Namedtuples

```python { .api }
class _Info:
    """INFO field metadata."""
    id: str
    num: str
    type: str
    desc: str
    source: str
    version: str

class _Filter:
    """FILTER field metadata."""
    id: str
    desc: str

class _Format:
    """FORMAT field metadata."""
    id: str
    num: str
    type: str
    desc: str

class _Contig:
    """Contig metadata."""
    id: str
    length: int

class _Alt:
    """ALT field metadata."""
    id: str
    desc: str
```
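
To show how these fields relate to the text of a VCF header, the sketch below renders an INFO definition back into a `##INFO` meta-information line. `Info` here is an illustrative stand-in built with `collections.namedtuple` that mirrors the field names listed above; it is not the library's own `_Info` class, and `info_header_line` is a hypothetical helper.

```python
from collections import namedtuple

# Illustrative stand-in mirroring the _Info fields; not the library's class.
Info = namedtuple('Info', ['id', 'num', 'type', 'desc', 'source', 'version'])


def info_header_line(info):
    """Render an INFO definition as a VCF meta-information line."""
    return '##INFO=<ID={0},Number={1},Type={2},Description="{3}">'.format(
        info.id, info.num, info.type, info.desc)


dp = Info(id='DP', num='1', type='Integer', desc='Total Depth',
          source=None, version=None)
line = info_header_line(dp)
```

Fields are accessed by name (`dp.id`, `dp.desc`), which is the same access pattern the metadata types above support.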