# VCF File Parsing

VCF file reading with streaming record iteration, header metadata extraction, and tabix-based region queries for efficient access to genomic data.

## Capabilities

### VCF Reader

`Reader` is the main VCF parser. It reads the header metadata when it is constructed and then streams variant records one at a time.

```python { .api }
class Reader:
    def __init__(self, fsock=None, filename=None, compressed=None,
                 prepend_chr=False, strict_whitespace=False, encoding='ascii'):
        """
        Initialize a VCF reader from an open stream or a file path.

        Parameters:
        - fsock: file-like object, an open file handle to read from
        - filename: str, path to the VCF file (used when fsock is not given)
        - compressed: bool, whether the input is gzip-compressed; when None,
          it is inferred from a '.gz' filename extension
        - prepend_chr: bool, add a 'chr' prefix to chromosome names
        - strict_whitespace: bool, if True, split fields on tabs only,
          as the VCF specification requires
        - encoding: str, text encoding of the file (default 'ascii')
        """

    def __iter__(self):
        """Iterator interface returning _Record objects."""

    def __next__(self):
        """Return the next variant record (Python 3 iterator protocol)."""

    def next(self):
        """Return the next variant record (Python 2 compatibility)."""

    def fetch(self, chrom, start=None, end=None):
        """
        Query a region using a tabix index (requires pysam and a
        bgzip-compressed, tabix-indexed file).

        Parameters:
        - chrom: str, chromosome name
        - start: int, start position (0-based, optional)
        - end: int, end position (0-based, exclusive, optional)

        Returns:
        Iterator of _Record objects in the region
        """
```
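
The docstring gives `fetch()` 0-based coordinates, while VCF `POS` values and region strings such as `chr1:1000000-2000000` are 1-based and inclusive. The helper below is a hypothetical convenience (not part of PyVCF) that converts between the two; it assumes `fetch()` treats `end` as exclusive (half-open, as in pysam), which is worth confirming for your PyVCF version.

```python
from typing import Optional, Tuple


def region_to_fetch_args(start_1based: int,
                         end_1based: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Convert an inclusive, 1-based region (the convention of VCF POS values)
    into 0-based, half-open (start, end) arguments for Reader.fetch().

    Hypothetical helper; assumes fetch() uses half-open coordinates."""
    if start_1based < 1:
        raise ValueError("1-based positions start at 1")
    start0 = start_1based - 1          # shift the first base down by one
    if end_1based is None:
        return start0, None
    if end_1based < start_1based:
        raise ValueError("end must not precede start")
    # An inclusive 1-based end and an exclusive 0-based end are the same number.
    return start0, end_1based


print(region_to_fetch_args(1000000, 2000000))  # (999999, 2000000)
print(region_to_fetch_args(12345, 12345))      # (12344, 12345): a single base
```

With a reader open on an indexed file, the result can be unpacked directly: `reader.fetch('chr1', *region_to_fetch_args(1000000, 2000000))`.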

### Reader Properties

Access to parsed VCF header metadata and file information.

```python { .api }
# Reader properties
metadata: dict   # Complete header metadata (OrderedDict)
infos: dict      # INFO field definitions (OrderedDict of _Info objects)
filters: dict    # FILTER field definitions (OrderedDict of _Filter objects)
formats: dict    # FORMAT field definitions (OrderedDict of _Format objects)
alts: dict       # ALT field definitions (OrderedDict of _Alt objects)
contigs: dict    # Contig information (OrderedDict of _Contig objects)
samples: list    # Sample names from header
filename: str    # Input filename if provided
encoding: str    # File encoding used
```
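
The split between header properties and streamed records can be illustrated with a minimal, hypothetical stand-in (`read_vcf` is an illustrative name, not PyVCF's implementation). It reads the `##` meta lines and the `#CHROM` line eagerly, takes the sample names from the columns after the nine fixed ones (CHROM through FORMAT), and returns a lazy generator for the data lines. It assumes tab-separated fields and does no type conversion.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_vcf(stream: TextIO) -> Tuple[List[str], List[str], Iterator[List[str]]]:
    """Hypothetical minimal reader: parse the header eagerly, stream records lazily."""
    meta_lines: List[str] = []
    samples: List[str] = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta_lines.append(line)          # meta-information lines
        elif line.startswith("#CHROM"):
            columns = line[1:].split("\t")   # drop the leading '#'
            samples = columns[9:]            # names follow the 9 fixed columns
            break
    else:
        raise ValueError("no #CHROM header line found")

    def records() -> Iterator[List[str]]:
        # Resumes the same iterator right after the #CHROM line; nothing is
        # read until the caller starts iterating.
        for data_line in stream:
            data_line = data_line.rstrip("\n")
            if data_line:
                yield data_line.split("\t")

    return meta_lines, samples, records()


vcf_text = (
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA001\tNA002\n"
    "1\t100\t.\tA\tG\t50\tPASS\tDP=10\tGT\t0/1\t1/1\n"
)
meta, samples, recs = read_vcf(io.StringIO(vcf_text))
print(samples)                   # ['NA001', 'NA002']
for fields in recs:
    print(fields[0], fields[1])  # 1 100
```

Because the record generator is single-pass, iterating it a second time yields nothing; the same holds for any reader that streams from an open file.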

### Backwards Compatibility

```python { .api }
# Alias for the Reader class, kept for backwards compatibility
VCFReader = Reader
```

### Usage Examples

```python
import vcf

# Basic file reading: iterate over records one at a time
reader = vcf.Reader(filename='variants.vcf')
for record in reader:
    print(f"Variant at {record.CHROM}:{record.POS}")

# Reading from a gzip-compressed file (compression inferred from '.gz')
reader = vcf.Reader(filename='variants.vcf.gz')

# Reading from a file handle: consume the records inside the with-block,
# because the reader reads lazily from the handle, which is closed on exit
with open('variants.vcf', 'r') as f:
    reader = vcf.Reader(fsock=f)
    for record in reader:
        print(record.CHROM, record.POS)

# Access header information (parsed when the Reader is created)
reader = vcf.Reader(filename='variants.vcf')
print("Samples:", reader.samples)
print("INFO fields:", list(reader.infos.keys()))

# Tabix region queries (requires pysam and a bgzip-compressed,
# tabix-indexed file); start and end are 0-based
reader = vcf.Reader(filename='variants.vcf.gz')
for record in reader.fetch('chr1', 1000000, 2000000):
    print(f"Variant in region: {record.CHROM}:{record.POS}")
```

## Types

### Metadata Namedtuples

```python { .api }
class _Info:
    """INFO field metadata (namedtuple)."""
    id: str
    num: int       # parsed Number: None for '.', negative sentinel values for A/G/R
    type: str
    desc: str
    source: str    # None if absent from the header line
    version: str   # None if absent from the header line

class _Filter:
    """FILTER field metadata (namedtuple)."""
    id: str
    desc: str

class _Format:
    """FORMAT field metadata (namedtuple)."""
    id: str
    num: int       # parsed Number, encoded as for _Info
    type: str
    desc: str

class _Contig:
    """Contig metadata (namedtuple)."""
    id: str
    length: int    # None if the header line gives no length

class _Alt:
    """ALT field metadata (namedtuple)."""
    id: str
    desc: str
```
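
To make these namedtuple shapes concrete, the sketch below parses `##INFO` header lines into an OrderedDict of tuples with the same six fields as `_Info`, keyed by ID the way `Reader.infos` is. It is a simplified, hypothetical parser, not PyVCF's own: `Info`, `SPECIAL_COUNTS`, `parse_info_line`, and `parse_infos` are illustrative names; mapping `.`, `A`, `G`, `R` to `None`, -1, -2, -3 is an assumption modelled on how PyVCF encodes these Number values; and descriptions containing escaped quotes are not handled.

```python
import re
from collections import OrderedDict, namedtuple

# Same field list as _Info above; the name Info is illustrative.
Info = namedtuple("Info", ["id", "num", "type", "desc", "source", "version"])

# Assumed encoding of the special Number values.
SPECIAL_COUNTS = {".": None, "A": -1, "G": -2, "R": -3}

INFO_RE = re.compile(
    r'^##INFO=<ID=(?P<id>[^,]+),\s*Number=(?P<num>-?\d+|\.|[AGR]),\s*'
    r'Type=(?P<type>Integer|Float|Flag|Character|String),\s*'
    r'Description="(?P<desc>[^"]*)"'
    r'(?:,\s*Source="(?P<source>[^"]*)")?'       # optional Source
    r'(?:,\s*Version="?(?P<version>[^",>]*)"?)?>'  # optional Version
)


def parse_info_line(line: str) -> Info:
    """Parse one ##INFO meta line; optional fields come back as None."""
    match = INFO_RE.match(line)
    if match is None:
        raise ValueError("not a recognised ##INFO line: %r" % line)
    num_text = match.group("num")
    num = SPECIAL_COUNTS[num_text] if num_text in SPECIAL_COUNTS else int(num_text)
    return Info(match.group("id"), num, match.group("type"), match.group("desc"),
                match.group("source"), match.group("version"))


def parse_infos(meta_lines):
    """Collect INFO definitions in header order, keyed by ID."""
    infos = OrderedDict()
    for line in meta_lines:
        if line.startswith("##INFO="):
            info = parse_info_line(line)
            infos[info.id] = info
    return infos


header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">',
]
infos = parse_infos(header)
print(list(infos))       # ['DP', 'AF']
print(infos["AF"].num)   # -1
```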