Package for reading, manipulating, and writing genomic data
npx @tessl/cli install tessl/pypi-pysam@0.23.0
A comprehensive Python wrapper for the HTSlib library that provides facilities for reading, manipulating, and writing genomic data sets in standard bioinformatics formats. Pysam supports SAM/BAM/CRAM for sequence alignments, VCF/BCF for variant calls, and FASTA/FASTQ for sequences, along with tabix-indexed files and compressed formats.
pip install pysam
import pysam
Common specific imports:
from pysam import AlignmentFile, VariantFile, FastaFile, TabixFile, BGZFile
from pysam import samtools, bcftools  # For command-line functions
import pysam
import pysam.bcftools  # bcftools commands live in this submodule
# Reading SAM/BAM/CRAM alignment files (region queries need an index, e.g. example.bam.bai)
with pysam.AlignmentFile("example.bam", "rb") as samfile:
    for read in samfile.fetch("chr1", 1000, 2000):
        print(f"Read: {read.query_name}, Position: {read.reference_start}")
# Reading VCF/BCF variant files (region queries need a bgzip-compressed, indexed file)
with pysam.VariantFile("example.vcf.gz") as vcffile:
    for record in vcffile.fetch("chr1", 1000, 2000):
        print(f"Variant at {record.pos}: {record.ref} -> {record.alts}")
# Reading FASTA files
with pysam.FastaFile("reference.fa") as fastafile:
    sequence = fastafile.fetch("chr1", 1000, 2000)
    print(f"Sequence: {sequence}")
# Command-line tool integration
pysam.sort("-o", "sorted.bam", "input.bam")
pysam.index("sorted.bam")
# BCFtools for variant processing
pysam.bcftools.call("-mv", "-o", "calls.vcf", "pileup.bcf")
pysam.bcftools.filter("-i", "QUAL>=20", "-o", "filtered.vcf", "calls.vcf")
Pysam follows a modular architecture built around HTSlib's C API:
- File classes (AlignmentFile, VariantFile, FastaFile, TabixFile) that provide Pythonic access to genomic file formats
- Record classes (AlignedSegment, VariantRecord, FastxRecord) representing individual entries with attribute access
- Proxy classes (GTFProxy, VCFProxy, BedProxy) for format-specific access to rows of tabix-indexed files
This design enables efficient processing of large genomic datasets while maintaining Python's ease of use.
Read and write sequence alignment files with support for indexing, random access, and comprehensive metadata handling.
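The region logic behind a query such as `fetch("chr1", 1000, 2000)` can be illustrated without an input file. This is a minimal sketch, assuming pysam's 0-based, half-open coordinates. `reads_overlapping` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `AlignedSegment` records rather than real pysam output.

```python
from types import SimpleNamespace


def reads_overlapping(reads, start, stop):
    """Yield (name, aligned_length) for reads overlapping the window [start, stop)."""
    for read in reads:
        # An unmapped read has no reference_end; skip it
        if read.reference_end is None:
            continue
        if read.reference_start < stop and read.reference_end > start:
            yield read.query_name, read.reference_end - read.reference_start


# Stand-in records with made-up coordinates (not read from a real BAM file)
reads = [
    SimpleNamespace(query_name="r1", reference_start=950, reference_end=1050),
    SimpleNamespace(query_name="r2", reference_start=2000, reference_end=2100),
    SimpleNamespace(query_name="r3", reference_start=-1, reference_end=None),
]

hits = list(reads_overlapping(reads, 1000, 2000))
print(hits)
```

Because the helper only reads three attributes, it should accept any iterable of `AlignedSegment` objects as well.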
class AlignmentFile:
    def __init__(self, filepath, mode, **kwargs): ...
    def fetch(self, contig=None, start=None, stop=None): ...
    def pileup(self, contig=None, start=None, stop=None): ...
class AlignedSegment:
    query_name: str
    reference_start: int
    reference_end: int
    query_sequence: str
    query_qualities: array.array  # Phred scores as unsigned chars, or None
Handle variant call format files with full header support, sample data access, and filtering capabilities.
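Filtering usually works on a handful of record attributes. The sketch below shows one way to keep high-quality single-nucleotide variants, assuming records expose `pos`, `ref`, `alts`, and `qual` as `VariantRecord` does. `is_snv` and `passing_snvs` are hypothetical helpers, and the records are stand-ins with made-up values.

```python
from types import SimpleNamespace

BASES = frozenset("ACGT")


def is_snv(record):
    """True when the reference and every alternate allele are single bases."""
    if not record.alts:  # alts is None when no alternate allele is given
        return False
    return record.ref.upper() in BASES and all(alt.upper() in BASES for alt in record.alts)


def passing_snvs(records, min_qual=20.0):
    """Yield SNV records whose QUAL is present and at least min_qual."""
    for record in records:
        if record.qual is not None and record.qual >= min_qual and is_snv(record):
            yield record


# Stand-in records (hypothetical data, not parsed from a VCF file)
records = [
    SimpleNamespace(pos=1200, ref="A", alts=("G",), qual=50.0),
    SimpleNamespace(pos=1300, ref="AT", alts=("A",), qual=99.0),   # deletion
    SimpleNamespace(pos=1400, ref="C", alts=("T",), qual=5.0),     # low quality
    SimpleNamespace(pos=1500, ref="G", alts=("A", "T"), qual=None),  # missing QUAL
]

kept = [record.pos for record in passing_snvs(records)]
print(kept)
```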
class VariantFile:
    def __init__(self, filepath, mode="r", **kwargs): ...
    def fetch(self, contig=None, start=None, stop=None): ...
class VariantRecord:
    contig: str
    pos: int
    ref: str
    alts: tuple
    qual: float
Access sequence files with both random access (FASTA with index) and streaming capabilities (FASTA/FASTQ).
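Random access into an indexed FASTA file comes down to offset arithmetic over the index. The sketch below is a simplified illustration of that idea, not pysam's or HTSlib's implementation. `build_index` and `fetch_region` are hypothetical helpers, the index tuple mirrors the length, offset, line-bases, and line-width columns of a `.fai` file, and the code assumes every sequence line except the last has the same length.

```python
import os
import tempfile


def build_index(path):
    """Map each sequence name to (length, offset, line_bases, line_width)."""
    index = {}
    name = None
    with open(path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # offset is the byte position of the first base of this sequence
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch_region(path, index, name, start, end):
    """Return bases [start, end) of one sequence by seeking instead of scanning."""
    length, offset, line_bases, line_width = index[name]
    end = min(end, length)
    chunks = []
    with open(path, "rb") as handle:
        pos = start
        while pos < end:
            # Skip whole lines (including newline bytes), then move within the line
            handle.seek(offset + (pos // line_bases) * line_width + pos % line_bases)
            take = min(end - pos, line_bases - pos % line_bases)
            chunks.append(handle.read(take))
            pos += take
    return b"".join(chunks).decode()


# Small made-up FASTA file written to a temporary location
fasta_text = b">chr1 demo\nACGTACGTAC\nGGGGCCCCTT\nAAA\n>chr2\nTTTT\n"
with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as tmp:
    tmp.write(fasta_text)
    fasta_path = tmp.name

try:
    fai = build_index(fasta_path)
    region = fetch_region(fasta_path, fai, "chr1", 8, 14)
    print(fai["chr1"], region)
finally:
    os.remove(fasta_path)
```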
class FastaFile:
    def __init__(self, filename): ...
    def fetch(self, reference, start=None, end=None): ...
class FastxFile:
    def __init__(self, filename, persist=True): ...
    def __iter__(self): ...
class FastxRecord:
    name: str
    sequence: str
    comment: str
    quality: str
Access compressed, indexed genomic files with support for multiple formats (BED, GFF, GTF, VCF).
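The formats differ in which columns hold coordinates and in whether positions start at 0 or 1, which is what an indexing preset has to encode. The sketch below normalizes BED and GFF rows to 0-based, half-open intervals and answers a region query by linear scan. The `LAYOUTS` table and both helpers are hypothetical, the rows are made up, and a real tabix index replaces the scan with an index lookup.

```python
# Column layout per format: (contig_col, start_col, end_col, starts_at_one),
# using 0-indexed column numbers.
LAYOUTS = {
    "bed": (0, 1, 2, False),  # BED: 0-based start, exclusive end
    "gff": (0, 3, 4, True),   # GFF/GTF: 1-based start, inclusive end
}


def row_interval(line, fmt):
    """Return (contig, start, end) as a 0-based, half-open interval."""
    contig_col, start_col, end_col, starts_at_one = LAYOUTS[fmt]
    fields = line.rstrip("\n").split("\t")
    start = int(fields[start_col]) - (1 if starts_at_one else 0)
    return fields[contig_col], start, int(fields[end_col])


def overlapping(lines, fmt, contig, start, end):
    """Linear scan for rows overlapping [start, end)."""
    for line in lines:
        row_contig, row_start, row_end = row_interval(line, fmt)
        if row_contig == contig and row_start < end and row_end > start:
            yield line


# Made-up example rows
bed_rows = [
    "chr1\t100\t200\tfeatA",
    "chr1\t500\t600\tfeatB",
    "chr2\t100\t200\tfeatC",
]
gff_row = "chr1\tsrc\tgene\t101\t200\t.\t+\t.\tID=g1"

hits = list(overlapping(bed_rows, "bed", "chr1", 150, 550))
print(hits)
print(row_interval(gff_row, "gff"))
```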
class TabixFile:
    def __init__(self, filename, parser=None): ...
    def fetch(self, reference, start=None, end=None, parser=None): ...
def tabix_index(filename, preset=None, **kwargs): ...
def tabix_compress(filename_in, filename_out, **kwargs): ...
Handle block gzip compressed files commonly used in genomics.
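Block compression matters because each block can be decompressed on its own, which is what makes seeking possible. The sketch below imitates that layout with plain gzip members from the standard library. It is an illustration only: real BGZF blocks also carry an extra header field recording the block size, which is not reproduced here, and `compress_in_blocks` and `read_at` are hypothetical helpers.

```python
import gzip


def compress_in_blocks(data, block_size):
    """Compress data as independent gzip members, one per fixed-size block."""
    return [
        gzip.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_at(blocks, block_size, offset, size):
    """Read bytes at an uncompressed offset by inflating only one block.

    For simplicity the requested range must not cross a block boundary.
    """
    block = gzip.decompress(blocks[offset // block_size])
    inner = offset % block_size
    return block[inner:inner + size]


data = b"chr1\t100\t200\n" * 1000  # made-up payload, 13,000 bytes
blocks = compress_in_blocks(data, 4096)

# Concatenated members still form a valid gzip stream
stream = b"".join(blocks)
roundtrip = gzip.decompress(stream)

chunk = read_at(blocks, 4096, 5000, 10)
print(len(blocks), roundtrip == data, chunk == data[5000:5010])
```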
class BGZFile:
    def __init__(self, filepath, mode): ...
    def read(self, size=-1): ...
    def write(self, data): ...
    def seek(self, offset, whence=0): ...
Access samtools and bcftools functionality directly from Python with all subcommands available as functions.
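Each command-line token becomes one string argument, as in `pysam.sort("-o", "sorted.bam", "input.bam")`. The sketch below shows one way to translate an existing shell command into that form using only the standard library. `split_command` is a hypothetical helper, and nothing is executed here.

```python
import shlex


def split_command(command_line):
    """Split a shell-style command into (tool, subcommand, argument list)."""
    tool, subcommand, *args = shlex.split(command_line)
    return tool, subcommand, args


sort_cmd = split_command("samtools sort -o sorted.bam input.bam")
filter_cmd = split_command('bcftools filter -i "QUAL>=20" -o filtered.vcf calls.vcf')
print(sort_cmd)
print(filter_cmd)
```

With pysam installed, the resulting list could be unpacked into the matching function, for example `pysam.sort(*args)`.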
def view(*args, **kwargs): ...
def sort(*args, **kwargs): ...
def index(*args, **kwargs): ...
def stats(*args, **kwargs): ...
def call(*args, **kwargs): ...  # bcftools subcommand, available as pysam.bcftools.call
def merge(*args, **kwargs): ...
Helper functions for quality score conversion, error handling, and genomic constants.
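What these helpers and constants encode can be shown in plain Python. The sketch below is not pysam's implementation: it assumes the usual Phred+33 quality encoding and uses the flag bit values from the SAM specification (0x1 paired, 0x4 unmapped, 0x10 reverse strand). The function names and the `FLAG_BITS` table are hypothetical.

```python
from array import array


def quality_to_array(quality, offset=33):
    """Convert an ASCII quality string into an array of Phred scores."""
    return array("B", (ord(char) - offset for char in quality))


def array_to_quality(scores, offset=33):
    """Convert Phred scores back into an ASCII quality string."""
    return "".join(chr(score + offset) for score in scores)


# Flag bits as defined in the SAM specification
FLAG_BITS = {"paired": 0x1, "unmapped": 0x4, "reverse": 0x10}


def decode_flag(flag):
    """List the names of the known bits set in a SAM flag value."""
    return [name for name, bit in FLAG_BITS.items() if flag & bit]


scores = quality_to_array("II5#")
restored = array_to_quality(scores)
flags = decode_flag(83)
print(list(scores), restored, flags)
```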
def qualitystring_to_array(s): ...
def array_to_qualitystring(a): ...
class SamtoolsError(Exception): ...
# CIGAR operations
CMATCH: int
CINS: int
CDEL: int
# SAM flags
FPAIRED: int
FUNMAP: int
FREVERSE: int
Pysam uses SamtoolsError for command-line tool errors and standard Python exceptions for file I/O and data access issues. Most file operations support context managers for proper resource cleanup.
Use indexed files (fetch() with coordinates) for random access.