or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

alignment-files.mdbgzf-files.mdcommand-tools.mdindex.mdsequence-files.mdtabix-files.mdutilities.mdvariant-files.md

utilities.mddocs/

0

# Utilities and Constants

1

2

Helper functions, error handling classes, and constants used throughout the pysam library for quality score conversion, string encoding, and genomic data processing.

3

4

## Capabilities

5

6

### HTSlib Verbosity Control

7

8

Functions for controlling HTSlib logging verbosity.

9

10

```python { .api }

11

def get_verbosity():

12

"""

13

Get current HTSlib verbosity level.

14

15

Returns:

16

int, current verbosity level

17

"""

18

19

def set_verbosity(verbosity):

20

"""

21

Set HTSlib verbosity level.

22

23

Parameters:

24

- verbosity: int, verbosity level (0=silent, 1=errors, 2=warnings, 3=info)

25

26

Returns:

27

int, previous verbosity level

28

"""

29

```

30

31

### Quality Score Utilities

32

33

Functions for converting between different quality score representations.

34

35

```python { .api }

36

def qualitystring_to_array(input_str, offset=33):

37

"""

38

Convert quality string to array of integers.

39

40

Parameters:

41

- input_str: str, quality string

42

- offset: int, ASCII offset (33 for Sanger, 64 for Illumina 1.3+)

43

44

Returns:

45

array, quality scores as integers

46

"""

47

48

def array_to_qualitystring(qualities, offset=33):

49

"""

50

Convert array of integers to quality string.

51

52

Parameters:

53

- qualities: array, quality scores as integers

54

- offset: int, ASCII offset (33 for Sanger, 64 for Illumina 1.3+)

55

56

Returns:

57

str, quality string

58

"""

59

60

def qualities_to_qualitystring(qualities, offset=33):

61

"""

62

Convert list of quality scores to quality string.

63

64

Parameters:

65

- qualities: list, quality scores

66

- offset: int, ASCII offset

67

68

Returns:

69

str, quality string

70

"""

71

```

72

73

### String Encoding Utilities

74

75

Functions for handling string encoding in genomic data.

76

77

```python { .api }

78

def get_encoding_error_handler():

79

"""

80

Get current string encoding error handler.

81

82

Returns:

83

str, error handler name ('strict', 'ignore', 'replace')

84

"""

85

86

def set_encoding_error_handler(errors):

87

"""

88

Set string encoding error handler.

89

90

Parameters:

91

- errors: str, error handler ('strict', 'ignore', 'replace')

92

"""

93

```

94

95

### Error Handling Classes

96

97

Exception classes for pysam operations.

98

99

```python { .api }

100

class SamtoolsError(Exception):

101

"""

102

Exception raised when samtools/bcftools commands fail.

103

104

Attributes:

105

- cmd: str, command that failed

106

- returncode: int, exit code

107

- stderr: str, error output

108

"""

109

110

def __init__(self, cmd, returncode, stderr=None):

111

"""

112

Initialize SamtoolsError.

113

114

Parameters:

115

- cmd: str, failed command

116

- returncode: int, exit code

117

- stderr: str, error output

118

"""

119

120

class PysamDispatcher:

121

"""

122

Command-line dispatcher for samtools/bcftools.

123

124

Handles execution of external commands and error reporting.

125

"""

126

127

def __init__(self, name):

128

"""

129

Initialize dispatcher.

130

131

Parameters:

132

- name: str, tool name ('samtools' or 'bcftools')

133

"""

134

135

def __call__(self, *args, **kwargs):

136

"""

137

Execute command.

138

139

Parameters:

140

*args: command arguments

141

**kwargs: execution options

142

143

Returns:

144

int, exit code

145

"""

146

147

class unquoted_str(str):

148

"""

149

String subclass for unquoted VCF header values.

150

151

Used to indicate that a header value should not be quoted

152

when writing VCF files.

153

"""

154

```

155

156

### CIGAR Operation Constants

157

158

Constants for CIGAR string operations in sequence alignments.

159

160

```python { .api }

161

# CIGAR operation constants

162

CMATCH: int # Match/mismatch (0)

163

CINS: int # Insertion (1)

164

CDEL: int # Deletion (2)

165

CREF_SKIP: int # Reference skip (3)

166

CSOFT_CLIP: int # Soft clipping (4)

167

CHARD_CLIP: int # Hard clipping (5)

168

CPAD: int # Padding (6)

169

CEQUAL: int # Sequence match (7)

170

CDIFF: int # Sequence mismatch (8)

171

CBACK: int # Back (9)

172

173

# CIGAR operations tuple

174

CIGAR_OPS: tuple # All CIGAR operation constants

175

```

176

177

### SAM Flag Constants

178

179

Constants for SAM flag bits indicating read properties.

180

181

```python { .api }

182

# SAM flag constants

183

FPAIRED: int # Template having multiple segments (0x1)

184

FPROPER_PAIR: int # Each segment properly aligned (0x2)

185

FUNMAP: int # Segment unmapped (0x4)

186

FMUNMAP: int # Next segment unmapped (0x8)

187

FREVERSE: int # SEQ reverse complemented (0x10)

188

FMREVERSE: int # SEQ of next segment reverse complemented (0x20)

189

FREAD1: int # First segment in template (0x40)

190

FREAD2: int # Last segment in template (0x80)

191

FSECONDARY: int # Secondary alignment (0x100)

192

FQCFAIL: int # Not passing quality controls (0x200)

193

FDUP: int # PCR or optical duplicate (0x400)

194

FSUPPLEMENTARY: int # Supplementary alignment (0x800)

195

196

# SAM flags tuple

197

SAM_FLAGS: tuple # All SAM flag constants

198

```

199

200

### Version Information

201

202

Access to library version information.

203

204

```python { .api }

205

__version__: str # Pysam version string

206

__samtools_version__: str # Underlying samtools version

207

__htslib_version__: str # Underlying htslib version

208

```

209

210

### Configuration Functions

211

212

Functions for getting pysam build configuration.

213

214

```python { .api }

215

def get_include():

216

"""

217

Get list of include directories for pysam headers.

218

219

Returns:

220

list, include directory paths

221

"""

222

223

def get_defines():

224

"""

225

Get list of compilation defines.

226

227

Returns:

228

list, compilation defines

229

"""

230

231

def get_libraries():

232

"""

233

Get list of pysam libraries for linking.

234

235

Returns:

236

list, library file paths

237

"""

238

```

239

240

## Usage Examples

241

242

### Quality Score Conversion

243

244

```python

245

import pysam

246

247

# Convert quality string to numeric scores

248

quality_string = "###$$%&'()*+,-./0123456789"

249

scores = pysam.qualitystring_to_array(quality_string)

250

print(f"Quality scores: {scores}")

251

252

# Convert back to string

253

converted_string = pysam.array_to_qualitystring(scores)

254

print(f"Converted string: {converted_string}")

255

assert quality_string == converted_string

256

257

# Handle different encoding offsets

258

illumina_quality = "BCDEFGHIJKLMNOP" # Illumina 1.3+ encoding

259

sanger_scores = pysam.qualitystring_to_array(illumina_quality, offset=64)

260

sanger_quality = pysam.array_to_qualitystring(sanger_scores, offset=33)

261

print(f"Illumina to Sanger: {illumina_quality} -> {sanger_quality}")

262

263

# Convert list of qualities

264

quality_list = [20, 25, 30, 35, 40]

265

quality_str = pysam.qualities_to_qualitystring(quality_list)

266

print(f"Quality list to string: {quality_list} -> {quality_str}")

267

```

268

269

### Working with CIGAR Operations

270

271

```python

272

import pysam

273

274

def analyze_cigar(cigar_tuples):

275

"""Analyze CIGAR operations in an alignment."""

276

operations = {

277

pysam.CMATCH: "Match/Mismatch",

278

pysam.CINS: "Insertion",

279

pysam.CDEL: "Deletion",

280

pysam.CREF_SKIP: "Reference Skip",

281

pysam.CSOFT_CLIP: "Soft Clip",

282

pysam.CHARD_CLIP: "Hard Clip",

283

pysam.CPAD: "Padding",

284

pysam.CEQUAL: "Sequence Match",

285

pysam.CDIFF: "Sequence Mismatch",

286

pysam.CBACK: "Back"

287

}

288

289

print("CIGAR analysis:")

290

for op, length in cigar_tuples:

291

op_name = operations.get(op, f"Unknown({op})")

292

print(f" {op_name}: {length} bases")

293

294

# Example usage with aligned segment

295

with pysam.AlignmentFile("example.bam", "rb") as bamfile:

296

for read in bamfile.fetch():

297

if read.cigar:

298

print(f"Read: {read.query_name}")

299

analyze_cigar(read.cigar)

300

break

301

```

302

303

### Working with SAM Flags

304

305

```python

306

import pysam

307

308

def analyze_sam_flags(flag):

309

"""Analyze SAM flags for a read."""

310

flags = {

311

pysam.FPAIRED: "Paired",

312

pysam.FPROPER_PAIR: "Proper pair",

313

pysam.FUNMAP: "Unmapped",

314

pysam.FMUNMAP: "Mate unmapped",

315

pysam.FREVERSE: "Reverse strand",

316

pysam.FMREVERSE: "Mate reverse strand",

317

pysam.FREAD1: "First in pair",

318

pysam.FREAD2: "Second in pair",

319

pysam.FSECONDARY: "Secondary alignment",

320

pysam.FQCFAIL: "QC fail",

321

pysam.FDUP: "Duplicate",

322

pysam.FSUPPLEMENTARY: "Supplementary"

323

}

324

325

print(f"SAM flag {flag} ({flag:b} binary):")

326

active_flags = []

327

for flag_bit, description in flags.items():

328

if flag & flag_bit:

329

active_flags.append(description)

330

331

if active_flags:

332

for desc in active_flags:

333

print(f" - {desc}")

334

else:

335

print(" - No flags set")

336

337

# Example usage

338

with pysam.AlignmentFile("example.bam", "rb") as bamfile:

339

for read in bamfile.fetch():

340

print(f"Read: {read.query_name}")

341

analyze_sam_flags(read.flag)

342

343

# Check specific flags using properties

344

if read.is_paired:

345

print(" This is a paired read")

346

if read.is_proper_pair:

347

print(" This is a proper pair")

348

if read.is_reverse:

349

print(" This read is reverse complemented")

350

351

break

352

```

353

354

### Error Handling

355

356

```python

357

import pysam

358

359

def safe_samtools_operation(command_func, *args, **kwargs):

360

"""Safely execute samtools command with error handling."""

361

try:

362

result = command_func(*args, **kwargs)

363

print(f"Command succeeded with exit code: {result}")

364

return result

365

except pysam.SamtoolsError as e:

366

print(f"Samtools command failed:")

367

print(f" Command: {e.cmd}")

368

print(f" Exit code: {e.returncode}")

369

if e.stderr:

370

print(f" Error: {e.stderr}")

371

return None

372

373

# Example usage

374

result = safe_samtools_operation(pysam.sort, "-o", "output.bam", "input.bam")

375

if result is not None:

376

print("Sort operation completed successfully")

377

```

378

379

### String Encoding Handling

380

381

```python

382

import pysam

383

384

# Check current encoding error handler

385

current_handler = pysam.get_encoding_error_handler()

386

print(f"Current encoding error handler: {current_handler}")

387

388

# Set more permissive error handling for problematic files

389

pysam.set_encoding_error_handler('ignore')

390

391

try:

392

# Process file that might have encoding issues

393

with pysam.AlignmentFile("problematic.bam", "rb") as bamfile:

394

for read in bamfile.fetch():

395

# Process reads, ignoring encoding errors in strings

396

pass

397

finally:

398

# Restore original handler

399

pysam.set_encoding_error_handler(current_handler)

400

```

401

402

### Version and Configuration Information

403

404

```python

405

import pysam

406

407

# Display version information

408

print(f"Pysam version: {pysam.__version__}")

409

print(f"Samtools version: {pysam.__samtools_version__}")

410

411

# Get build configuration

412

includes = pysam.get_include()

413

print(f"Include directories: {includes}")

414

415

defines = pysam.get_defines()

416

print(f"Compilation defines: {defines}")

417

418

libraries = pysam.get_libraries()

419

print(f"Libraries: {libraries}")

420

421

# Check available CIGAR operations

422

print(f"Available CIGAR operations: {pysam.CIGAR_OPS}")

423

424

# Check available SAM flags

425

print(f"Available SAM flags: {pysam.SAM_FLAGS}")

426

```

427

428

### Custom Quality Processing

429

430

```python

431

import pysam

432

import numpy as np

433

434

def filter_by_average_quality(input_bam, output_bam, min_avg_quality=20):

435

"""Filter reads by average base quality."""

436

437

with pysam.AlignmentFile(input_bam, "rb") as infile:

438

with pysam.AlignmentFile(output_bam, "wb", template=infile) as outfile:

439

440

reads_processed = 0

441

reads_kept = 0

442

443

for read in infile:

444

reads_processed += 1

445

446

if read.query_qualities:

447

# Calculate average quality

448

avg_quality = np.mean(read.query_qualities)

449

450

if avg_quality >= min_avg_quality:

451

outfile.write(read)

452

reads_kept += 1

453

else:

454

# Keep reads without quality scores

455

outfile.write(read)

456

reads_kept += 1

457

458

print(f"Processed {reads_processed} reads, kept {reads_kept} reads")

459

print(f"Filtering rate: {(reads_processed - reads_kept) / reads_processed * 100:.1f}%")

460

461

# Usage

462

filter_by_average_quality("input.bam", "filtered.bam", min_avg_quality=25)

463

```

464

465

## Performance and Best Practices

466

467

- Quality score conversions are optimized for speed with Cython implementations

468

- Use appropriate offset values for different sequencing platforms

469

- CIGAR and SAM flag constants enable readable code instead of magic numbers

470

- Error handlers can be adjusted for problematic input files

471

- Version information helps ensure compatibility across different installations

472

- Configuration functions are useful for building extensions that depend on pysam