tessl/pypi-alphabase

An infrastructure Python package of the AlphaX ecosystem for MS proteomics

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/alphabase@1.6.x

To install, run

npx @tessl/cli install tessl/pypi-alphabase@1.6.0

# AlphaBase

An infrastructure Python package for the AlphaX ecosystem that provides essential functionalities for mass spectrometry (MS) proteomics. AlphaBase serves as the foundational library for peptide and protein analysis, spectral library management, PSM (Peptide-Spectrum Match) reading, quantification workflows, and data processing utilities across multiple MS data formats.

## Package Information

- **Package Name**: alphabase
- **Language**: Python
- **Installation**: `pip install alphabase`
- **Version**: 1.6.2
- **License**: Apache-2.0

## Core Imports

```python
import alphabase
```

Common patterns for working with specific modules:

```python
# Chemical constants and calculations
from alphabase.constants.aa import AA_ASCII_MASS, calc_AA_masses
from alphabase.constants.atom import MASS_PROTON, calc_mass_from_formula
from alphabase.constants.modification import MOD_DF, add_new_modifications

# Fragment and precursor calculations
from alphabase.peptide.fragment import get_charged_frag_types, create_fragment_mz_dataframe
from alphabase.peptide.precursor import update_precursor_mz, calc_precursor_isotope_info
from alphabase.peptide.mobility import ccs_to_mobility_for_df, mobility_to_ccs_for_df

# Spectral library operations
from alphabase.spectral_library.base import SpecLibBase
from alphabase.spectral_library.decoy import SpecLibDecoy, DIANNDecoyGenerator
from alphabase.spectral_library.flat import SpecLibFlat

# PSM reading from various search engines
from alphabase.psm_reader import MaxQuantReader, DiannReader, SpectronautReader

# Quantification data processing
from alphabase.quantification.quant_reader.quant_reader_manager import import_data
from alphabase.quantification.quant_reader.longformat_reader import LongFormatReader

# SMILES and cheminformatics
from alphabase.smiles.peptide import PeptideSmilesEncoder
from alphabase.smiles.smiles import AminoAcidModifier

# High-performance I/O
from alphabase.io.hdf import HDF_File
from alphabase.io.tempmmap import array, zeros
```

## Basic Usage

```python
import pandas as pd
from alphabase.constants.aa import calc_AA_masses
from alphabase.constants.modification import calc_modification_mass
from alphabase.peptide.fragment import create_fragment_mz_dataframe
from alphabase.spectral_library.base import SpecLibBase

# Calculate amino acid masses for peptide sequences
sequences = ['PEPTIDE', 'SEQUENCE', 'EXAMPLE']
aa_masses = calc_AA_masses(sequences)

# Calculate modification masses
mod_sequences = ['PEPTIDE[Oxidation (M)]', 'SEQUENCE[Phospho (STY)]']
mod_masses = calc_modification_mass(mod_sequences)

# Create a spectral library
spec_lib = SpecLibBase()

# Load precursor data
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE'],
    'mods': ['', 'Phospho (STY)@2'],
    'charge': [2, 3],
    'proteins': ['P12345', 'P67890']
})

spec_lib.precursor_df = precursor_df
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate fragment m/z dataframe
frag_types = ['b++', 'y++', 'b+', 'y+']
spec_lib.calc_fragment_mz_df(frag_types)

print(f"Created spectral library with {len(spec_lib.precursor_df)} precursors")
```

## Architecture

AlphaBase is organized into functional modules that provide both high-level object-oriented interfaces and low-level array operations:

- **Constants**: Chemical databases (amino acids, elements, modifications, isotopes) with fast lookup tables
- **Peptide Processing**: Mass calculations, fragment ion generation, precursor analysis, ion mobility, and advanced algorithmic operations
- **Spectral Libraries**: Full-featured spectral library management with filtering, processing, I/O, decoy generation, and format conversion
- **PSM Readers**: Unified interface for reading outputs from 10+ proteomics search engines
- **Quantification**: Multi-format quantification readers, data reformatters, and processing pipelines for various proteomics platforms
- **SMILES Chemistry**: Cheminformatics capabilities for chemical structure representation and property prediction
- **I/O Utilities**: Advanced HDF5 wrapper and memory-mapped arrays for high-performance data processing
- **Protein Analysis**: FASTA processing, protein digestion, inference workflows, and sequence analysis

This modular design supports both rapid prototyping and high-throughput production workflows in mass spectrometry proteomics, from raw data processing through downstream computational analysis.

## Capabilities

### Chemical Constants and Calculations

Comprehensive databases of amino acids, chemical elements, modifications, and isotopes with vectorized mass calculations. Provides the foundation for all proteomics calculations with pre-computed lookup tables for performance.

```python { .api }
# Core constants
AA_ASCII_MASS: np.ndarray  # 128-length array of AA masses
MASS_PROTON: float = 1.00727646688
MOD_DF: pd.DataFrame  # Complete modification database

# Mass calculation functions
def calc_AA_masses(sequences: List[str]) -> np.ndarray: ...
def calc_mass_from_formula(formula: str) -> float: ...
def calc_modification_mass(mod_sequences: List[str]) -> np.ndarray: ...
```

[Chemical Constants](./chemical-constants.md)
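
The lookup-table idea can be illustrated with a minimal standalone sketch in plain Python. This is not AlphaBase's implementation: it only assumes, as the `AA_ASCII_MASS` description suggests, a 128-slot table indexed by each residue's ASCII code. The helper names `build_ascii_mass_table` and `residue_masses` are hypothetical, and only a small subset of standard monoisotopic residue masses is filled in.

```python
# Monoisotopic residue masses in Da (illustrative subset, not a full table).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "I": 113.08406, "D": 115.02694, "E": 129.04259,
}


def build_ascii_mass_table(residue_mass):
    """Return a 128-slot list where slot ord(aa) holds that residue's mass."""
    table = [0.0] * 128
    for aa, mass in residue_mass.items():
        table[ord(aa)] = mass
    return table


ASCII_MASS = build_ascii_mass_table(RESIDUE_MASS)


def residue_masses(sequence):
    """Look up every residue mass by indexing the table with its ASCII code."""
    return [ASCII_MASS[code] for code in sequence.encode("ascii")]


masses = residue_masses("PEPTIDE")
print(len(masses), round(sum(masses), 4))
```

Indexing by character code replaces a dictionary lookup per residue with a plain array access, which is what makes the same layout easy to vectorize over whole sequence arrays.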

### Fragment Ion Generation

Complete fragment ion series generation with support for multiple fragment types, neutral losses, and charge states. Enables creation of theoretical spectra for spectral library construction and peptide identification.

```python { .api }
def get_charged_frag_types(frag_types: List[str], charges: List[int]) -> List[str]: ...
def create_fragment_mz_dataframe(precursor_df: pd.DataFrame, frag_types: List[str]) -> pd.DataFrame: ...
def calc_b_y_and_peptide_masses_for_same_len_seqs(sequences: List[str]) -> tuple: ...
```

[Fragment Ions](./fragment-ions.md)
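
For readers unfamiliar with how a theoretical b/y series is derived, the sketch below computes it from first principles in plain Python. It does not call AlphaBase; the function `b_y_mz` and the small residue table are hypothetical and exist only for illustration. The assumption is the standard definition: a b ion is the N-terminal residue sum plus protons, and a y ion is the C-terminal residue sum plus water plus protons.

```python
MASS_PROTON = 1.00727646688  # proton mass in Da
MASS_H2O = 18.0105647        # monoisotopic mass of water in Da

# Monoisotopic residue masses in Da (only the residues used below).
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}


def b_y_mz(sequence, charge=1):
    """Return (b_ions, y_ions) m/z lists, one entry per cleavage site.

    Entry i of each list belongs to the cleavage after residue i + 1, so
    b_ions[i] is b(i+1) and y_ions[i] is the complementary y(n-i-1).
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    total = sum(masses)
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:  # n residues give n - 1 cleavage sites
        running += mass
        b_ions.append((running + charge * MASS_PROTON) / charge)
        y_ions.append((total - running + MASS_H2O + charge * MASS_PROTON) / charge)
    return b_ions, y_ions


b_ions, y_ions = b_y_mz("PEPTIDE")
for site, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"site {site}: b={b:.4f}  y={y:.4f}")
```

Keeping the two series aligned by cleavage site, rather than by ion number, mirrors the row-per-site layout that a fragment m/z table naturally takes.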

### Spectral Library Management

Spectral library class for loading, processing, filtering, and exporting spectral libraries. Supports multiple formats as well as decoy generation and isotope calculations.

```python { .api }
class SpecLibBase:
    precursor_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def copy(self) -> 'SpecLibBase': ...
    def append(self, other: 'SpecLibBase') -> None: ...
    def calc_precursor_mz(self) -> None: ...
    def calc_fragment_mz_df(self, frag_types: List[str]) -> None: ...
    def save_hdf(self, filepath: str) -> None: ...
    def load_hdf(self, filepath: str) -> None: ...
```

[Spectral Libraries](./spectral-libraries.md)

### PSM Reading and Processing

Unified interface for reading Peptide-Spectrum Match (PSM) files from multiple proteomics search engines. Standardizes column mappings and data formats across different tools for seamless data integration.

```python { .api }
class PSMReaderBase:
    def import_file(self, filepath: str) -> pd.DataFrame: ...
    def get_modification_mapping(self) -> dict: ...

# Available readers
class MaxQuantReader(PSMReaderBase): ...
class DiannReader(PSMReaderBase): ...
class SpectronautReader(PSMReaderBase): ...
# ... and 7 more search engine readers
```

[PSM Readers](./psm-readers.md)
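
The idea of standardizing column mappings can be sketched without AlphaBase, using only the standard library. Everything here is an assumption made for illustration: the `COLUMN_MAPPING` dictionary, the `read_psms` helper, and the three unified field names are hypothetical, and the mapping covers just a few columns that MaxQuant and DIA-NN typically write to their tab-separated outputs.

```python
import csv
import io

# Hypothetical per-engine mapping: unified field name -> engine column name.
COLUMN_MAPPING = {
    "maxquant": {"sequence": "Sequence", "charge": "Charge", "rt": "Retention time"},
    "diann": {"sequence": "Stripped.Sequence", "charge": "Precursor.Charge", "rt": "RT"},
}


def read_psms(text, engine, delimiter="\t"):
    """Parse delimited PSM text and return rows keyed by the unified names."""
    mapping = COLUMN_MAPPING[engine]
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {
            "sequence": row[mapping["sequence"]],
            "charge": int(row[mapping["charge"]]),
            "rt": float(row[mapping["rt"]]),
        }
        for row in reader
    ]


maxquant_text = "Sequence\tCharge\tRetention time\nPEPTIDEK\t2\t35.2\n"
diann_text = "Stripped.Sequence\tPrecursor.Charge\tRT\nPEPTIDEK\t2\t35.4\n"

maxquant_psms = read_psms(maxquant_text, "maxquant")
diann_psms = read_psms(diann_text, "diann")
print(maxquant_psms[0].keys() == diann_psms[0].keys())
```

Once every reader emits the same field names, downstream code can treat results from different engines interchangeably.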

### High-Performance I/O

Advanced I/O utilities including HDF5 wrapper with attribute-style access and memory-mapped arrays for efficient handling of large proteomics datasets. Optimized for high-throughput workflows and memory efficiency.

```python { .api }
class HDF_File:
    def __init__(self, filepath: str, mode: str = 'r'): ...
    def __getitem__(self, key: str): ...
    def __setitem__(self, key: str, value): ...

def array(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def zeros(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def clear() -> None: ...
```

[I/O Utilities](./io-utilities.md)
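
As a rough illustration of what a temporary memory-mapped array is, the sketch below backs a block of float64 values with a temporary file using only the standard library. It is a stand-in, not the `tempmmap` implementation: the listing above indicates that AlphaBase returns NumPy arrays, whereas this example exposes a `memoryview`. It also assumes the platform zero-fills a file extended with `truncate`, which holds on most systems.

```python
import mmap
import tempfile

ITEM_SIZE = 8   # bytes per float64
N_ITEMS = 1000

with tempfile.TemporaryFile() as backing_file:
    # Reserve space on disk; the new bytes are zero-filled on most platforms.
    backing_file.truncate(N_ITEMS * ITEM_SIZE)

    with mmap.mmap(backing_file.fileno(), N_ITEMS * ITEM_SIZE) as mapped:
        # Reinterpret the raw bytes as a flat sequence of doubles.
        values = memoryview(mapped).cast("d")

        values[0] = 1.5
        values[N_ITEMS - 1] = 2.5
        total = sum(values)
        n_values = len(values)

        # The view must be released before the mapping can be closed.
        values.release()

print(n_values, total)
```

The data lives in the operating system's page cache and on disk rather than on the Python heap, which is the property that makes this pattern attractive for arrays larger than available memory.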

### Quantification Data Processing

Readers and reformatters for quantified peptide and protein data. Provides a unified interface for reading, reformatting, and processing quantification results from DIA-NN, Spectronaut, MaxQuant, and other proteomics tools.

```python { .api }
def import_data(data_path: str, data_type: str = None, config_dict: dict = None) -> pd.DataFrame: ...
class LongFormatReader: ...
class WideFormatReader: ...
class ConfigDictLoader: ...
```

[Quantification](./quantification.md)
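
The distinction between long and wide quantification tables can be made concrete with a small standard-library sketch. It does not use `LongFormatReader` or `WideFormatReader`; the `long_to_wide` helper and the column names `protein`, `run`, and `intensity` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def long_to_wide(rows, index_key, column_key, value_key):
    """Pivot long-format rows into {index: {column: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index_key]][row[column_key]] = row[value_key]
    return dict(wide)


# Long format: one row per (protein, run) measurement.
long_rows = [
    {"protein": "P12345", "run": "run_A", "intensity": 1.2e6},
    {"protein": "P12345", "run": "run_B", "intensity": 1.5e6},
    {"protein": "P67890", "run": "run_A", "intensity": 3.0e5},
]

# Wide format: one entry per protein, one value per run.
wide_table = long_to_wide(long_rows, "protein", "run", "intensity")
print(wide_table)
```

A measurement missing from the long table, such as `P67890` in `run_B` here, simply has no key in the wide result, so downstream code has to decide how to treat missing values.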

### Advanced Peptide Operations

Peptide processing functions covering precursor m/z calculation, isotope modeling, ion mobility (CCS) conversion, and precursor hashing. These functions take a precursor DataFrame as input, which suits large-scale peptide analysis.

```python { .api }
def update_precursor_mz(precursor_df: pd.DataFrame, batch_size: int = 100000) -> None: ...
def calc_precursor_isotope_info(precursor_df: pd.DataFrame, max_isotope: int = 6) -> None: ...
def ccs_to_mobility_for_df(precursor_df: pd.DataFrame, vendor_type: str = 'bruker') -> None: ...
def hash_precursor_df(precursor_df: pd.DataFrame, seed: int = 42) -> None: ...
```

[Advanced Peptide Operations](./advanced-peptide-operations.md)
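
The arithmetic behind a precursor m/z value is short enough to show directly. The sketch below is plain Python and makes no claim about how `update_precursor_mz` is implemented; the helpers `precursor_mz` and `neutral_mass_from_mz` are hypothetical. It assumes positive-mode protonation, where the ion carries one proton per charge.

```python
MASS_PROTON = 1.00727646688  # proton mass in Da


def precursor_mz(neutral_mass, charge):
    """m/z of a peptide ion carrying `charge` protons."""
    return (neutral_mass + charge * MASS_PROTON) / charge


def neutral_mass_from_mz(mz, charge):
    """Invert precursor_mz to recover the neutral monoisotopic mass."""
    return mz * charge - charge * MASS_PROTON


peptide_mass = 799.35994  # monoisotopic neutral mass of PEPTIDE in Da
mz_by_charge = {z: precursor_mz(peptide_mass, z) for z in (1, 2, 3)}

for z, mz in mz_by_charge.items():
    print(f"charge {z}+: m/z {mz:.4f}")
```

The inverse function is useful when matching observed precursors back to candidate peptides, since the search is usually done in neutral-mass space.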

### Advanced Spectral Library Operations

Extended spectral library functionality: decoy generation, format conversion, library validation, and specialized library formats, for integrating spectral libraries with other proteomics workflows and search engines.

```python { .api }
class SpecLibDecoy: ...
class DIANNDecoyGenerator: ...
class SpecLibFlat: ...
class LibraryReaderBase: ...
class Schema: ...
```

[Advanced Spectral Libraries](./advanced-spectral-libraries.md)
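
To show what decoy generation means in practice, here is a minimal sketch of one widely used strategy, pseudo-reversal, in plain Python. Whether `SpecLibDecoy` or `DIANNDecoyGenerator` use this exact rule is not stated here, so treat it as a generic illustration; the function name `pseudo_reverse` is hypothetical.

```python
def pseudo_reverse(sequence):
    """Reverse a peptide but keep its C-terminal residue in place.

    Keeping the last residue preserves the enzymatic cleavage site
    (for example K or R after tryptic digestion), so the decoy has the
    same length, composition, and precursor mass as the target.
    """
    if len(sequence) < 2:
        return sequence
    return sequence[:-1][::-1] + sequence[-1]


targets = ["PEPTIDEK", "SEQUENCER"]
decoys = [pseudo_reverse(seq) for seq in targets]

for target, decoy in zip(targets, decoys):
    print(target, "->", decoy)
```

Because the residue composition is unchanged, the decoy shares the target's precursor m/z while its fragment series differs, which is the property that makes decoys useful for false discovery rate estimation.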

### SMILES and Chemical Representations

Cheminformatics support for representing peptides and amino acids as SMILES (Simplified Molecular-Input Line-Entry System) strings. Provides tools for encoding chemical structures and modifications, and for integrating with computational chemistry workflows in proteomics.

```python { .api }
class AminoAcidModifier: ...
class PeptideSmilesEncoder: ...
def calculate_molecular_descriptors(smiles: str) -> dict: ...
def predict_retention_time_from_smiles(smiles: str, model_type: str = 'krokhin') -> float: ...
```

[SMILES Chemistry](./smiles-chemistry.md)

### Protein Analysis

FASTA file processing, protein sequence analysis, and enzymatic digestion utilities. Supports protein inference workflows and integration with proteomics identification pipelines.

```python { .api }
def read_fasta_file(filepath: str) -> Iterator[tuple[str, str]]: ...
def get_uniprot_gene_name(description: str) -> str: ...
```

[Protein Analysis](./protein-analysis.md)
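
FASTA parsing and enzymatic digestion are simple enough to sketch with the standard library. The code below is not AlphaBase's `read_fasta_file`; the helpers `iter_fasta` and `digest_trypsin` are hypothetical. The digestion rule is deliberately simplified: it cleaves after every K or R, allows no missed cleavages, and ignores the common exception for a following proline.

```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def digest_trypsin(sequence):
    """Split a protein after every K or R (simplified tryptic rule)."""
    peptides, start = [], 0
    for position, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:position + 1])
            start = position + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


fasta_text = ">protein_1 example entry\nMKTAYIAKQR\nQISFVK\n>protein_2\nPEPTIDER\n"
records = list(iter_fasta(io.StringIO(fasta_text)))
peptides_by_protein = {header: digest_trypsin(seq) for header, seq in records}
print(peptides_by_protein)
```

Using a generator for the parser keeps memory use flat for large proteome files, since only one record is held at a time.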