An infrastructure Python package of the AlphaX ecosystem for MS proteomics
npx @tessl/cli install tessl/pypi-alphabase@1.6.00
# AlphaBase

An infrastructure Python package for the AlphaX ecosystem that provides essential functionalities for mass spectrometry (MS) proteomics. AlphaBase serves as the foundational library for peptide and protein analysis, spectral library management, PSM (Peptide-Spectrum Match) reading, quantification workflows, and data processing utilities across multiple MS data formats.

## Package Information

- **Package Name**: alphabase
- **Language**: Python
- **Installation**: `pip install alphabase`
- **Version**: 1.6.2
- **License**: Apache-2.0

## Core Imports

```python
import alphabase
```

Common patterns for working with specific modules:

```python
# Chemical constants and calculations
from alphabase.constants.aa import AA_ASCII_MASS, calc_AA_masses
from alphabase.constants.atom import MASS_PROTON, calc_mass_from_formula
from alphabase.constants.modification import MOD_DF, add_new_modifications

# Fragment and precursor calculations
from alphabase.peptide.fragment import get_charged_frag_types, create_fragment_mz_dataframe
from alphabase.peptide.precursor import update_precursor_mz, calc_precursor_isotope_info
from alphabase.peptide.mobility import ccs_to_mobility_for_df, mobility_to_ccs_for_df

# Spectral library operations
from alphabase.spectral_library.base import SpecLibBase
from alphabase.spectral_library.decoy import SpecLibDecoy, DIANNDecoyGenerator
from alphabase.spectral_library.flat import SpecLibFlat

# PSM reading from various search engines
from alphabase.psm_reader import MaxQuantReader, DiannReader, SpectronautReader

# Quantification data processing
from alphabase.quantification.quant_reader.quant_reader_manager import import_data
from alphabase.quantification.quant_reader.longformat_reader import LongFormatReader

# SMILES and cheminformatics
from alphabase.smiles.peptide import PeptideSmilesEncoder
from alphabase.smiles.smiles import AminoAcidModifier

# High-performance I/O
from alphabase.io.hdf import HDF_File
from alphabase.io.tempmmap import array, zeros
```

## Basic Usage

```python
import pandas as pd
from alphabase.constants.aa import calc_AA_masses
from alphabase.constants.modification import calc_modification_mass
from alphabase.spectral_library.base import SpecLibBase

# Calculate per-residue masses for one unmodified peptide sequence
aa_masses = calc_AA_masses('PEPTIDE')

# Calculate per-site modification masses for a peptide of length nAA.
# AlphaBase names modifications as 'Name@Site'; mod_sites are 1-based.
mod_masses = calc_modification_mass(nAA=8, mod_names=['Phospho@S'], mod_sites=[1])

# Create a spectral library and declare the charged fragment types to generate
spec_lib = SpecLibBase(charged_frag_types=['b_z1', 'b_z2', 'y_z1', 'y_z2'])

# Load precursor data ('mods' and 'mod_sites' are ';'-separated strings)
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE'],
    'mods': ['', 'Phospho@S'],
    'mod_sites': ['', '1'],
    'charge': [2, 3],
    'proteins': ['P12345', 'P67890']
})

spec_lib.precursor_df = precursor_df
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate the fragment m/z dataframe for the configured fragment types
spec_lib.calc_fragment_mz_df()

print(f"Created spectral library with {len(spec_lib.precursor_df)} precursors")
```

## Architecture

AlphaBase is organized into functional modules that provide both high-level object-oriented interfaces and low-level array operations:

- **Constants**: Chemical databases (amino acids, elements, modifications, isotopes) with fast lookup tables
- **Peptide Processing**: Mass calculations, fragment ion generation, precursor analysis, ion mobility, and advanced algorithmic operations
- **Spectral Libraries**: Full-featured spectral library management with filtering, processing, I/O, decoy generation, and format conversion
- **PSM Readers**: Unified interface for reading outputs from 10+ proteomics search engines
- **Quantification**: Multi-format quantification readers, data reformatters, and processing pipelines for various proteomics platforms
- **SMILES Chemistry**: Cheminformatics capabilities for chemical structure representation and property prediction
- **I/O Utilities**: Advanced HDF5 wrapper and memory-mapped arrays for high-performance data processing
- **Protein Analysis**: FASTA processing, protein digestion, inference workflows, and sequence analysis

This modular design enables both rapid prototyping and high-throughput production workflows in mass spectrometry proteomics, with comprehensive coverage from raw data processing to advanced computational analysis.

## Capabilities

### Chemical Constants and Calculations

Comprehensive databases of amino acids, chemical elements, modifications, and isotopes with vectorized mass calculations. Provides the foundation for all proteomics calculations with pre-computed lookup tables for performance.

```python { .api }
# Core constants
AA_ASCII_MASS: np.ndarray  # 128-length array of AA masses, indexed by ASCII code
MASS_PROTON: float = 1.00727646688
MOD_DF: pd.DataFrame  # Complete modification database

# Mass calculation functions
def calc_AA_masses(sequence: str) -> np.ndarray: ...
def calc_mass_from_formula(formula: str) -> float: ...
def calc_modification_mass(nAA: int, mod_names: List[str], mod_sites: List[int]) -> np.ndarray: ...
```

[Chemical Constants](./chemical-constants.md)
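
The idea of a 128-length mass array indexed by ASCII code can be illustrated with a minimal sketch in plain Python. It does not call AlphaBase: the residue table covers only a few amino acids, unknown codes hold NaN as a choice made for this example, and `RESIDUE_MASS`, `build_ascii_mass_table`, `residue_masses`, and `peptide_mass` are hypothetical names.

```python
# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "I": 113.08406, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259,
}
MASS_H2O = 18.010565  # water added once per peptide (N-terminal H + C-terminal OH)


def build_ascii_mass_table(residue_mass):
    """Return a 128-slot list indexed by ASCII code; unknown codes hold NaN."""
    table = [float("nan")] * 128
    for letter, mass in residue_mass.items():
        table[ord(letter)] = mass
    return table


ASCII_MASS = build_ascii_mass_table(RESIDUE_MASS)


def residue_masses(sequence):
    """Look up each residue mass by the ASCII code of its one-letter symbol."""
    return [ASCII_MASS[code] for code in sequence.encode("ascii")]


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(residue_masses(sequence)) + MASS_H2O


print(round(peptide_mass("PEPTIDE"), 4))
```

Indexing by character code avoids a dictionary lookup per residue, which is what makes the same layout attractive for vectorized array operations.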

### Fragment Ion Generation

Complete fragment ion series generation with support for multiple fragment types, neutral losses, and charge states. Enables creation of theoretical spectra for spectral library construction and peptide identification.

```python { .api }
def get_charged_frag_types(frag_types: List[str], max_frag_charge: int = 2) -> List[str]: ...
def create_fragment_mz_dataframe(precursor_df: pd.DataFrame, charged_frag_types: List[str]) -> pd.DataFrame: ...
def calc_b_y_and_peptide_masses_for_same_len_seqs(sequences: List[str]) -> tuple: ...
```

[Fragment Ions](./fragment-ions.md)
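
As a rough illustration of what a theoretical b/y ion series contains, the sketch below computes fragment m/z values for one unmodified peptide using only the standard library. It is not the AlphaBase implementation: neutral losses and modifications are ignored, the residue table is limited to the letters of the example, and `fragment_mz` and its label format are hypothetical.

```python
from itertools import accumulate

MASS_PROTON = 1.00727646688  # proton mass as listed under Chemical Constants
MASS_H2O = 18.010565
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}


def fragment_mz(sequence, max_charge=2):
    """Return {label: m/z} for b and y ions, with labels such as 'b2_z1'."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    # Running sums from the N-terminus give neutral b1..b(n-1).
    prefix = list(accumulate(masses))[:-1]
    # Running sums from the C-terminus give the residue part of y1..y(n-1).
    suffix = list(accumulate(reversed(masses)))[:-1]
    result = {}
    for charge in range(1, max_charge + 1):
        for i, (b, y) in enumerate(zip(prefix, suffix), start=1):
            result[f"b{i}_z{charge}"] = (b + charge * MASS_PROTON) / charge
            # y ions carry the C-terminal OH and an extra H, i.e. one water.
            result[f"y{i}_z{charge}"] = (y + MASS_H2O + charge * MASS_PROTON) / charge
    return result


ions = fragment_mz("PEPTIDE")
print(round(ions["b2_z1"], 4), round(ions["y1_z1"], 4))
```

A peptide of length n yields n-1 cleavage positions, so the seven-residue example produces 6 positions x 2 ion types x 2 charge states = 24 values.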

### Spectral Library Management

Full-featured spectral library class with comprehensive functionality for loading, processing, filtering, and exporting spectral libraries. Supports multiple formats and advanced operations like decoy generation and isotope calculations.

```python { .api }
class SpecLibBase:
    precursor_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def copy(self) -> 'SpecLibBase': ...
    def append(self, other: 'SpecLibBase') -> None: ...
    def calc_precursor_mz(self) -> None: ...
    def calc_fragment_mz_df(self) -> None: ...  # uses the library's charged_frag_types
    def save_hdf(self, filepath: str) -> None: ...
    def load_hdf(self, filepath: str) -> None: ...
```

[Spectral Libraries](./spectral-libraries.md)

### PSM Reading and Processing

Unified interface for reading Peptide-Spectrum Match (PSM) files from multiple proteomics search engines. Standardizes column mappings and data formats across different tools for seamless data integration.

```python { .api }
class PSMReaderBase:
    def import_file(self, filepath: str) -> pd.DataFrame: ...
    def get_modification_mapping(self) -> dict: ...

# Available readers
class MaxQuantReader(PSMReaderBase): ...
class DiannReader(PSMReaderBase): ...
class SpectronautReader(PSMReaderBase): ...
# ... and 7 more search engine readers
```

[PSM Readers](./psm-readers.md)
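
The notion of standardizing column mappings can be sketched without any search-engine output at hand. The example below renames columns of a small tab-separated table to shared names and converts numeric fields. The mapping, the sample rows, and the helper `read_psm_table` are all hypothetical and only stand in for what a reader class does internally.

```python
import csv
import io

# Hypothetical mapping from one tool's column headers to shared names.
COLUMN_MAPPING = {
    "Sequence": "sequence",
    "Charge": "charge",
    "Retention time": "rt",
    "Score": "score",
}
# Converters applied to the standardized columns.
CONVERTERS = {"charge": int, "rt": float, "score": float}


def read_psm_table(handle, column_mapping, delimiter="\t"):
    """Read a delimited PSM table and return rows keyed by standardized names."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        row = {}
        for source_name, target_name in column_mapping.items():
            if source_name in record:
                convert = CONVERTERS.get(target_name, str)
                row[target_name] = convert(record[source_name])
        rows.append(row)
    return rows


example = io.StringIO(
    "Sequence\tCharge\tRetention time\tScore\n"
    "PEPTIDE\t2\t35.2\t120.5\n"
)
psms = read_psm_table(example, COLUMN_MAPPING)
print(psms)
```

Keeping the mapping as data rather than code means that supporting another tool is mostly a matter of supplying a different dictionary.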

### High-Performance I/O

Advanced I/O utilities including HDF5 wrapper with attribute-style access and memory-mapped arrays for efficient handling of large proteomics datasets. Optimized for high-throughput workflows and memory efficiency.

```python { .api }
class HDF_File:
    def __init__(self, filepath: str, mode: str = 'r'): ...
    def __getitem__(self, key: str): ...
    def __setitem__(self, key: str, value): ...

def array(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def zeros(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def clear() -> None: ...
```

[I/O Utilities](./io-utilities.md)
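
To show what a memory-mapped array backed by a temporary file looks like in principle, here is a small standard-library sketch. It is unrelated to the internals of `alphabase.io.tempmmap`; the helper name `mmap_zeros` and the cleanup steps are assumptions made for this example.

```python
import mmap
import os
import tempfile

FLOAT64_BYTES = 8


def mmap_zeros(count, directory=None):
    """Create a temp file holding `count` float64 zeros and map it into memory."""
    fd, path = tempfile.mkstemp(suffix=".bin", dir=directory)
    with os.fdopen(fd, "r+b") as handle:
        handle.write(b"\x00" * (count * FLOAT64_BYTES))
        handle.flush()
        # The mapping stays valid after the file object is closed.
        mapped = mmap.mmap(handle.fileno(), count * FLOAT64_BYTES)
    return mapped, path


mapped, path = mmap_zeros(4)
values = memoryview(mapped).cast("d")  # view the raw bytes as float64
values[1] = 2.5                        # writes go through to the mapped file
snapshot = list(values)

values.release()  # the view must be released before the map can be closed
mapped.close()
os.remove(path)
print(snapshot)
```

The data live in the operating system's page cache rather than on the Python heap, which is why this pattern suits arrays that are larger than comfortable in-process memory.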

### Quantification Data Processing

Comprehensive quantification data processing capabilities for handling multi-format quantified peptide and protein data from various proteomics platforms. Provides unified interfaces for reading, reformatting, and processing quantification results from DIA-NN, Spectronaut, MaxQuant, and other proteomics tools.

```python { .api }
def import_data(data_path: str, data_type: str = None, config_dict: dict = None) -> pd.DataFrame: ...
class LongFormatReader: ...
class WideFormatReader: ...
class ConfigDictLoader: ...
```

[Quantification](./quantification.md)
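
Long-format and wide-format quantification tables differ only in layout, and the reshaping step can be sketched in a few lines. The field names (`protein`, `run`, `intensity`), the sample records, and the helper `long_to_wide` are hypothetical; real tool outputs use their own column names.

```python
from collections import defaultdict


def long_to_wide(records, index_key="protein", column_key="run", value_key="intensity"):
    """Pivot one-row-per-measurement records into one entry per protein."""
    wide = defaultdict(dict)
    for record in records:
        wide[record[index_key]][record[column_key]] = record[value_key]
    return dict(wide)


long_records = [
    {"protein": "P12345", "run": "run_1", "intensity": 1.0e6},
    {"protein": "P12345", "run": "run_2", "intensity": 1.2e6},
    {"protein": "P67890", "run": "run_1", "intensity": 3.4e5},
]
wide_table = long_to_wide(long_records)
print(wide_table)
```

A protein that was not quantified in a run simply has no key for that run, so downstream code has to decide how to treat missing values.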

### Advanced Peptide Operations

Comprehensive peptide processing capabilities including precursor calculations, mass calculations, ion mobility transformations, and advanced algorithmic operations. Provides high-performance functions for large-scale peptide analysis, isotope modeling, and multi-dimensional separations integration.

```python { .api }
def update_precursor_mz(precursor_df: pd.DataFrame, batch_size: int = 100000) -> None: ...
def calc_precursor_isotope_info(precursor_df: pd.DataFrame, max_isotope: int = 6) -> None: ...
def ccs_to_mobility_for_df(precursor_df: pd.DataFrame, vendor_type: str = 'bruker') -> None: ...
def hash_precursor_df(precursor_df: pd.DataFrame, seed: int = 42) -> None: ...
```

[Advanced Peptide Operations](./advanced-peptide-operations.md)
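
The arithmetic behind a precursor m/z is compact enough to write out. The sketch below converts between neutral mass and m/z and lists the first few isotope peak positions. It assumes a fixed spacing equal to the 13C-12C mass difference, which is a simplification chosen for this example, and the function names are hypothetical.

```python
MASS_PROTON = 1.00727646688   # proton mass as listed under Chemical Constants
ISOTOPE_SPACING = 1.0033548   # approx. 13C - 12C mass difference (assumption)


def precursor_mz(neutral_mass, charge):
    """m/z of a positively charged ion carrying `charge` protons."""
    return (neutral_mass + charge * MASS_PROTON) / charge


def neutral_mass_from_mz(mz, charge):
    """Inverse of precursor_mz."""
    return mz * charge - charge * MASS_PROTON


def isotope_mzs(neutral_mass, charge, n_peaks=3):
    """Positions of the monoisotopic peak and the following isotope peaks."""
    mono = precursor_mz(neutral_mass, charge)
    return [mono + k * ISOTOPE_SPACING / charge for k in range(n_peaks)]


peptide_neutral_mass = 799.359945  # unmodified PEPTIDE, monoisotopic
mz_2plus = precursor_mz(peptide_neutral_mass, 2)
print(round(mz_2plus, 4))
print([round(mz, 4) for mz in isotope_mzs(peptide_neutral_mass, 2)])
```

Because the isotope spacing is divided by the charge, the distance between neighbouring isotope peaks is itself a direct readout of the charge state.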

### Advanced Spectral Library Operations

Extended spectral library functionality including decoy generation, format conversion, library validation, and specialized library formats. Provides comprehensive tools for spectral library manipulation, quality control, and integration with various proteomics workflows and search engines.

```python { .api }
class SpecLibDecoy: ...
class DIANNDecoyGenerator: ...
class SpecLibFlat: ...
class LibraryReaderBase: ...
class Schema: ...
```

[Advanced Spectral Libraries](./advanced-spectral-libraries.md)
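
Decoy generation can be illustrated with a generic pseudo-reverse scheme, in which the sequence is reversed while the C-terminal residue stays in place. This is a common textbook approach and is not claimed to match the rules used by `SpecLibDecoy` or `DIANNDecoyGenerator`; `pseudo_reverse` and `add_decoys` are hypothetical helpers.

```python
def pseudo_reverse(sequence):
    """Reverse all residues except the last one, which stays C-terminal."""
    return sequence[-2::-1] + sequence[-1:]


def add_decoys(targets):
    """Return target entries followed by decoys that do not collide with any target."""
    entries = [{"sequence": seq, "decoy": 0} for seq in targets]
    seen = set(targets)
    for seq in targets:
        decoy = pseudo_reverse(seq)
        if decoy not in seen:  # e.g. 'AAK' reverses onto itself and is skipped
            entries.append({"sequence": decoy, "decoy": 1})
            seen.add(decoy)
    return entries


library = add_decoys(["PEPTIDEK", "AAK"])
print(library)
```

Keeping the C-terminal residue fixed preserves the cleavage-site pattern of tryptic peptides, so decoys keep the same length, mass, and terminal residue as their targets.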

### SMILES and Chemical Representations

Comprehensive cheminformatics capabilities for peptide and amino acid SMILES (Simplified Molecular-Input Line-Entry System) representations. Provides tools for chemical structure encoding, modification representation, and integration with computational chemistry workflows in proteomics.

```python { .api }
class AminoAcidModifier: ...
class PeptideSmilesEncoder: ...
def calculate_molecular_descriptors(smiles: str) -> dict: ...
def predict_retention_time_from_smiles(smiles: str, model_type: str = 'krokhin') -> float: ...
```

[SMILES Chemistry](./smiles-chemistry.md)
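
For readers unfamiliar with peptide SMILES, the following sketch shows how a linear, unmodified peptide string can be assembled from per-residue backbone fragments. It covers only three residues, does no chemical validation, and is not the encoding logic of `PeptideSmilesEncoder`; the table `RESIDUE_SMILES` and the function `peptide_smiles` are hypothetical.

```python
# Each fragment is N-C(alpha)(side chain)-C(=O) for an L-amino acid;
# glycine has no stereocentre.
RESIDUE_SMILES = {
    "G": "NCC(=O)",
    "A": "N[C@@H](C)C(=O)",
    "S": "N[C@@H](CO)C(=O)",
}


def peptide_smiles(sequence):
    """Chain residue fragments via amide bonds and cap the C-terminus with OH."""
    backbone = "".join(RESIDUE_SMILES[residue] for residue in sequence)
    return backbone + "O"


print(peptide_smiles("GAS"))
```

Writing every residue in the same atom order means that adjacent fragments form a peptide bond simply by concatenation, since each carbonyl carbon is followed by the next residue's nitrogen.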

### Protein Analysis

FASTA file processing, protein sequence analysis, and enzymatic digestion utilities. Supports protein inference workflows and integration with proteomics identification pipelines.

```python { .api }
def read_fasta_file(filepath: str) -> Iterator[tuple[str, str]]: ...
def get_uniprot_gene_name(description: str) -> str: ...
```

[Protein Analysis](./protein-analysis.md)
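
FASTA parsing followed by in-silico digestion can be sketched with the standard library alone. The example below is independent of AlphaBase's own FASTA utilities: the header and sequence are made up, the digestion rule is plain trypsin (cleave after K or R unless followed by P) with no missed cleavages, and `read_fasta` and `tryptic_peptides` are hypothetical names.

```python
import io
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


# Made-up entry; the sequence is wrapped over two lines as FASTA files often are.
fasta_text = ">sp|P12345|TEST_HUMAN Example protein GN=TEST\nMKRPEPT\nIDERAAK\n"
proteins = list(read_fasta(io.StringIO(fasta_text)))
peptides = tryptic_peptides(proteins[0][1])
print(peptides)
```

Splitting on a zero-width pattern requires Python 3.7 or newer; on older interpreters the cleavage positions would have to be collected with `re.finditer` instead.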