Tessl Tile for pypi/alphabase@1.6.0

tessl/pypi-alphabase

An infrastructure Python package of the AlphaX ecosystem for MS proteomics

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:pypi/alphabase@1.6.x

To install, run

npx @tessl/cli install tessl/pypi-alphabase@1.6.0

# AlphaBase

An infrastructure Python package for the AlphaX ecosystem that provides core functionality for mass spectrometry (MS) proteomics. AlphaBase serves as the foundational library for peptide and protein analysis, spectral library management, PSM (Peptide-Spectrum Match) reading, quantification workflows, and data processing utilities across multiple MS data formats.

## Package Information

- **Package Name**: alphabase
- **Language**: Python
- **Installation**: `pip install alphabase`
- **Version**: 1.6.2
- **License**: Apache-2.0

## Core Imports

```python
import alphabase
```

Common patterns for working with specific modules:

```python
# Chemical constants and calculations
from alphabase.constants.aa import AA_ASCII_MASS, calc_AA_masses
from alphabase.constants.atom import MASS_PROTON, calc_mass_from_formula
from alphabase.constants.modification import MOD_DF, add_new_modifications

# Fragment and precursor calculations
from alphabase.peptide.fragment import get_charged_frag_types, create_fragment_mz_dataframe
from alphabase.peptide.precursor import update_precursor_mz, calc_precursor_isotope_info
from alphabase.peptide.mobility import ccs_to_mobility_for_df, mobility_to_ccs_for_df

# Spectral library operations
from alphabase.spectral_library.base import SpecLibBase
from alphabase.spectral_library.decoy import SpecLibDecoy, DIANNDecoyGenerator
from alphabase.spectral_library.flat import SpecLibFlat

# PSM reading from various search engines
from alphabase.psm_reader import MaxQuantReader, DiannReader, SpectronautReader

# Quantification data processing
from alphabase.quantification.quant_reader.quant_reader_manager import import_data
from alphabase.quantification.quant_reader.longformat_reader import LongFormatReader

# SMILES and cheminformatics
from alphabase.smiles.peptide import PeptideSmilesEncoder
from alphabase.smiles.smiles import AminoAcidModifier

# High-performance I/O
from alphabase.io.hdf import HDF_File
from alphabase.io.tempmmap import array, zeros
```

## Basic Usage

```python
import pandas as pd
from alphabase.constants.aa import calc_AA_masses
from alphabase.constants.modification import calc_modification_mass
from alphabase.peptide.fragment import create_fragment_mz_dataframe
from alphabase.spectral_library.base import SpecLibBase

# Calculate amino acid masses for peptide sequences
sequences = ['PEPTIDE', 'SEQUENCE', 'EXAMPLE']
aa_masses = calc_AA_masses(sequences)

# Calculate modification masses
mod_sequences = ['PEPTIDE[Oxidation (M)]', 'SEQUENCE[Phospho (STY)]']
mod_masses = calc_modification_mass(mod_sequences)

# Create a spectral library
spec_lib = SpecLibBase()

# Load precursor data
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE'],
    'mods': ['', 'Phospho (STY)@2'],
    'charge': [2, 3],
    'proteins': ['P12345', 'P67890']
})

spec_lib.precursor_df = precursor_df
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate fragment m/z dataframe
frag_types = ['b++', 'y++', 'b+', 'y+']
spec_lib.calc_fragment_mz_df(frag_types)

print(f"Created spectral library with {len(spec_lib.precursor_df)} precursors")
```

## Architecture

AlphaBase is organized into functional modules that provide both high-level object-oriented interfaces and low-level array operations:

- **Constants**: Chemical databases (amino acids, elements, modifications, isotopes) with fast lookup tables
- **Peptide Processing**: Mass calculations, fragment ion generation, precursor analysis, ion mobility, and advanced algorithmic operations
- **Spectral Libraries**: Full-featured spectral library management with filtering, processing, I/O, decoy generation, and format conversion
- **PSM Readers**: Unified interface for reading outputs from 10+ proteomics search engines
- **Quantification**: Multi-format quantification readers, data reformatters, and processing pipelines for various proteomics platforms
- **SMILES Chemistry**: Cheminformatics capabilities for chemical structure representation and property prediction
- **I/O Utilities**: Advanced HDF5 wrapper and memory-mapped arrays for high-performance data processing
- **Protein Analysis**: FASTA processing, protein digestion, inference workflows, and sequence analysis

This modular design enables both rapid prototyping and high-throughput production workflows in mass spectrometry proteomics, with comprehensive coverage from raw data processing to advanced computational analysis.

## Capabilities

### Chemical Constants and Calculations

Comprehensive databases of amino acids, chemical elements, modifications, and isotopes with vectorized mass calculations. Provides the foundation for all proteomics calculations with pre-computed lookup tables for performance.

```python { .api }
# Core constants
AA_ASCII_MASS: np.ndarray  # 128-length array of AA masses
MASS_PROTON: float = 1.00727646688
MOD_DF: pd.DataFrame  # Complete modification database

# Mass calculation functions
def calc_AA_masses(sequences: List[str]) -> np.ndarray: ...
def calc_mass_from_formula(formula: str) -> float: ...
def calc_modification_mass(mod_sequences: List[str]) -> np.ndarray: ...
```

[Chemical Constants](./chemical-constants.md)
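
To make the lookup-table idea concrete, the following is a minimal sketch in plain Python that does not use AlphaBase at all. The partial residue-mass table, `WATER_MASS`, `ASCII_MASS`, and the helpers `peptide_mass` and `precursor_mz` are illustrative assumptions rather than AlphaBase API; only the `MASS_PROTON` value is taken from the listing above.

```python
# Illustrative sketch, not AlphaBase code.
# Monoisotopic residue masses (Da) for the residues used below;
# a complete table would cover every amino acid.
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}
WATER_MASS = 18.01056          # H2O, added once per peptide
MASS_PROTON = 1.00727646688    # value quoted in the listing above

# 128-slot table indexed by ASCII code, so each lookup is a plain index operation.
# Residues missing from RESIDUE_MASS contribute 0.0 in this sketch.
ASCII_MASS = [0.0] * 128
for residue, mass in RESIDUE_MASS.items():
    ASCII_MASS[ord(residue)] = mass


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(ASCII_MASS[code] for code in sequence.encode("ascii")) + WATER_MASS


def precursor_mz(sequence, charge):
    """m/z of the protonated peptide at the given charge state."""
    return (peptide_mass(sequence) + charge * MASS_PROTON) / charge


print(round(peptide_mass("PEPTIDE"), 4))
print(round(precursor_mz("PEPTIDE", 2), 4))
```

Indexing by ASCII code avoids a dictionary lookup per residue, which is presumably the motivation for shipping the masses as a 128-length array.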

### Fragment Ion Generation

Complete fragment ion series generation with support for multiple fragment types, neutral losses, and charge states. Enables creation of theoretical spectra for spectral library construction and peptide identification.

```python { .api }
def get_charged_frag_types(frag_types: List[str], charges: List[int]) -> List[str]: ...
def create_fragment_mz_dataframe(precursor_df: pd.DataFrame, frag_types: List[str]) -> pd.DataFrame: ...
def calc_b_y_and_peptide_masses_for_same_len_seqs(sequences: List[str]) -> tuple: ...
```

[Fragment Ions](./fragment-ions.md)
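
As a rough illustration of what a b/y fragment series is, here is a plain-Python sketch of the standard calculation for an unmodified peptide. It is not how AlphaBase implements it; the helper `b_y_mz`, the partial residue table, and the constant names are assumptions made for this example.

```python
from itertools import accumulate

# Illustrative sketch, not AlphaBase code.
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}
WATER_MASS = 18.01056
MASS_PROTON = 1.00727646688


def b_y_mz(sequence, charge=1):
    """Return (b_ions, y_ions) m/z lists, one entry per cleavage site.

    Entry i corresponds to cleaving after residue i + 1, so b_ions runs
    b1..b(n-1) while y_ions runs y(n-1)..y1.
    """
    prefix = list(accumulate(RESIDUE_MASS[aa] for aa in sequence))
    total = prefix[-1]
    b_ions, y_ions = [], []
    for n_term_mass in prefix[:-1]:
        # b ion: N-terminal residues; y ion: C-terminal residues plus water.
        b_neutral = n_term_mass
        y_neutral = total - n_term_mass + WATER_MASS
        b_ions.append((b_neutral + charge * MASS_PROTON) / charge)
        y_ions.append((y_neutral + charge * MASS_PROTON) / charge)
    return b_ions, y_ions


b_ions, y_ions = b_y_mz("PEPTIDE")
for site, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"site {site}: b={b:.4f}  y={y:.4f}")
```

For singly charged ions, each complementary b/y pair sums to the neutral peptide mass plus two proton masses, which makes a convenient sanity check.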

### Spectral Library Management

Full-featured spectral library class with comprehensive functionality for loading, processing, filtering, and exporting spectral libraries. Supports multiple formats and advanced operations like decoy generation and isotope calculations.

```python { .api }
class SpecLibBase:
    precursor_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def copy(self) -> 'SpecLibBase': ...
    def append(self, other: 'SpecLibBase') -> None: ...
    def calc_precursor_mz(self) -> None: ...
    def calc_fragment_mz_df(self, frag_types: List[str]) -> None: ...
    def save_hdf(self, filepath: str) -> None: ...
    def load_hdf(self, filepath: str) -> None: ...
```

[Spectral Libraries](./spectral-libraries.md)

### PSM Reading and Processing

Unified interface for reading Peptide-Spectrum Match (PSM) files from multiple proteomics search engines. Standardizes column mappings and data formats across different tools for seamless data integration.

```python { .api }
class PSMReaderBase:
    def import_file(self, filepath: str) -> pd.DataFrame: ...
    def get_modification_mapping(self) -> dict: ...

# Available readers
class MaxQuantReader(PSMReaderBase): ...
class DiannReader(PSMReaderBase): ...
class SpectronautReader(PSMReaderBase): ...
# ... and 7 more search engine readers
```

[PSM Readers](./psm-readers.md)
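
The idea of standardizing column mappings can be sketched with the standard library alone. Everything below is hypothetical: the engine labels, the source column names, and the `read_psms` helper are invented for illustration and are not the mappings or API that AlphaBase ships.

```python
import csv
import io

# Hypothetical mappings: unified column name -> column name in each engine's output.
COLUMN_MAPPING = {
    "engine_a": {"sequence": "Sequence", "charge": "Charge", "rt": "Retention time"},
    "engine_b": {"sequence": "Stripped.Sequence", "charge": "Precursor.Charge", "rt": "RT"},
}


def read_psms(text, engine, delimiter="\t"):
    """Parse delimited PSM text and rename columns to the unified names."""
    mapping = COLUMN_MAPPING[engine]
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rows.append({unified: record[source] for unified, source in mapping.items()})
    return rows


# Two made-up outputs that describe the same PSM with different headers.
engine_a_text = "Sequence\tCharge\tRetention time\nPEPTIDE\t2\t12.5\n"
engine_b_text = "Stripped.Sequence\tPrecursor.Charge\tRT\nPEPTIDE\t2\t12.5\n"

print(read_psms(engine_a_text, "engine_a"))
print(read_psms(engine_b_text, "engine_b"))
```

Because both inputs end up with the same keys, downstream code only has to handle one schema, whichever tool produced the file. Values stay as strings in this sketch; a real reader would also convert types.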

### High-Performance I/O

Advanced I/O utilities including HDF5 wrapper with attribute-style access and memory-mapped arrays for efficient handling of large proteomics datasets. Optimized for high-throughput workflows and memory efficiency.

```python { .api }
class HDF_File:
    def __init__(self, filepath: str, mode: str = 'r'): ...
    def __getitem__(self, key: str): ...
    def __setitem__(self, key: str, value): ...

def array(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def zeros(shape: tuple, dtype=np.float64) -> np.ndarray: ...
def clear() -> None: ...
```

[I/O Utilities](./io-utilities.md)
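
For readers unfamiliar with memory-mapped arrays, the sketch below shows the underlying mechanism using only Python's standard library: a temporary file is mapped into memory and viewed as float64 values. It is a conceptual illustration, not the `alphabase.io.tempmmap` implementation, and the function name `mmap_demo` is an assumption.

```python
import mmap
import tempfile


def mmap_demo(n_values=1000):
    """Create a zero-filled, file-backed float64 buffer, write to it, and sum it."""
    n_bytes = n_values * 8  # 8 bytes per float64
    with tempfile.TemporaryFile() as handle:
        handle.write(bytes(n_bytes))  # all-zero bytes decode as 0.0
        handle.flush()
        with mmap.mmap(handle.fileno(), n_bytes) as buffer:
            # View the raw bytes as doubles; both views are released
            # before the mapping is closed.
            with memoryview(buffer) as raw, raw.cast("d") as values:
                values[0] = 1.5
                values[n_values - 1] = 2.5
                return len(values), sum(values)


print(mmap_demo())
```

The data lives in the file rather than on the Python heap, so the operating system can page it in and out as needed. That is the property that makes this approach attractive for arrays larger than available RAM.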

### Quantification Data Processing

Comprehensive quantification data processing capabilities for handling multi-format quantified peptide and protein data from various proteomics platforms. Provides unified interfaces for reading, reformatting, and processing quantification results from DIA-NN, Spectronaut, MaxQuant, and other proteomics tools.

```python { .api }
def import_data(data_path: str, data_type: str = None, config_dict: dict = None) -> pd.DataFrame: ...
class LongFormatReader: ...
class WideFormatReader: ...
class ConfigDictLoader: ...
```

[Quantification](./quantification.md)
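
The distinction between long and wide quantification tables can be shown with a small standalone sketch. The column names (`protein`, `sample`, `intensity`), the example values, and the `long_to_wide` helper are assumptions for illustration; they do not reflect how `LongFormatReader` or `WideFormatReader` work internally.

```python
from collections import defaultdict

# Made-up long-format data: one row per (protein, sample) measurement.
long_rows = [
    {"protein": "P12345", "sample": "run1", "intensity": 1.0e6},
    {"protein": "P12345", "sample": "run2", "intensity": 1.2e6},
    {"protein": "P67890", "sample": "run1", "intensity": 3.0e5},
]


def long_to_wide(rows, index="protein", column="sample", value="intensity"):
    """Pivot long rows into one list of values per index key, ordered by sample."""
    samples = sorted({row[column] for row in rows})
    table = defaultdict(dict)
    for row in rows:
        table[row[index]][row[column]] = row[value]
    # Missing measurements become None.
    wide = {key: [values.get(s) for s in samples] for key, values in table.items()}
    return samples, wide


samples, wide = long_to_wide(long_rows)
print(samples)
for protein, intensities in wide.items():
    print(protein, intensities)
```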

### Advanced Peptide Operations

Comprehensive peptide processing capabilities including precursor calculations, mass calculations, ion mobility transformations, and advanced algorithmic operations. Provides high-performance functions for large-scale peptide analysis, isotope modeling, and multi-dimensional separations integration.

```python { .api }
def update_precursor_mz(precursor_df: pd.DataFrame, batch_size: int = 100000) -> None: ...
def calc_precursor_isotope_info(precursor_df: pd.DataFrame, max_isotope: int = 6) -> None: ...
def ccs_to_mobility_for_df(precursor_df: pd.DataFrame, vendor_type: str = 'bruker') -> None: ...
def hash_precursor_df(precursor_df: pd.DataFrame, seed: int = 42) -> None: ...
```

[Advanced Peptide Operations](./advanced-peptide-operations.md)

### Advanced Spectral Library Operations

Extended spectral library functionality including decoy generation, format conversion, library validation, and specialized library formats. Provides comprehensive tools for spectral library manipulation, quality control, and integration with various proteomics workflows and search engines.

```python { .api }
class SpecLibDecoy: ...
class DIANNDecoyGenerator: ...
class SpecLibFlat: ...
class LibraryReaderBase: ...
class Schema: ...
```

[Advanced Spectral Libraries](./advanced-spectral-libraries.md)
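
Decoy generation can be illustrated with one widely used strategy, pseudo-reversal, in which the sequence is reversed while the C-terminal residue stays in place. This sketch is not claimed to be what `SpecLibDecoy` or `DIANNDecoyGenerator` implement; the function name `pseudo_reverse` is an assumption for this example.

```python
def pseudo_reverse(sequence):
    """Reverse all residues except the last one (the enzymatic cleavage site)."""
    if len(sequence) < 2:
        return sequence
    return sequence[-2::-1] + sequence[-1]


targets = ["PEPTIDEK", "SAMPLER"]
decoys = [pseudo_reverse(seq) for seq in targets]
for target, decoy in zip(targets, decoys):
    print(target, "->", decoy)
```

The decoy keeps the same amino acid composition, and therefore the same precursor mass, as its target, while its fragment ion series differs.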

### SMILES and Chemical Representations

Comprehensive cheminformatics capabilities for peptide and amino acid SMILES (Simplified Molecular-Input Line-Entry System) representations. Provides tools for chemical structure encoding, modification representation, and integration with computational chemistry workflows in proteomics.

```python { .api }
class AminoAcidModifier: ...
class PeptideSmilesEncoder: ...
def calculate_molecular_descriptors(smiles: str) -> dict: ...
def predict_retention_time_from_smiles(smiles: str, model_type: str = 'krokhin') -> float: ...
```

[SMILES Chemistry](./smiles-chemistry.md)

### Protein Analysis

FASTA file processing, protein sequence analysis, and enzymatic digestion utilities. Supports protein inference workflows and integration with proteomics identification pipelines.

```python { .api }
def read_fasta_file(filepath: str) -> Iterator[tuple[str, str]]: ...
def get_uniprot_gene_name(description: str) -> str: ...
```

[Protein Analysis](./protein-analysis.md)
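
To show what FASTA processing and enzymatic digestion involve, here is a self-contained sketch using only the standard library. The helpers `parse_fasta`, `gene_name`, and `tryptic_digest`, the example record, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are assumptions for illustration, not AlphaBase's implementation.

```python
import io
import re


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_name(header):
    """Extract the GN= field from a UniProt-style header, or '' if absent."""
    match = re.search(r"\bGN=(\S+)", header)
    return match.group(1) if match else ""


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


# Made-up record; the sequence is wrapped over two lines as FASTA allows.
fasta_text = """>sp|P12345|TEST_HUMAN Test protein OS=Homo sapiens GN=TEST1 PE=1
AAKGGRPCC
KDD
"""

for header, sequence in parse_fasta(io.StringIO(fasta_text)):
    print(gene_name(header), tryptic_digest(sequence))
```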