# Chemical Constants and Calculations

Comprehensive databases and calculation functions for amino acids, chemical elements, modifications, and isotopes. These components form the foundation of all mass spectrometry calculations in AlphaBase, providing pre-computed lookup tables and vectorized operations for high-performance proteomics workflows.

## Capabilities

### Amino Acid Constants and Calculations

Core amino acid database with masses, formulas, and properties, plus vectorized calculation functions for peptide sequences.

```python { .api }
# Global constants
AA_ASCII_MASS: np.ndarray # 128-length array indexed by ASCII code
AA_DF: pd.DataFrame # Complete amino acid properties dataframe
AA_Composition: dict # Amino acid formula compositions
aa_formula: pd.DataFrame # Amino acid formulas and properties

# Mass calculation functions
def calc_AA_masses(sequences: List[str]) -> np.ndarray:
    """
    Calculate amino acid masses for peptide sequences.

    Parameters:
    - sequences: List of peptide sequences

    Returns:
    2D numpy array with masses for each AA position
    """

def calc_AA_masses_for_same_len_seqs(sequences: List[str]) -> np.ndarray:
    """
    Fast batch calculation for equal-length sequences.

    Parameters:
    - sequences: List of equal-length peptide sequences

    Returns:
    2D numpy array with one row per sequence and one column per residue
    """

def calc_sequence_masses_for_same_len_seqs(sequences: List[str]) -> np.ndarray:
    """
    Calculate full sequence masses for equal-length sequences.

    Parameters:
    - sequences: List of equal-length peptide sequences

    Returns:
    1D numpy array with total masses
    """

# Database modification functions
def update_an_AA(aa_code: str, formula: dict, mass: float = None) -> None:
    """
    Update a single amino acid definition.

    Parameters:
    - aa_code: Single letter amino acid code
    - formula: Chemical formula as dict {'C': 6, 'H': 12, ...}
    - mass: Optional mass override
    """

def reset_AA_mass() -> None:
    """Recalculate amino acid masses after modifications."""

def reset_AA_df() -> None:
    """Reset amino acid DataFrame from formulas."""
```
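
The idea behind an ASCII-indexed mass table can be illustrated with a small standalone sketch. It is not AlphaBase's implementation: the names `RESIDUE_MASS`, `ascii_mass`, and `residue_masses_same_len` are hypothetical, and only a handful of residues are filled in. It assumes all sequences have the same length and contain only ASCII letters.

```python
import numpy as np

# Monoisotopic residue masses for a few amino acids (illustrative subset).
RESIDUE_MASS = {
    "G": 57.02146372,
    "A": 71.03711379,
    "S": 87.03202841,
    "P": 97.05276385,
    "T": 101.04767847,
    "I": 113.08406398,
    "D": 115.02694302,
    "E": 129.04259309,
}
MASS_H2O = 18.0105647

# 128-slot table indexed by ASCII code; letters not listed above stay at 0.0.
ascii_mass = np.zeros(128)
for aa, mass in RESIDUE_MASS.items():
    ascii_mass[ord(aa)] = mass


def residue_masses_same_len(sequences):
    """Return an (n_sequences, sequence_length) array of residue masses."""
    # Concatenate the sequences, view the bytes as ASCII codes, and use
    # them as indices into the table in a single vectorized lookup.
    codes = np.frombuffer("".join(sequences).encode("ascii"), dtype=np.uint8)
    return ascii_mass[codes].reshape(len(sequences), -1)


masses = residue_masses_same_len(["PEPTIDE", "GASPTDE"])
# Neutral peptide mass = sum of residue masses + one water.
peptide_masses = masses.sum(axis=1) + MASS_H2O
```

Because the lookup is a single fancy-indexing operation, the cost per residue does not depend on how many amino acids the table defines, which is the main appeal of this layout.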

### Chemical Elements and Atoms

Fundamental chemical constants and formula parsing capabilities with isotope information.

```python { .api }
# Physical constants
MASS_PROTON: float = 1.00727646688
MASS_ISOTOPE: float = 1.0033 # approximate 13C - 12C mass difference
MAX_ISOTOPE_LEN: int = 8

# Element masses
MASS_H: float = 1.007825032
MASS_C: float = 12.0
MASS_O: float = 15.994914620
MASS_N: float = 14.003074004
MASS_H2O: float = 18.0105647
MASS_NH3: float = 17.026549101

# Chemical databases
CHEM_INFO_DICT: dict # Element information dictionary
CHEM_MONO_MASS: dict # Monoisotopic masses dictionary
CHEM_ISOTOPE_DIST: dict # Isotope distributions dictionary
CHEM_MONO_IDX: dict # Monoisotopic index mappings
EMPTY_DIST: np.ndarray # Default isotope distribution

# Formula parsing and mass calculation
def parse_formula(formula: str) -> dict:
    """
    Parse chemical formula string into composition dictionary.

    Parameters:
    - formula: Chemical formula like 'C6H12N2O'

    Returns:
    Dictionary with element counts {'C': 6, 'H': 12, 'N': 2, 'O': 1}
    """

def calc_mass_from_formula(formula: str) -> float:
    """
    Calculate monoisotopic mass from chemical formula.

    Parameters:
    - formula: Chemical formula string

    Returns:
    Monoisotopic mass as float
    """

class ChemicalCompositonFormula:
    """Handle chemical compositions and parse SMILES notation."""

    def __init__(self, formula: str = None):
        """
        Initialize with optional formula.

        Parameters:
        - formula: Chemical formula string or SMILES notation
        """

    def calc_mass(self) -> float:
        """Calculate monoisotopic mass of composition."""

# Database management
def update_atom_infos(atom_dict: dict) -> None:
    """Update atomic information from external data."""

def reset_elements() -> None:
    """Reset element data from default sources."""

def load_elem_yaml(yaml_path: str) -> None:
    """Load element definitions from YAML file."""
```
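
As a rough illustration of what formula parsing and monoisotopic mass calculation involve, here is a minimal standalone sketch. The helpers `parse_simple_formula` and `mono_mass` are hypothetical and are not the AlphaBase functions listed above; the sketch assumes plain formulas such as `C6H12N2O` with no brackets, charges, or isotope labels, and it only knows the four elements whose masses appear in the constants above.

```python
import re

# Monoisotopic element masses, taken from the constants listed above.
MONO_MASS = {"H": 1.007825032, "C": 12.0, "N": 14.003074004, "O": 15.994914620}


def parse_simple_formula(formula):
    """Turn 'C6H12N2O' into {'C': 6, 'H': 12, 'N': 2, 'O': 1}."""
    composition = {}
    # An element symbol is one capital letter plus an optional lowercase
    # letter; a missing count means one atom.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        composition[element] = composition.get(element, 0) + int(count or 1)
    return composition


def mono_mass(formula):
    """Sum element mass times atom count over the parsed composition."""
    return sum(MONO_MASS[el] * n for el, n in parse_simple_formula(formula).items())


water = mono_mass("H2O")
ammonia = mono_mass("NH3")
```

With these inputs, `water` and `ammonia` should agree with `MASS_H2O` and `MASS_NH3` above to within rounding.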
141
142
### Modifications Database and Calculations
143
144
Complete modification database with masses, formulas, and loss patterns, plus calculation functions for modified peptide sequences.
145
146
```python { .api }
# Global modification constants
MOD_DF: pd.DataFrame # Main modification database
MOD_INFO_DICT: dict # Modification information
MOD_CHEM: dict # Modification chemistry
MOD_MASS: dict # Modification masses
MOD_LOSS_MASS: dict # Modification loss masses
MOD_Composition: dict # Modification compositions
MOD_LOSS_IMPORTANCE: dict # Loss importance rankings

# Modification mass calculations
def calc_modification_mass(mod_sequences: List[str]) -> np.ndarray:
    """
    Calculate modification masses for peptide sequences.

    Parameters:
    - mod_sequences: List of modified sequences like 'PEPTIDE[Oxidation (M)]'

    Returns:
    2D numpy array with modification masses per position
    """

def calc_mod_masses_for_same_len_seqs(mod_sequences: List[str]) -> np.ndarray:
    """
    Batch modification mass calculation for equal-length sequences.

    Parameters:
    - mod_sequences: List of equal-length modified sequences

    Returns:
    2D numpy array with one row per sequence and one column per residue
    """

def calc_modification_mass_sum(mod_sequences: List[str]) -> np.ndarray:
    """
    Sum modification masses across peptide sequences.

    Parameters:
    - mod_sequences: List of modified sequences

    Returns:
    1D numpy array with total modification masses
    """

def calc_modloss_mass(mod_sequences: List[str]) -> np.ndarray:
    """
    Calculate modification loss masses.

    Parameters:
    - mod_sequences: List of modified sequences

    Returns:
    2D numpy array with loss masses
    """

def calc_modloss_mass_with_importance(mod_sequences: List[str],
                                      importance_level: int = 1) -> np.ndarray:
    """
    Calculate modification losses filtered by importance.

    Parameters:
    - mod_sequences: List of modified sequences
    - importance_level: Minimum importance level (1-3)

    Returns:
    2D numpy array with filtered loss masses
    """

# Database management
def add_new_modifications(mod_df: pd.DataFrame) -> None:
    """
    Add custom modifications to global database.

    Parameters:
    - mod_df: DataFrame with new modifications
    """

def has_custom_mods() -> bool:
    """Check for presence of user-defined modifications."""

def load_mod_df(tsv_path: str) -> pd.DataFrame:
    """Load modifications from TSV file."""

def update_all_by_MOD_DF() -> None:
    """Update all modification globals from main DataFrame."""

def keep_modloss_by_importance(importance_level: int = 1) -> None:
    """Filter modification losses by importance ranking."""
```
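
To make "modification masses per position" concrete, the following standalone sketch maps a bracket-annotated sequence to one mass shift per residue. It is an assumption-laden illustration rather than AlphaBase code: the lookup table `MOD_MASS_SKETCH`, the helper `mod_masses_per_position`, and the convention that a bracket annotates the residue immediately before it are all hypothetical.

```python
import re

# Illustrative monoisotopic mass shifts keyed by the names used in this page.
MOD_MASS_SKETCH = {
    "Oxidation (M)": 15.99491462,  # one oxygen atom
    "Phospho (STY)": 79.96633,     # HPO3
}


def mod_masses_per_position(mod_sequence):
    """Return one mass shift per residue; unmodified residues get 0.0."""
    masses = []
    # Each match is a residue letter optionally followed by '[mod name]'.
    # The bracket text is consumed with its residue, so capital letters
    # inside the modification name are not mistaken for residues.
    for _residue, mod_name in re.findall(r"([A-Z])(?:\[([^\]]+)\])?", mod_sequence):
        masses.append(MOD_MASS_SKETCH[mod_name] if mod_name else 0.0)
    return masses


per_position = mod_masses_per_position("PEPTM[Oxidation (M)]IDE")
total_shift = sum(per_position)
```

Summing the per-position list gives the total modification mass of the peptide, which corresponds to the 1D output described for `calc_modification_mass_sum`.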
235
236
### Isotope Calculations
237
238
Fast isotope pattern calculation with pre-built lookup tables and mathematical convolution functions.
239
240
```python { .api }
class IsotopeDistribution:
    """Fast isotope distribution calculator with pre-built tables."""

    def __init__(self, max_mass: int = 2000, max_isotope_len: int = 8):
        """
        Initialize isotope calculator.

        Parameters:
        - max_mass: Maximum mass for pre-calculated tables
        - max_isotope_len: Maximum isotope pattern length
        """

    def calc_isotope_distribution(self, formula: str) -> np.ndarray:
        """
        Calculate isotope distribution for chemical formula.

        Parameters:
        - formula: Chemical formula string

        Returns:
        Numpy array with isotope intensities
        """

# Direct calculation functions
def formula_dist(formula: str) -> np.ndarray:
    """
    Generate isotope distribution for chemical formula.

    Parameters:
    - formula: Chemical formula string

    Returns:
    Numpy array with isotope pattern
    """

def one_element_dist(element: str, count: int) -> np.ndarray:
    """
    Calculate single element isotope distribution.

    Parameters:
    - element: Element symbol ('C', 'H', etc.)
    - count: Number of atoms

    Returns:
    Numpy array with isotope intensities
    """

def abundance_convolution(dist1: np.ndarray, dist2: np.ndarray) -> np.ndarray:
    """
    Convolve two isotope distributions.

    Parameters:
    - dist1: First isotope distribution
    - dist2: Second isotope distribution

    Returns:
    Convolved isotope distribution
    """

def truncate_isotope(distribution: np.ndarray, max_len: int = 8) -> np.ndarray:
    """
    Truncate isotope distribution to specified length.

    Parameters:
    - distribution: Input isotope distribution
    - max_len: Maximum length to keep

    Returns:
    Truncated distribution
    """
```
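
The convolution approach to isotope patterns can be sketched in a few lines of NumPy. This is a simplified illustration, not the library's algorithm: `convolve_truncated` and `element_dist` are hypothetical helpers, the abundances are standard natural values for carbon and hydrogen only, and the sketch assumes each distribution starts at the monoisotopic peak with one entry per additional neutron.

```python
import numpy as np

MAX_LEN = 8  # keep at most this many isotope peaks

# Natural isotope abundances, lightest isotope first.
C_DIST = np.array([0.9893, 0.0107])      # 12C, 13C
H_DIST = np.array([0.999885, 0.000115])  # 1H, 2H


def convolve_truncated(dist1, dist2, max_len=MAX_LEN):
    """Combine two isotope distributions and keep the first max_len peaks."""
    return np.convolve(dist1, dist2)[:max_len]


def element_dist(base_dist, count, max_len=MAX_LEN):
    """Distribution for `count` atoms of one element by repeated convolution."""
    result = np.array([1.0])  # zero atoms: all intensity in the first peak
    for _ in range(count):
        result = convolve_truncated(result, base_dist, max_len)
    return result


# Methane (CH4): combine one carbon with four hydrogens.
methane = convolve_truncated(element_dist(C_DIST, 1), element_dist(H_DIST, 4))
```

Truncating after every convolution keeps the arrays short for large molecules, at the cost of discarding the small tail of heavy isotopologues. Pre-computing per-element distributions for many atom counts, as the "pre-built tables" wording above suggests, would avoid repeating the inner loop.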
312
313
## Usage Examples
314
315
### Basic Mass Calculations
316
317
```python
from alphabase.constants.aa import calc_AA_masses
from alphabase.constants.modification import calc_modification_mass

# Calculate amino acid masses
sequences = ['PEPTIDE', 'SEQUENCE', 'EXAMPLE']
aa_masses = calc_AA_masses(sequences)
print(f"AA masses shape: {aa_masses.shape}") # (3, 8) for longest sequence

# Calculate modification masses
mod_sequences = ['PEPTIDE[Oxidation (M)]', 'SEQUENCE[Phospho (STY)]']
mod_masses = calc_modification_mass(mod_sequences)
print(f"Modification masses: {mod_masses}")
```
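
Residue and modification masses are usually combined into a precursor m/z. The relation is simple enough to show directly; the function name `mz_from_mass` is hypothetical, and the neutral mass used for PEPTIDE is a hand-computed value (residue masses plus one water), not output from AlphaBase.

```python
MASS_PROTON = 1.00727646688  # same value as the constant listed earlier


def mz_from_mass(neutral_mass, charge):
    """m/z of a positively charged ion: add one proton per charge, divide by charge."""
    return (neutral_mass + charge * MASS_PROTON) / charge


peptide_neutral_mass = 799.35996  # unmodified PEPTIDE, monoisotopic
mz_1 = mz_from_mass(peptide_neutral_mass, 1)
mz_2 = mz_from_mass(peptide_neutral_mass, 2)
```

For a modified peptide, the summed modification mass would be added to the neutral mass before the conversion.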
331
332
### Chemical Formula Processing
333
334
```python
from alphabase.constants.atom import parse_formula, calc_mass_from_formula

# Parse and calculate mass
formula = "C6H12N2O2"
composition = parse_formula(formula)
mass = calc_mass_from_formula(formula)
print(f"Formula {formula}: {composition}, Mass: {mass:.6f}")
```
343
344
### Custom Modifications
345
346
```python
import pandas as pd
from alphabase.constants.modification import add_new_modifications

# Add custom modification
custom_mods = pd.DataFrame({
    'mod_name': ['Custom_Mod'],
    'mass': [42.0106],
    'composition': ['C2H2O'],
    'aa': ['K'],
    'position': ['any']
})

add_new_modifications(custom_mods)
```
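
Before registering a custom modification it can be worth checking that the declared mass agrees with the declared composition. The check below is a standalone sketch with hypothetical helper names (`composition_mass`, `mass_matches`); it takes the composition as an element-count dictionary and makes no claim about what validation AlphaBase itself performs.

```python
# Monoisotopic element masses, as listed in the element constants.
MONO_MASS = {"H": 1.007825032, "C": 12.0, "N": 14.003074004, "O": 15.994914620}


def composition_mass(composition):
    """Monoisotopic mass of an element-count dictionary."""
    return sum(MONO_MASS[element] * count for element, count in composition.items())


def mass_matches(declared_mass, composition, tolerance=1e-3):
    """True if the declared mass is within `tolerance` Da of the composition mass."""
    return abs(declared_mass - composition_mass(composition)) <= tolerance


# C2H2O from the example above, written as element counts.
custom_composition = {"C": 2, "H": 2, "O": 1}
is_consistent = mass_matches(42.0106, custom_composition)
```

A mismatch here usually points to a typo in either column, which is easier to catch at this stage than after fragment masses have been computed.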
361
362
### Isotope Pattern Calculation
363
364
```python
from alphabase.constants.isotope import IsotopeDistribution

# Calculate isotope pattern
iso_calc = IsotopeDistribution()
pattern = iso_calc.calc_isotope_distribution("C50H80N14O10")
print(f"Isotope pattern: {pattern}")
```