# Spectral Library Management

Tools for loading, processing, filtering, and exporting spectral libraries. Beyond basic storage, the module covers decoy generation, precursor isotope calculations, and fragment processing. Library data is held in pandas DataFrames, which makes it straightforward to integrate with mass spectrometry workflows. HDF5 is the storage format documented here; other formats may be available through format-specific utilities.

## Capabilities

### Core Spectral Library Class

`SpecLibBase` is the main class. It stores the library as a set of pandas DataFrames and provides the methods that process them.

```python { .api }
class SpecLibBase:
    """
    Main spectral library class with comprehensive functionality.

    Properties:
    - precursor_df: DataFrame with precursor information (sequence, mods, charge, proteins)
    - peptide_df: DataFrame with unique peptide information
    - fragment_mz_df: DataFrame with fragment m/z values
    - fragment_intensity_df: DataFrame with fragment intensities
    """

    # Core properties
    precursor_df: pd.DataFrame
    peptide_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def __init__(self):
        """Initialize empty spectral library."""

    def copy(self) -> 'SpecLibBase':
        """
        Create deep copy of spectral library.

        Returns:
        New SpecLibBase instance with copied data
        """

    def append(self, other: 'SpecLibBase') -> None:
        """
        Append another spectral library to this one.

        Parameters:
        - other: Another SpecLibBase instance to append
        """

    def refine_df(self) -> None:
        """
        Sort and optimize all DataFrames for performance.
        Sets proper indexing and memory layout.
        """

    def append_decoy_sequence(self, decoy_sequence: str,
                              decoy_proteins: str = "decoy") -> None:
        """
        Add decoy sequences to the library.

        Parameters:
        - decoy_sequence: Decoy sequence string
        - decoy_proteins: Protein identifier for decoys
        """
```
63
64
### Mass and M/Z Calculations
65
66
Methods for calculating precursor and fragment m/z values with support for modifications and charge states.
67
68
```python { .api }
69
class SpecLibBase:
70
def calc_precursor_mz(self) -> None:
71
"""
72
Calculate precursor m/z values from mass and charge.
73
Updates precursor_df with 'mz' column.
74
"""
75
76
def calc_fragment_mz_df(self, frag_types: List[str] = None) -> None:
77
"""
78
Generate fragment m/z DataFrame for all precursors.
79
80
Parameters:
81
- frag_types: List of fragment types like ['b+', 'y+', 'b++', 'y++']
82
If None, uses default fragment types
83
"""
84
85
def update_precursor_mz(self) -> None:
86
"""
87
Update precursor m/z values after modifications.
88
Alias for calc_precursor_mz() for backwards compatibility.
89
"""
90
```
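
The arithmetic behind these methods can be sketched in plain Python. This is a minimal illustration for unmodified peptides, not the library's implementation: the helper names (`neutral_mass`, `mz_from_mass`, `by_fragment_mz`) and the small residue table are assumptions made for the example. It uses the standard relation m/z = (M + z · m_proton) / z, with b ions built from the N-terminal residues and y ions from the C-terminal residues plus one water.

```python
PROTON_MASS = 1.007276467  # Da
WATER_MASS = 18.010565     # Da, monoisotopic H2O

# Monoisotopic residue masses (Da) for the residues used below (hypothetical table).
RESIDUE_MASS = {
    "P": 97.05276,
    "E": 129.04259,
    "T": 101.04768,
    "I": 113.08406,
    "D": 115.02694,
}


def neutral_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified peptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def mz_from_mass(mass: float, charge: int) -> float:
    """m/z of a neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass + charge * PROTON_MASS) / charge


def by_fragment_mz(sequence: str, charge: int = 1):
    """Return (b_ions, y_ions); entry i-1 describes the cleavage after residue i."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_neutral = sum(masses[:i])               # N-terminal piece, no water
        y_neutral = sum(masses[i:]) + WATER_MASS  # C-terminal piece keeps the water
        b_ions.append(mz_from_mass(b_neutral, charge))
        y_ions.append(mz_from_mass(y_neutral, charge))
    return b_ions, y_ions


precursor_mz = mz_from_mass(neutral_mass("PEPTIDE"), charge=2)
b_ions, y_ions = by_fragment_mz("PEPTIDE")
```

A useful sanity check on any fragment table is that, for singly charged ions, each complementary b/y pair sums to the neutral precursor mass plus two proton masses.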

### Hashing and Identification

Methods for generating hash codes for fast precursor lookup and deduplication.

```python { .api }
class SpecLibBase:
    def hash_precursor_df(self) -> None:
        """
        Add hash columns to precursor DataFrame.
        Adds 'seq_hash' and 'prec_hash' columns for fast lookup.
        """

    def get_mod_seq_hash(self) -> pd.Series:
        """
        Generate hash codes for modified peptide sequences.

        Returns:
        Series with hash codes for each sequence
        """

    def get_mod_seq_charge_hash(self) -> pd.Series:
        """
        Generate hash codes for precursors (sequence + charge).

        Returns:
        Series with hash codes for each precursor
        """
```
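
To make the idea concrete, here is a small sketch of hashing modified sequences and precursors for deduplication. The hash function and the helper names (`stable_hash`, `mod_seq_hash`, `mod_seq_charge_hash`) are assumptions chosen for illustration; the library may compute its hashes differently. `hashlib` is used rather than the built-in `hash()` because string hashes from `hash()` are randomized per process and so cannot be stored or compared across runs.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Map a string to a reproducible unsigned 64-bit integer."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def mod_seq_hash(sequence: str, mods: str) -> int:
    """Hash of a modified sequence (charge is ignored)."""
    return stable_hash(f"{sequence}|{mods}")


def mod_seq_charge_hash(sequence: str, mods: str, charge: int) -> int:
    """Hash of a precursor: modified sequence plus charge state."""
    return stable_hash(f"{sequence}|{mods}|{charge}")


# Hypothetical precursor records: (sequence, mods, charge)
precursors = [
    ("PEPTIDE", "", 2),
    ("PEPTIDE", "", 3),
    ("PEPTIDE", "", 2),  # exact duplicate of the first record
]

# Deduplicate by precursor hash, keeping the first occurrence.
seen = {}
for seq, mods, charge in precursors:
    seen.setdefault(mod_seq_charge_hash(seq, mods, charge), (seq, mods, charge))
unique_precursors = list(seen.values())
```

The two charge states of the same peptide share a sequence hash but get distinct precursor hashes, which is the distinction the 'seq_hash' and 'prec_hash' columns express.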

### Isotope Calculations

Methods for calculating isotope patterns and intensities for precursors.

```python { .api }
class SpecLibBase:
    def calc_precursor_isotope_info(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope envelope information for precursors.

        Parameters:
        - max_isotope: Maximum number of isotope peaks to calculate
        """

    def calc_precursor_isotope_info_mp(self, max_isotope: int = 6,
                                       n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope information calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """

    def calc_precursor_isotope_intensity(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope pattern intensities for precursors.

        Parameters:
        - max_isotope: Maximum isotope peaks to calculate
        """

    def calc_precursor_isotope_intensity_mp(self, max_isotope: int = 6,
                                            n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope intensity calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """
```
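
For intuition about what an isotope envelope looks like, the sketch below approximates it with a carbon-only binomial model. This is a deliberate simplification and an assumption of the example: it ignores the isotopes of H, N, O, and S, and it is not how the library computes its patterns. The function name `carbon_isotope_pattern` is hypothetical.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13


def carbon_isotope_pattern(n_carbons: int, max_isotope: int = 6,
                           p13c: float = C13_ABUNDANCE):
    """Relative intensities of the first `max_isotope` peaks (base peak = 1.0).

    Peak k is the probability that exactly k of the carbons are 13C,
    i.e. a binomial distribution over the carbon atoms.
    """
    peaks = [
        comb(n_carbons, k) * p13c ** k * (1.0 - p13c) ** (n_carbons - k)
        for k in range(max_isotope)
    ]
    base = max(peaks)
    return [p / base for p in peaks]


small = carbon_isotope_pattern(34)    # 34 carbons, as in unmodified PEPTIDE
large = carbon_isotope_pattern(150)   # a much larger peptide

mono_is_base_small = small.index(1.0) == 0
mono_is_base_large = large.index(1.0) == 0
```

Even this crude model reproduces a well-known effect: for small peptides the monoisotopic peak is the most intense, while for large ones the base peak shifts to a heavier isotope.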

### Fragment Processing

Methods for processing and optimizing fragment data within the spectral library.

168
```python { .api }
169
class SpecLibBase:
170
def remove_unused_fragments(self) -> None:
171
"""
172
Remove fragment entries with zero intensity across all precursors.
173
Compresses fragment DataFrames to save memory.
174
"""
175
176
def calc_fragment_count(self) -> pd.Series:
177
"""
178
Count number of fragments per precursor.
179
180
Returns:
181
Series with fragment counts indexed by precursor
182
"""
183
184
def filter_fragment_number(self, top_k: int = 100) -> None:
185
"""
186
Keep only top-k fragments per precursor by intensity.
187
188
Parameters:
189
- top_k: Number of top fragments to retain per precursor
190
"""
191
192
def sort_fragment_by_intensity(self, ascending: bool = False) -> None:
193
"""
194
Sort fragments by intensity within each precursor.
195
196
Parameters:
197
- ascending: Sort order (False for highest intensity first)
198
"""
199
```
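
The top-k filter and the fragment count can be illustrated on a single precursor's intensity list. This is a sketch under stated assumptions: the helpers `keep_top_k` and `count_fragments` are hypothetical, fragments outside the top k are zeroed rather than dropped so positions stay aligned with the m/z values, and a fragment counts as present when its intensity is above zero. The library's own behavior may differ in these details.

```python
def keep_top_k(intensities, top_k):
    """Zero every intensity outside the top_k largest; positions are preserved."""
    if top_k <= 0:
        return [0.0] * len(intensities)
    # Indices ordered from most to least intense; ties keep their original order.
    ranked = sorted(range(len(intensities)),
                    key=lambda i: intensities[i], reverse=True)
    keep = set(ranked[:top_k])
    return [x if i in keep else 0.0 for i, x in enumerate(intensities)]


def count_fragments(intensities):
    """Number of fragments with non-zero intensity."""
    return sum(1 for x in intensities if x > 0)


intensities = [0.1, 0.9, 0.0, 0.5, 0.3]
filtered = keep_top_k(intensities, top_k=2)
n_before = count_fragments(intensities)
n_after = count_fragments(filtered)
```

Zeroing first and compacting later mirrors the two-step flow above, where entries left at zero intensity are the ones a later cleanup pass can remove.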

### I/O Operations

Methods for saving and loading a spectral library. HDF5 is the format documented here.

205
```python { .api }
206
class SpecLibBase:
207
def save_hdf(self, filepath: str, **kwargs) -> None:
208
"""
209
Save spectral library to HDF5 format.
210
211
Parameters:
212
- filepath: Output HDF5 file path
213
- **kwargs: Additional HDF5 options
214
"""
215
216
def load_hdf(self, filepath: str, **kwargs) -> None:
217
"""
218
Load spectral library from HDF5 format.
219
220
Parameters:
221
- filepath: Input HDF5 file path
222
- **kwargs: Additional loading options
223
"""
224
225
# Note: Additional export formats may be available through external functions
226
# Check the alphabase.spectral_library module for format-specific export utilities
227
```
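
### Library Statistics and Analysis

No dedicated statistics or validation methods are documented on `SpecLibBase` here. Such functionality may be available through external functions, and basic content and quality metrics can be computed directly from the DataFrames.
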
```python { .api }
class SpecLibBase:
    # Note: Statistical analysis and validation methods may be available
    # through external functions in the alphabase.spectral_library module
    pass
```

### Utility Functions

Standalone functions for spectral library operations and annotations.

```python { .api }
def annotate_fragments_from_speclib(target_lib: SpecLibBase,
                                    donor_lib: SpecLibBase,
                                    match_tolerance: float = 0.02) -> None:
    """
    Annotate fragments using donor spectral library.

    Parameters:
    - target_lib: Target library to annotate
    - donor_lib: Donor library with reference spectra
    - match_tolerance: Mass tolerance for matching (Da)
    """

def get_available_columns(spec_lib: SpecLibBase) -> dict:
    """
    Get available DataFrame columns across all library components.

    Parameters:
    - spec_lib: Spectral library instance

    Returns:
    Dictionary with available columns for each DataFrame
    """

# Note: Additional utility functions for library merging and filtering
# may be available in the alphabase.spectral_library module
```
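
Matching peaks within an absolute mass tolerance is the core step of this kind of annotation. The sketch below shows one common way to do it with a binary search over sorted donor m/z values. It is an illustration only: `match_within_tolerance` is a hypothetical helper, and the library's matching logic may differ.

```python
from bisect import bisect_left


def match_within_tolerance(target_mzs, donor_mzs, tolerance=0.02):
    """For each target m/z, return the index of the closest donor peak
    within `tolerance` Da, or None if no donor peak is close enough.

    `donor_mzs` must be sorted in ascending order.
    """
    matches = []
    for mz in target_mzs:
        pos = bisect_left(donor_mzs, mz)
        best = None
        # The closest donor peak is either just below or just above `mz`.
        for j in (pos - 1, pos):
            if 0 <= j < len(donor_mzs):
                if best is None or abs(donor_mzs[j] - mz) < abs(donor_mzs[best] - mz):
                    best = j
        if best is not None and abs(donor_mzs[best] - mz) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


donor = [148.0604, 227.1026, 324.1553]  # sorted reference peaks
targets = [227.11, 300.0, 148.05]       # observed peaks to annotate
matches = match_within_tolerance(targets, donor, tolerance=0.02)
```

Because only the two neighbors of the insertion point are inspected, each lookup costs O(log n) instead of a full scan of the donor peaks.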

## Usage Examples

### Basic Library Creation and Processing

```python
from alphabase.spectral_library.base import SpecLibBase
import pandas as pd

# Create new spectral library
spec_lib = SpecLibBase()

# Add precursor data (illustrative values; sequences and modification
# strings must follow the conventions the library expects)
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE', 'EXAMPLE'],
    # Site positions point at a residue the modification can occupy
    'mods': ['', 'Phospho (STY)@1', 'Oxidation (M)@4'],
    'charge': [2, 3, 2],
    'proteins': ['P12345', 'P67890', 'P11111'],
    'rt': [25.5, 32.1, 28.7]  # retention times
})

spec_lib.precursor_df = precursor_df

# Optimize DataFrame structure
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate fragment m/z values
frag_types = ['b+', 'y+', 'b++', 'y++']
spec_lib.calc_fragment_mz_df(frag_types)

print(f"Library contains {len(spec_lib.precursor_df)} precursors")
print(f"Generated {len(spec_lib.fragment_mz_df)} fragment entries")
```

### Library I/O Operations

```python
# Save library in HDF5 format
spec_lib.save_hdf('my_library.hdf5')

# Load library from HDF5
new_lib = SpecLibBase()
new_lib.load_hdf('my_library.hdf5')

# Additional export formats may be available through external functions
# Check alphabase.spectral_library module for format-specific exporters
```

### Advanced Processing

```python
# Add hash codes for fast lookup
spec_lib.hash_precursor_df()

# Calculate isotope patterns
spec_lib.calc_precursor_isotope_info(max_isotope=6)

# Keep only the 50 most intense fragments per precursor
spec_lib.filter_fragment_number(top_k=50)

# Remove unused fragment entries
spec_lib.remove_unused_fragments()

# Library statistics can be calculated manually:
print(f"Precursors: {len(spec_lib.precursor_df)}")
print(f"Fragments: {len(spec_lib.fragment_mz_df)}")
```

### Library Merging and Filtering

```python
# Merge multiple libraries using append method
lib1 = SpecLibBase()
lib2 = SpecLibBase()
# ... populate libraries ...

# Merge libraries
merged_lib = lib1.copy()
merged_lib.append(lib2)

# Filter by specific proteins using pandas operations
target_proteins = ['P12345', 'P67890']
filtered_precursors = merged_lib.precursor_df[
    merged_lib.precursor_df['proteins'].isin(target_proteins)
]

print(f"Merged library: {len(merged_lib.precursor_df)} precursors")
print(f"Filtered precursors: {len(filtered_precursors)} precursors")
```

### Library Validation and Quality Control

```python
# Manual validation and quality control
print("Library integrity check:")
print(f"  Precursors: {len(spec_lib.precursor_df)}")
print(f"  Fragment m/z entries: {len(spec_lib.fragment_mz_df)}")
print(f"  Fragment intensity entries: {len(spec_lib.fragment_intensity_df)}")

# Get fragment count statistics
frag_counts = spec_lib.calc_fragment_count()
print(f"Average fragments per precursor: {frag_counts.mean():.1f}")
print(f"Min fragments: {frag_counts.min()}, Max fragments: {frag_counts.max()}")

# Check available columns (get_available_columns is the standalone utility
# listed under Utility Functions and must be imported before use)
available_cols = get_available_columns(spec_lib)
print(f"Available columns: {available_cols}")
```

### Working with Decoys

```python
# Create a copy for decoy generation
decoy_lib = spec_lib.copy()

# Add decoy sequences (typically done with specialized decoy generation)
for _, row in spec_lib.precursor_df.iterrows():
    # Reverse sequence as simple decoy strategy
    decoy_seq = row['sequence'][::-1]
    decoy_lib.append_decoy_sequence(decoy_seq, decoy_proteins="DECOY_" + row['proteins'])

print(f"Original library: {len(spec_lib.precursor_df)} precursors")
print(f"With decoys: {len(decoy_lib.precursor_df)} precursors")
```
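
Plain reversal moves the C-terminal residue to the front, so decoys of tryptic peptides no longer end in K or R. A commonly used alternative is pseudo-reversal, which reverses everything except the last residue. The sketch below is an assumption-labeled illustration of that strategy; `pseudo_reverse` is a hypothetical helper and is not part of the documented API.

```python
def pseudo_reverse(sequence: str) -> str:
    """Reverse a peptide but keep its C-terminal residue in place."""
    if len(sequence) < 2:
        return sequence
    # sequence[-2::-1] walks from the second-to-last residue back to the first.
    return sequence[-2::-1] + sequence[-1]


targets = ["PEPTIDEK", "SAMPLER"]
decoys = [pseudo_reverse(seq) for seq in targets]

# Decoys identical to their target (e.g. palindromes) carry no information
# and are usually worth flagging.
unchanged = [t for t, d in zip(targets, decoys) if t == d]
```

The decoy keeps the same amino acid composition, and therefore the same precursor mass, as its target, which is what makes it a fair competitor during scoring.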