Tessl Tile for pypi/alphabase@1.6.0

# Spectral Library Management

Spectral library management for loading, processing, filtering, and exporting spectral libraries. Beyond storage (HDF5 natively, other formats through format-specific utilities), it supports decoy generation, precursor isotope calculations, and fragment-level filtering for use in mass spectrometry workflows.

## Capabilities

### Core Spectral Library Class

`SpecLibBase` is the main spectral library class. It stores a library as a set of pandas DataFrames and provides methods to copy, merge, reorganize, and extend them.

```python { .api }
class SpecLibBase:
    """
    Main spectral library class with comprehensive functionality.

    Properties:
    - precursor_df: DataFrame with precursor information (sequence, mods, charge, proteins)
    - peptide_df: DataFrame with unique peptide information
    - fragment_mz_df: DataFrame with fragment m/z values
    - fragment_intensity_df: DataFrame with fragment intensities
    """

    # Core properties
    precursor_df: pd.DataFrame
    peptide_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def __init__(self):
        """Initialize empty spectral library."""

    def copy(self) -> 'SpecLibBase':
        """
        Create deep copy of spectral library.

        Returns:
        New SpecLibBase instance with copied data
        """

    def append(self, other: 'SpecLibBase') -> None:
        """
        Append another spectral library to this one.

        Parameters:
        - other: Another SpecLibBase instance to append
        """

    def refine_df(self) -> None:
        """
        Sort and optimize all DataFrames for performance.
        Sets proper indexing and memory layout.
        """

    def append_decoy_sequence(self, decoy_sequence: str,
                              decoy_proteins: str = "decoy") -> None:
        """
        Add decoy sequences to the library.

        Parameters:
        - decoy_sequence: Decoy sequence string
        - decoy_proteins: Protein identifier for decoys
        """
```
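
How the precursor and fragment DataFrames relate is easiest to see on a toy example. The sketch below assumes a layout in which every precursor with `nAA` residues owns `nAA - 1` consecutive rows in `fragment_mz_df` and `fragment_intensity_df` (one row per backbone cleavage site), addressed by `frag_start_idx`/`frag_stop_idx` columns in `precursor_df` with an exclusive stop. Those column names and the helpers `assign_fragment_ranges` and `append_ranges` are assumptions for illustration, written with plain lists instead of DataFrames. Under this layout, merging two libraries means stacking their fragment rows and shifting the appended library's indices.

```python
from typing import List, Tuple


def assign_fragment_ranges(lengths: List[int], offset: int = 0) -> List[Tuple[int, int]]:
    """Return (frag_start_idx, frag_stop_idx) per precursor; stop is exclusive."""
    ranges = []
    start = offset
    for n_aa in lengths:
        stop = start + n_aa - 1  # one fragment row per backbone cleavage site
        ranges.append((start, stop))
        start = stop
    return ranges


def append_ranges(
    ranges: List[Tuple[int, int]],
    other: List[Tuple[int, int]],
    n_rows: int,
) -> List[Tuple[int, int]]:
    """Merge index ranges of two libraries whose fragment rows are stacked.

    The other library's rows are placed below the existing n_rows rows,
    so each of its indices is shifted by n_rows.
    """
    return ranges + [(start + n_rows, stop + n_rows) for start, stop in other]


lib1 = assign_fragment_ranges([7, 8])  # two precursors with 7 and 8 residues
lib2 = assign_fragment_ranges([7])     # one precursor with 7 residues
n_rows_lib1 = lib1[-1][1]              # fragment rows used by lib1 (13)
merged = append_ranges(lib1, lib2, n_rows_lib1)
print(merged)  # [(0, 6), (6, 13), (13, 19)]
```

Storing ranges instead of per-precursor fragment lists keeps the fragment data in two dense tables, at the cost of having to re-index whenever rows are added or removed.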

### Mass and m/z Calculations

Methods for calculating precursor and fragment m/z values with support for modifications and charge states.

```python { .api }
class SpecLibBase:
    def calc_precursor_mz(self) -> None:
        """
        Calculate precursor m/z values from mass and charge.
        Updates precursor_df with a 'precursor_mz' column.
        """

    def calc_fragment_mz_df(self, frag_types: List[str] = None) -> None:
        """
        Generate fragment m/z DataFrame for all precursors.

        Parameters:
        - frag_types: List of fragment types like ['b+', 'y+', 'b++', 'y++']
                      If None, uses default fragment types
        """

    def update_precursor_mz(self) -> None:
        """
        Update precursor m/z values after modifications.
        Alias for calc_precursor_mz() for backwards compatibility.
        """
```
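
For reference, a precursor with neutral monoisotopic mass M and charge z has m/z = (M + z * m_proton) / z, where M is the sum of residue masses plus one water. Singly charged b ions are a prefix residue sum plus a proton, and y ions a suffix residue sum plus water and a proton. The pure-Python sketch below illustrates these formulas with a hard-coded table of unmodified monoisotopic residue masses. `precursor_mz`, `fragment_mz`, and the `mod_deltas` argument (1-based site to added mass) are hypothetical, not AlphaBase functions, and ordering fragments by cleavage site is an assumed row convention.

```python
from typing import Dict, List, Optional

PROTON = 1.007276467   # proton mass (Da)
WATER = 18.010564684   # H2O monoisotopic mass (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def residue_masses(seq: str, mod_deltas: Optional[Dict[int, float]] = None) -> List[float]:
    """Per-residue masses; hypothetical mod_deltas maps 1-based site -> added mass."""
    mod_deltas = mod_deltas or {}
    return [RESIDUE_MASS[aa] + mod_deltas.get(i + 1, 0.0) for i, aa in enumerate(seq)]


def precursor_mz(seq: str, charge: int, mod_deltas: Optional[Dict[int, float]] = None) -> float:
    neutral = sum(residue_masses(seq, mod_deltas)) + WATER
    return (neutral + charge * PROTON) / charge


def fragment_mz(seq: str, ion: str, charge: int = 1,
                mod_deltas: Optional[Dict[int, float]] = None) -> List[float]:
    """m/z of b or y ions, one value per cleavage site (nAA - 1 values)."""
    masses = residue_masses(seq, mod_deltas)
    values = []
    for i in range(1, len(seq)):  # cleavage between residue i and residue i + 1
        if ion == "b":
            neutral = sum(masses[:i])
        elif ion == "y":
            neutral = sum(masses[i:]) + WATER
        else:
            raise ValueError(f"unsupported ion type: {ion}")
        values.append((neutral + charge * PROTON) / charge)
    return values


print(f"{precursor_mz('PEPTIDE', 2):.3f}")  # 400.687
```

A useful sanity check: at charge 1, the b and y ions from the same cleavage site always sum to the singly charged precursor m/z plus one proton.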

### Hashing and Identification

Methods for generating hash codes for fast precursor lookup and deduplication.

```python { .api }
class SpecLibBase:
    def hash_precursor_df(self) -> None:
        """
        Add hash columns to precursor DataFrame.
        Adds 'seq_hash' and 'prec_hash' columns for fast lookup.
        """

    def get_mod_seq_hash(self) -> pd.Series:
        """
        Generate hash codes for modified peptide sequences.

        Returns:
        Series with hash codes for each sequence
        """

    def get_mod_seq_charge_hash(self) -> pd.Series:
        """
        Generate hash codes for precursors (modified sequence + charge).

        Returns:
        Series with hash codes for each precursor
        """
```

### Isotope Calculations

Methods for calculating isotope patterns and intensities for precursors.

```python { .api }
class SpecLibBase:
    def calc_precursor_isotope_info(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope envelope information for precursors.

        Parameters:
        - max_isotope: Maximum number of isotope peaks to calculate
        """

    def calc_precursor_isotope_info_mp(self, max_isotope: int = 6,
                                       n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope information calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """

    def calc_precursor_isotope_intensity(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope pattern intensities for precursors.

        Parameters:
        - max_isotope: Maximum isotope peaks to calculate
        """

    def calc_precursor_isotope_intensity_mp(self, max_isotope: int = 6,
                                            n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope intensity calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """
```
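
Isotope envelopes follow from elemental composition. Each atom contributes a small distribution over nominal mass shifts (+0, +1, +2 Da, ...), and a peptide's envelope is the convolution of all of them. The sketch below illustrates that calculation with natural isotope abundances and a convolution truncated to `max_isotope` peaks, with the truncated pattern renormalized to sum to 1. It is not AlphaBase's implementation: `isotope_pattern` is a hypothetical helper, the caller must supply the composition, and peaks are binned by nominal shift, ignoring mass defects.

```python
from typing import Dict, List

# Natural isotope abundances, indexed by nominal mass shift (+0, +1, +2, ... Da).
ISOTOPE_ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def _convolve(a: List[float], b: List[float], n: int) -> List[float]:
    """Convolve two shift distributions, keeping only the first n shifts."""
    out = [0.0] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[: n - i]):
            out[i + j] += x * y
    return out


def _power(dist: List[float], count: int, n: int) -> List[float]:
    """Distribution for `count` atoms of one element (square-and-multiply)."""
    result = [1.0] + [0.0] * (n - 1)
    base = (dist + [0.0] * n)[:n]
    while count:
        if count & 1:
            result = _convolve(result, base, n)
        base = _convolve(base, base, n)
        count >>= 1
    return result


def isotope_pattern(composition: Dict[str, int], max_isotope: int = 6) -> List[float]:
    """Relative abundances of the M+0 ... M+(max_isotope-1) peaks (sum to 1)."""
    pattern = [1.0] + [0.0] * (max_isotope - 1)
    for element, count in composition.items():
        element_dist = _power(ISOTOPE_ABUNDANCE[element], count, max_isotope)
        pattern = _convolve(pattern, element_dist, max_isotope)
    total = sum(pattern)
    return [p / total for p in pattern]


# PEPTIDE is C34H53N7O15 (monoisotopic mass about 799.36 Da).
pattern = isotope_pattern({"C": 34, "H": 53, "N": 7, "O": 15}, max_isotope=6)
print([round(p, 3) for p in pattern])
```

Each precursor's envelope is computed independently, so the work splits naturally across processes, which is what the `*_mp` variants with `n_jobs` are for.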

### Fragment Processing

Methods for processing and optimizing fragment data within the spectral library.

```python { .api }
class SpecLibBase:
    def remove_unused_fragments(self) -> None:
        """
        Remove fragment rows that are no longer referenced by any precursor
        (e.g. after precursors were filtered out) and update the precursors'
        fragment start/stop indices. Compacts the fragment DataFrames to save memory.
        """

    def calc_fragment_count(self) -> pd.Series:
        """
        Count number of fragments per precursor.

        Returns:
        Series with fragment counts indexed by precursor
        """

    def filter_fragment_number(self, top_k: int = 100) -> None:
        """
        Keep only top-k fragments per precursor by intensity.

        Parameters:
        - top_k: Number of top fragments to retain per precursor
        """

    def sort_fragment_by_intensity(self, ascending: bool = False) -> None:
        """
        Sort fragments by intensity within each precursor.

        Parameters:
        - ascending: Sort order (False for highest intensity first)
        """
```
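
These fragment operations act on each precursor's block of fragment rows independently. The sketch below shows counting, top-k filtering, and compaction on a toy intensity matrix. It assumes each precursor owns a contiguous block of rows given by a `(frag_start_idx, frag_stop_idx)` pair with exclusive stop, and that each fragment type is a column. `fragment_count`, `keep_top_k`, and `drop_unreferenced_rows` are hypothetical helpers written with plain lists, not the library's code.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = cleavage sites, columns = fragment types


def fragment_count(intensity: Matrix, ranges: List[Tuple[int, int]]) -> List[int]:
    """Number of non-zero fragment intensities per precursor."""
    return [
        sum(1 for row in intensity[start:stop] for value in row if value > 0)
        for start, stop in ranges
    ]


def keep_top_k(intensity: Matrix, ranges: List[Tuple[int, int]], top_k: int) -> Matrix:
    """Zero out everything except each precursor's top_k most intense fragments."""
    result = [row[:] for row in intensity]
    for start, stop in ranges:
        cells = [
            (result[r][c], r, c)
            for r in range(start, stop)
            for c in range(len(result[r]))
            if result[r][c] > 0
        ]
        cells.sort(reverse=True)  # highest intensity first
        for _, r, c in cells[top_k:]:
            result[r][c] = 0.0
    return result


def drop_unreferenced_rows(
    intensity: Matrix, ranges: List[Tuple[int, int]]
) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Keep only rows referenced by some precursor and re-index the ranges."""
    compact: Matrix = []
    new_ranges = []
    for start, stop in ranges:
        new_start = len(compact)
        compact.extend(intensity[start:stop])
        new_ranges.append((new_start, len(compact)))
    return compact, new_ranges


# Two precursors (3 fragment rows each) with columns [b+, y+].
intensity = [
    [0.2, 0.9], [0.0, 0.5], [0.1, 0.0],  # precursor 0 -> rows 0..2
    [0.7, 0.3], [0.4, 0.0], [0.0, 0.6],  # precursor 1 -> rows 3..5
]
ranges = [(0, 3), (3, 6)]
print(fragment_count(intensity, ranges))                         # [4, 4]
print(fragment_count(keep_top_k(intensity, ranges, 2), ranges))  # [2, 2]

# After precursor 0 is filtered out, its rows are no longer referenced.
compact, new_ranges = drop_unreferenced_rows(intensity, [(3, 6)])
print(len(compact), new_ranges)  # 3 [(0, 3)]
```

Top-k filtering only zeroes values and leaves the row layout intact; compaction is a separate step because it must rewrite the index ranges.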

### I/O Operations

`SpecLibBase` itself saves and loads HDF5 files; export to other spectral library formats may be provided by format-specific utilities in the `alphabase.spectral_library` package.

```python { .api }
class SpecLibBase:
    def save_hdf(self, filepath: str, **kwargs) -> None:
        """
        Save spectral library to HDF5 format.

        Parameters:
        - filepath: Output HDF5 file path
        - **kwargs: Additional HDF5 options
        """

    def load_hdf(self, filepath: str, **kwargs) -> None:
        """
        Load spectral library from HDF5 format.

        Parameters:
        - filepath: Input HDF5 file path
        - **kwargs: Additional loading options
        """

    # Note: Additional export formats may be available through external functions
    # Check the alphabase.spectral_library module for format-specific export utilities
```

### Library Statistics and Analysis

No dedicated statistics or quality-metric methods are documented on `SpecLibBase`; such metrics can be computed directly from its DataFrames.

```python { .api }
class SpecLibBase:
    # Note: Statistical analysis and validation methods may be available
    # through external functions in the alphabase.spectral_library module
    pass
```

### Utility Functions

Standalone functions for spectral library operations and annotations.

```python { .api }
def annotate_fragments_from_speclib(target_lib: SpecLibBase,
                                    donor_lib: SpecLibBase,
                                    match_tolerance: float = 0.02) -> None:
    """
    Annotate fragments using donor spectral library.

    Parameters:
    - target_lib: Target library to annotate
    - donor_lib: Donor library with reference spectra
    - match_tolerance: Mass tolerance for matching (Da)
    """

def get_available_columns(spec_lib: SpecLibBase) -> dict:
    """
    Get available DataFrame columns across all library components.

    Parameters:
    - spec_lib: Spectral library instance

    Returns:
    Dictionary with available columns for each DataFrame
    """

# Note: Additional utility functions for library merging and filtering
# may be available in the alphabase.spectral_library module
```
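
As documented here, `annotate_fragments_from_speclib` matches fragments against a donor library within `match_tolerance` (Da). The sketch below shows one way such tolerance matching can work: sort the donor m/z values once, binary-search each target m/z, and copy the intensity of the closest donor peak if it lies within the tolerance. It illustrates the matching idea only and is not the library's implementation; `annotate_by_mz` is a hypothetical helper working on flat (m/z, intensity) lists for a single precursor.

```python
from bisect import bisect_left
from typing import List, Tuple


def annotate_by_mz(
    target_mz: List[float],
    donor_peaks: List[Tuple[float, float]],
    match_tolerance: float = 0.02,
) -> List[float]:
    """Return an intensity for each target m/z (0.0 when no donor peak matches)."""
    donor = sorted(donor_peaks)
    donor_mz = [mz for mz, _ in donor]
    intensities = []
    for mz in target_mz:
        i = bisect_left(donor_mz, mz)
        # The closest donor peak is at index i or i - 1.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(donor_mz)]
        best = min(candidates, key=lambda j: abs(donor_mz[j] - mz), default=None)
        if best is not None and abs(donor_mz[best] - mz) <= match_tolerance:
            intensities.append(donor[best][1])
        else:
            intensities.append(0.0)
    return intensities


target = [98.060, 148.060, 227.103]
donor = [(148.055, 0.8), (98.070, 0.3), (500.0, 1.0)]
print(annotate_by_mz(target, donor))  # [0.3, 0.8, 0.0]
```

Sorting the donor peaks once makes each lookup O(log n), which matters when annotating many fragments per precursor.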

## Usage Examples

### Basic Library Creation and Processing

```python
from alphabase.spectral_library.base import SpecLibBase
import pandas as pd

# Create new spectral library
spec_lib = SpecLibBase()

# Add precursor data. Modifications use the "Name@Residue" format in 'mods'
# with 1-based positions in 'mod_sites' (both ';'-separated for multiple mods).
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SAMPLER', 'MEANWHILE'],
    'mods': ['', 'Phospho@S', 'Oxidation@M'],
    'mod_sites': ['', '1', '1'],
    'charge': [2, 3, 2],
    'proteins': ['P12345', 'P67890', 'P11111'],
    'rt': [25.5, 32.1, 28.7]  # retention times
})

spec_lib.precursor_df = precursor_df

# Optimize DataFrame structure
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate fragment m/z values
frag_types = ['b+', 'y+', 'b++', 'y++']
spec_lib.calc_fragment_mz_df(frag_types)

print(f"Library contains {len(spec_lib.precursor_df)} precursors")
print(f"Generated {len(spec_lib.fragment_mz_df)} fragment rows")
```

### Library I/O Operations

```python
from alphabase.spectral_library.base import SpecLibBase

# Save library in HDF5 format (spec_lib from the previous example)
spec_lib.save_hdf('my_library.hdf5')

# Load library from HDF5 into a fresh instance
new_lib = SpecLibBase()
new_lib.load_hdf('my_library.hdf5')

# Additional export formats may be available through external functions
# Check alphabase.spectral_library module for format-specific exporters
```

### Advanced Processing

```python
# Add hash codes for fast lookup
spec_lib.hash_precursor_df()

# Calculate isotope patterns
spec_lib.calc_precursor_isotope_info(max_isotope=6)

# Keep only the 50 most intense fragments per precursor
spec_lib.filter_fragment_number(top_k=50)

# Remove unused fragment entries
spec_lib.remove_unused_fragments()

# Library statistics can be calculated manually:
print(f"Precursors: {len(spec_lib.precursor_df)}")
print(f"Fragments: {len(spec_lib.fragment_mz_df)}")
```

### Library Merging and Filtering

```python
from alphabase.spectral_library.base import SpecLibBase

# Merge multiple libraries using the append method
lib1 = SpecLibBase()
lib2 = SpecLibBase()
# ... populate libraries ...

# Copy first so that lib1 itself is left unchanged
merged_lib = lib1.copy()
merged_lib.append(lib2)

# Filter by specific proteins. A precursor can map to several proteins,
# stored as a ';'-separated string, so test each accession separately.
target_proteins = {'P12345', 'P67890'}
protein_mask = merged_lib.precursor_df['proteins'].str.split(';').apply(
    lambda accessions: not target_proteins.isdisjoint(accessions)
)
filtered_precursors = merged_lib.precursor_df[protein_mask]

print(f"Merged library: {len(merged_lib.precursor_df)} precursors")
print(f"Filtered precursors: {len(filtered_precursors)} precursors")
```

### Library Validation and Quality Control

```python
from alphabase.spectral_library.base import get_available_columns  # assumed import location

# Manual validation and quality control
print("Library integrity check:")
print(f"  Precursors: {len(spec_lib.precursor_df)}")
print(f"  Fragment m/z entries: {len(spec_lib.fragment_mz_df)}")
print(f"  Fragment intensity entries: {len(spec_lib.fragment_intensity_df)}")

# Get fragment count statistics
frag_counts = spec_lib.calc_fragment_count()
print(f"Average fragments per precursor: {frag_counts.mean():.1f}")
print(f"Min fragments: {frag_counts.min()}, Max fragments: {frag_counts.max()}")

# Check available columns
available_cols = get_available_columns(spec_lib)
print(f"Available columns: {available_cols}")
```

### Working with Decoys

```python
# Create a copy for decoy generation
decoy_lib = spec_lib.copy()

# Add decoy sequences (typically done with specialized decoy generation)
for idx, row in spec_lib.precursor_df.iterrows():
    # Simple strategy: reverse the whole sequence. Note that this moves the
    # C-terminal residue (e.g. K/R) to the N-terminus.
    decoy_seq = row['sequence'][::-1]
    decoy_lib.append_decoy_sequence(decoy_seq, decoy_proteins="DECOY_" + row['proteins'])

print(f"Original library: {len(spec_lib.precursor_df)} precursors")
print(f"With decoys: {len(decoy_lib.precursor_df)} precursors")
```
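
Plain reversal also moves the C-terminal K/R to the N-terminus and leaves modification sites pointing at the wrong residues. A common alternative is pseudo-reversal: reverse everything except the C-terminal residue and remap the modification sites. The sketch below is a standalone helper, `pseudo_reverse`, which is hypothetical and not part of the API documented here. It assumes `mod_sites` strings are `;`-separated with 1-based residue positions, and that `0` and `-1` mark N- and C-terminal modifications, which it leaves unchanged.

```python
from typing import Tuple


def pseudo_reverse(sequence: str, mod_sites: str) -> Tuple[str, str]:
    """Reverse all residues except the last one and remap 1-based mod sites.

    Assumed site convention: '1'..'n' are residue positions, '0' and '-1'
    mark N- and C-terminal modifications and are kept as-is.
    """
    n = len(sequence)
    decoy = sequence[:-1][::-1] + sequence[-1]
    new_sites = []
    for site in filter(None, mod_sites.split(";")):
        pos = int(site)
        if 1 <= pos < n:
            pos = n - pos  # residue at position pos moves to position n - pos
        new_sites.append(str(pos))
    return decoy, ";".join(new_sites)


print(pseudo_reverse("SAMPLER", "1"))  # ('ELPMASR', '6')
```

Keeping the C-terminal residue preserves the tryptic cleavage site and the y1 ion, so decoys resemble targets in composition and fragmentation behavior; the order of `mods` is unchanged because only the sites move.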
