# Spectral Library Management

The `SpecLibBase` class manages spectral libraries: loading, processing, filtering, and exporting them. It covers precursor and fragment m/z calculation, precursor hashing, isotope calculations, fragment filtering, decoy sequences, and HDF5 I/O for mass spectrometry workflows. Other file formats may be available through the `alphabase.spectral_library` module.

## Capabilities

### Core Spectral Library Class

The main `SpecLibBase` class stores the library as a set of pandas DataFrames and provides the methods that process them.

```python { .api }
class SpecLibBase:
    """
    Main spectral library class with comprehensive functionality.

    Properties:
    - precursor_df: DataFrame with precursor information (sequence, mods, charge, proteins)
    - peptide_df: DataFrame with unique peptide information
    - fragment_mz_df: DataFrame with fragment m/z values
    - fragment_intensity_df: DataFrame with fragment intensities
    """

    # Core properties
    precursor_df: pd.DataFrame
    peptide_df: pd.DataFrame
    fragment_mz_df: pd.DataFrame
    fragment_intensity_df: pd.DataFrame

    def __init__(self):
        """Initialize empty spectral library."""

    def copy(self) -> 'SpecLibBase':
        """
        Create deep copy of spectral library.

        Returns:
        New SpecLibBase instance with copied data
        """

    def append(self, other: 'SpecLibBase') -> None:
        """
        Append another spectral library to this one.

        Parameters:
        - other: Another SpecLibBase instance to append
        """

    def refine_df(self) -> None:
        """
        Sort and optimize all DataFrames for performance.
        Sets proper indexing and memory layout.
        """

    def append_decoy_sequence(self, decoy_sequence: str,
                              decoy_proteins: str = "decoy") -> None:
        """
        Add decoy sequences to the library.

        Parameters:
        - decoy_sequence: Decoy sequence string
        - decoy_proteins: Protein identifier for decoys
        """
```

### Mass and M/Z Calculations

Methods for calculating precursor and fragment m/z values with support for modifications and charge states.

```python { .api }
class SpecLibBase:
    def calc_precursor_mz(self) -> None:
        """
        Calculate precursor m/z values from mass and charge.
        Updates precursor_df with 'mz' column.
        """

    def calc_fragment_mz_df(self, frag_types: List[str] = None) -> None:
        """
        Generate fragment m/z DataFrame for all precursors.

        Parameters:
        - frag_types: List of fragment types like ['b+', 'y+', 'b++', 'y++']
          If None, uses default fragment types
        """

    def update_precursor_mz(self) -> None:
        """
        Update precursor m/z values after modifications.
        Alias for calc_precursor_mz() for backwards compatibility.
        """
```
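
The arithmetic behind these methods is m/z = (M + z * m_p) / z, where M is the neutral monoisotopic mass, z the charge, and m_p the proton mass. The sketch below is a standalone illustration of that formula for unmodified peptides, not alphabase's implementation. The residue-mass table and the helper names `neutral_mass`, `precursor_mz`, and `by_fragment_mz` are assumptions introduced here.

```python
# Monoisotopic masses in Da. Only the residues used in the example are listed.
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {
    "P": 97.052764,
    "E": 129.042593,
    "T": 101.047679,
    "I": 113.084064,
    "D": 115.026943,
}


def neutral_mass(sequence):
    """Neutral peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """Precursor m/z for a protonated, unmodified peptide."""
    return (neutral_mass(sequence) + charge * PROTON) / charge


def by_fragment_mz(sequence, charge=1):
    """Return (b_ions, y_ions) m/z lists, one entry per backbone cleavage site."""
    b_ions, y_ions = [], []
    for i in range(1, len(sequence)):
        b_mass = sum(RESIDUE_MASS[aa] for aa in sequence[:i])
        y_mass = sum(RESIDUE_MASS[aa] for aa in sequence[i:]) + WATER
        b_ions.append((b_mass + charge * PROTON) / charge)
        y_ions.append((y_mass + charge * PROTON) / charge)
    return b_ions, y_ions


mz = precursor_mz("PEPTIDE", 2)
b_ions, y_ions = by_fragment_mz("PEPTIDE")
print(f"precursor m/z (2+): {mz:.4f}")
print(f"b2: {b_ions[1]:.4f}, y1: {y_ions[-1]:.4f}")
```

Here `y_ions` runs from the longest fragment to the shortest, so the last entry is y1. Modifications would add their mass to the affected residue before summing.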

### Hashing and Identification

Methods for generating hash codes for fast precursor lookup and deduplication.

```python { .api }
class SpecLibBase:
    def hash_precursor_df(self) -> None:
        """
        Add hash columns to precursor DataFrame.
        Adds 'seq_hash' and 'prec_hash' columns for fast lookup.
        """

    def get_mod_seq_hash(self) -> pd.Series:
        """
        Generate hash codes for modified peptide sequences.

        Returns:
        Series with hash codes for each sequence
        """

    def get_mod_seq_charge_hash(self) -> pd.Series:
        """
        Generate hash codes for precursors (sequence + charge).

        Returns:
        Series with hash codes for each precursor
        """
```
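
To show why two hash levels are useful, here is a minimal sketch of hash-based deduplication. It does not reproduce the hash function the library uses, which this page does not specify. The helpers `stable_hash`, `mod_seq_hash`, and `mod_seq_charge_hash` and the `|` key separator are hypothetical. `hashlib.blake2b` is used because, unlike Python's built-in `hash()`, it gives the same value in every process.

```python
import hashlib


def stable_hash(text):
    """64-bit integer hash that is reproducible across Python processes."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def mod_seq_hash(sequence, mods):
    """Hash of the modified sequence, shared by all of its charge states."""
    return stable_hash(f"{sequence}|{mods}")


def mod_seq_charge_hash(sequence, mods, charge):
    """Hash of a precursor: modified sequence plus charge."""
    return stable_hash(f"{sequence}|{mods}|{charge}")


precursors = [
    ("PEPTIDE", "", 2),
    ("PEPTIDE", "", 3),
    ("PEPTIDE", "", 2),  # duplicate of the first entry
]

# Deduplicate by precursor hash, keeping the first occurrence.
seen, unique = set(), []
for sequence, mods, charge in precursors:
    key = mod_seq_charge_hash(sequence, mods, charge)
    if key not in seen:
        seen.add(key)
        unique.append((sequence, mods, charge))

print(len(unique))
```

The sequence-level hash groups charge states of the same peptide, while the precursor-level hash identifies exact duplicates.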

### Isotope Calculations

Methods for calculating isotope patterns and intensities for precursors.

```python { .api }
class SpecLibBase:
    def calc_precursor_isotope_info(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope envelope information for precursors.

        Parameters:
        - max_isotope: Maximum number of isotope peaks to calculate
        """

    def calc_precursor_isotope_info_mp(self, max_isotope: int = 6,
                                       n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope information calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """

    def calc_precursor_isotope_intensity(self, max_isotope: int = 6) -> None:
        """
        Calculate isotope pattern intensities for precursors.

        Parameters:
        - max_isotope: Maximum isotope peaks to calculate
        """

    def calc_precursor_isotope_intensity_mp(self, max_isotope: int = 6,
                                            n_jobs: int = 8) -> None:
        """
        Multiprocessing isotope intensity calculation.

        Parameters:
        - max_isotope: Maximum isotope peaks
        - n_jobs: Number of parallel processes
        """
```
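
As background for `max_isotope`, an isotope envelope can be obtained by convolving the natural isotope distribution of each atom in the molecule and keeping the first few peaks. The sketch below is one simple way to do that and is not necessarily how the library computes it. The function names are hypothetical, the abundances are rounded natural abundances, and C34H53N7O15 is the elemental composition of the unmodified peptide PEPTIDE.

```python
# Natural isotope abundances indexed by number of extra neutrons (rounded;
# very rare isotopes are omitted).
ISOTOPE_ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def truncated_convolve(a, b, n):
    """Convolve two distributions, keeping only the first n peaks."""
    out = [0.0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n:
                out[i + j] += x * y
    return out


def isotope_pattern(composition, max_isotope=6):
    """Relative intensities of the first max_isotope peaks, normalized to sum to 1."""
    pattern = [1.0] + [0.0] * (max_isotope - 1)
    for element, count in composition.items():
        for _ in range(count):
            pattern = truncated_convolve(
                pattern, ISOTOPE_ABUNDANCE[element], max_isotope
            )
    total = sum(pattern)
    return [p / total for p in pattern]


pattern = isotope_pattern({"C": 34, "H": 53, "N": 7, "O": 15})
print([round(p, 3) for p in pattern])
```

For a peptide of this size the monoisotopic peak dominates. For larger precursors the maximum shifts to later peaks, which is why the number of peaks kept matters.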

### Fragment Processing

Methods for processing and optimizing fragment data within the spectral library.

```python { .api }
class SpecLibBase:
    def remove_unused_fragments(self) -> None:
        """
        Remove fragment entries with zero intensity across all precursors.
        Compresses fragment DataFrames to save memory.
        """

    def calc_fragment_count(self) -> pd.Series:
        """
        Count number of fragments per precursor.

        Returns:
        Series with fragment counts indexed by precursor
        """

    def filter_fragment_number(self, top_k: int = 100) -> None:
        """
        Keep only top-k fragments per precursor by intensity.

        Parameters:
        - top_k: Number of top fragments to retain per precursor
        """

    def sort_fragment_by_intensity(self, ascending: bool = False) -> None:
        """
        Sort fragments by intensity within each precursor.

        Parameters:
        - ascending: Sort order (False for highest intensity first)
        """
```
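
The top-k filtering described above can be illustrated without DataFrames. This is a minimal sketch on plain Python structures, assuming fragments are stored as `(mz, intensity)` tuples per precursor. The function name `keep_top_k` and the data layout are hypothetical and do not mirror the library's internal storage.

```python
import heapq


def keep_top_k(fragments, top_k):
    """Keep the top_k most intense non-zero fragments for each precursor.

    fragments maps a precursor id to a list of (mz, intensity) tuples.
    Each returned list is ordered from highest to lowest intensity.
    """
    filtered = {}
    for precursor, peaks in fragments.items():
        nonzero = [peak for peak in peaks if peak[1] > 0]
        filtered[precursor] = heapq.nlargest(top_k, nonzero, key=lambda p: p[1])
    return filtered


fragments = {
    "PEPTIDE/2": [(98.06, 0.0), (227.10, 0.8), (324.16, 0.3), (148.06, 1.0)],
}
filtered = keep_top_k(fragments, top_k=2)
print(filtered["PEPTIDE/2"])
```

Dropping zero-intensity entries first corresponds to what `remove_unused_fragments` is described as doing, and the descending order matches `sort_fragment_by_intensity(ascending=False)`.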

### I/O Operations

Methods for saving and loading spectral libraries. HDF5 is the format documented here.

```python { .api }
class SpecLibBase:
    def save_hdf(self, filepath: str, **kwargs) -> None:
        """
        Save spectral library to HDF5 format.

        Parameters:
        - filepath: Output HDF5 file path
        - **kwargs: Additional HDF5 options
        """

    def load_hdf(self, filepath: str, **kwargs) -> None:
        """
        Load spectral library from HDF5 format.

        Parameters:
        - filepath: Input HDF5 file path
        - **kwargs: Additional loading options
        """

    # Note: Additional export formats may be available through external functions
    # Check the alphabase.spectral_library module for format-specific export utilities
```

### Library Statistics and Analysis

Analysis of spectral library content and quality metrics. No dedicated class methods are listed for this; such functionality may be provided by external functions.

```python { .api }
class SpecLibBase:
    # Note: Statistical analysis and validation methods may be available
    # through external functions in the alphabase.spectral_library module
    pass
```

### Utility Functions

Standalone functions for spectral library operations and annotations.

```python { .api }
def annotate_fragments_from_speclib(target_lib: SpecLibBase,
                                    donor_lib: SpecLibBase,
                                    match_tolerance: float = 0.02) -> None:
    """
    Annotate fragments using donor spectral library.

    Parameters:
    - target_lib: Target library to annotate
    - donor_lib: Donor library with reference spectra
    - match_tolerance: Mass tolerance for matching (Da)
    """

def get_available_columns(spec_lib: SpecLibBase) -> dict:
    """
    Get available DataFrame columns across all library components.

    Parameters:
    - spec_lib: Spectral library instance

    Returns:
    Dictionary with available columns for each DataFrame
    """

# Note: Additional utility functions for library merging and filtering
# may be available in the alphabase.spectral_library module
```
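
To make the meaning of an absolute tolerance in Da concrete, here is a minimal sketch of nearest-peak matching. How `annotate_fragments_from_speclib` matches peaks internally is not described on this page, so the function `match_within_tolerance` and its return convention (index of the matched reference peak, or -1) are assumptions for illustration only.

```python
from bisect import bisect_left


def match_within_tolerance(query_mz, reference_mz, tolerance=0.02):
    """Match each query m/z to the nearest reference m/z within tolerance (Da).

    reference_mz must be sorted in ascending order. Returns one index into
    reference_mz per query value, or -1 when no peak lies within tolerance.
    """
    matches = []
    for mz in query_mz:
        pos = bisect_left(reference_mz, mz)
        best = -1
        # Only the neighbours on either side of the insertion point can be nearest.
        for idx in (pos - 1, pos):
            if 0 <= idx < len(reference_mz):
                delta = abs(reference_mz[idx] - mz)
                if delta <= tolerance and (
                    best == -1 or delta < abs(reference_mz[best] - mz)
                ):
                    best = idx
        matches.append(best)
    return matches


reference = [148.0604, 227.1026, 324.1554]
query = [227.11, 300.0, 148.05]
matches = match_within_tolerance(query, reference)
print(matches)  # [1, -1, 0]
```

A fixed Da window is the simplest choice. A relative (ppm) tolerance would scale the window with m/z instead.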

## Usage Examples

### Basic Library Creation and Processing

```python
from alphabase.spectral_library.base import SpecLibBase
import pandas as pd

# Create new spectral library
spec_lib = SpecLibBase()

# Add precursor data
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE', 'EXAMPLE'],
    # Sites point at S (residue 1 of SEQUENCE) and M (residue 4 of EXAMPLE)
    'mods': ['', 'Phospho (STY)@1', 'Oxidation (M)@4'],
    'charge': [2, 3, 2],
    'proteins': ['P12345', 'P67890', 'P11111'],
    'rt': [25.5, 32.1, 28.7]  # retention times
})

spec_lib.precursor_df = precursor_df

# Optimize DataFrame structure
spec_lib.refine_df()

# Calculate precursor m/z values
spec_lib.calc_precursor_mz()

# Generate fragment m/z values
frag_types = ['b+', 'y+', 'b++', 'y++']
spec_lib.calc_fragment_mz_df(frag_types)

print(f"Library contains {len(spec_lib.precursor_df)} precursors")
print(f"Generated {len(spec_lib.fragment_mz_df)} fragment entries")
```

### Library I/O Operations

```python
# Save library in HDF5 format
spec_lib.save_hdf('my_library.hdf5')

# Load library from HDF5
new_lib = SpecLibBase()
new_lib.load_hdf('my_library.hdf5')

# Additional export formats may be available through external functions
# Check alphabase.spectral_library module for format-specific exporters
```

### Advanced Processing

```python
# Add hash codes for fast lookup
spec_lib.hash_precursor_df()

# Calculate isotope patterns
spec_lib.calc_precursor_isotope_info(max_isotope=6)

# Keep only the 50 most intense fragments per precursor
spec_lib.filter_fragment_number(top_k=50)

# Remove unused fragment entries
spec_lib.remove_unused_fragments()

# Library statistics can be calculated manually:
print(f"Precursors: {len(spec_lib.precursor_df)}")
print(f"Fragments: {len(spec_lib.fragment_mz_df)}")
```

### Library Merging and Filtering

```python
# Merge multiple libraries using append method
lib1 = SpecLibBase()
lib2 = SpecLibBase()
# ... populate libraries ...

# Merge libraries
merged_lib = lib1.copy()
merged_lib.append(lib2)

# Filter by specific proteins using pandas operations
target_proteins = ['P12345', 'P67890']
filtered_precursors = merged_lib.precursor_df[
    merged_lib.precursor_df['proteins'].isin(target_proteins)
]

print(f"Merged library: {len(merged_lib.precursor_df)} precursors")
print(f"Filtered precursors: {len(filtered_precursors)} precursors")
```
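
The `isin` filter above matches only rows whose `proteins` value equals a target exactly. If a precursor maps to several proteins stored in one delimited string, a per-row membership test is needed. The sketch below assumes a `;` separator, which this page does not confirm, and the helper `has_target_protein` is hypothetical.

```python
def has_target_protein(protein_field, targets, sep=";"):
    """True if any accession in a delimited protein string is a target."""
    return any(acc.strip() in targets for acc in protein_field.split(sep))


target_proteins = {"P12345", "P67890"}
protein_column = ["P12345", "P11111;P67890", "P11111", ""]

keep = [has_target_protein(p, target_proteins) for p in protein_column]
print(keep)  # [True, True, False, False]
```

With pandas, the same helper could be applied as `precursor_df['proteins'].apply(has_target_protein, targets=target_proteins)` to build the boolean mask.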

### Library Validation and Quality Control

```python
# Import path assumed to match the module SpecLibBase is imported from
from alphabase.spectral_library.base import get_available_columns

# Manual validation and quality control
print("Library integrity check:")
print(f" Precursors: {len(spec_lib.precursor_df)}")
print(f" Fragment m/z entries: {len(spec_lib.fragment_mz_df)}")
print(f" Fragment intensity entries: {len(spec_lib.fragment_intensity_df)}")

# Get fragment count statistics
frag_counts = spec_lib.calc_fragment_count()
print(f"Average fragments per precursor: {frag_counts.mean():.1f}")
print(f"Min fragments: {frag_counts.min()}, Max fragments: {frag_counts.max()}")

# Check available columns
available_cols = get_available_columns(spec_lib)
print(f"Available columns: {available_cols}")
```

### Working with Decoys

```python
# Create a copy for decoy generation
decoy_lib = spec_lib.copy()

# Add decoy sequences (typically done with specialized decoy generation)
for _, row in spec_lib.precursor_df.iterrows():
    # Reverse sequence as simple decoy strategy
    decoy_seq = row['sequence'][::-1]
    decoy_lib.append_decoy_sequence(decoy_seq, decoy_proteins="DECOY_" + row['proteins'])

print(f"Original library: {len(spec_lib.precursor_df)} precursors")
print(f"With decoys: {len(decoy_lib.precursor_df)} precursors")
```
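
The loop above reverses the whole sequence. A common variant keeps the C-terminal residue in place, so that decoys of tryptic peptides still end in K or R. The following is a minimal sketch of that variant. The helper `reverse_decoy` is hypothetical and not part of the documented API.

```python
def reverse_decoy(sequence, keep_c_term=True):
    """Reverse a peptide sequence, optionally keeping the last residue fixed."""
    if keep_c_term and len(sequence) > 1:
        # Reverse everything except the final residue, then re-attach it.
        return sequence[-2::-1] + sequence[-1]
    return sequence[::-1]


print(reverse_decoy("PEPTIDEK"))                     # EDITPEPK
print(reverse_decoy("PEPTIDEK", keep_c_term=False))  # KEDITPEP
```

Palindromic sequences produce a decoy identical to the target, so it is worth checking for and handling such collisions before appending decoys to a library.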