# Advanced Peptide Operations

Peptide-level processing for mass spectrometry data: precursor m/z and mass calculations, hashing, isotope modeling, and ion mobility (CCS) conversions. The functions are intended for large-scale peptide analysis and operate in batches on precursor DataFrames, including data with an additional separation dimension such as ion mobility.

## Capabilities

### Precursor Processing and Calculations

Functions for precursor-level calculations: m/z computation, plus refinement and state checks on the precursor DataFrame. Hashing and isotope pattern functions are covered in their own sections.

```python { .api }
def update_precursor_mz(
    precursor_df: pd.DataFrame,
    batch_size: int = 100000,
) -> None:
    """
    Calculate and update precursor m/z values in DataFrame.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, charge columns
    - batch_size: Batch size for memory-efficient processing

    Modifies precursor_df in-place by adding 'mz' column
    """

def calc_precursor_mz(
    precursor_df: pd.DataFrame,
    batch_size: int = 100000,
) -> np.ndarray:
    """
    Calculate precursor m/z values from sequence and modifications.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - batch_size: Processing batch size

    Returns:
    Array of precursor m/z values
    """

def refine_precursor_df(
    precursor_df: pd.DataFrame,
    drop_frag_idx: bool = True,
    ensure_data_validity: bool = True,
) -> pd.DataFrame:
    """
    Optimize and validate precursor DataFrame structure.

    Parameters:
    - precursor_df: Input precursor DataFrame
    - drop_frag_idx: Whether to drop fragment indexing columns
    - ensure_data_validity: Perform data validation checks

    Returns:
    Refined and optimized precursor DataFrame
    """

def is_precursor_refined(precursor_df: pd.DataFrame) -> bool:
    """
    Check if precursor DataFrame has been refined/optimized.

    Parameters:
    - precursor_df: DataFrame to check

    Returns:
    True if DataFrame is in refined state
    """

def is_precursor_sorted(precursor_df: pd.DataFrame) -> bool:
    """
    Check if precursor DataFrame is properly sorted.

    Parameters:
    - precursor_df: DataFrame to check

    Returns:
    True if DataFrame is sorted by precursor index
    """
```
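
To make the m/z step concrete, the following is a minimal sketch of the usual conversion from a neutral monoisotopic mass `M` and charge `z` to precursor m/z, `(M + z * m_proton) / z`. The helper name `neutral_mass_to_mz` and the `PROTON_MASS` constant are illustrative assumptions rather than part of the documented API, and the library's own implementation may differ.

```python
import numpy as np

# Assumed constant: mass of a proton in Da.
PROTON_MASS = 1.007276466


def neutral_mass_to_mz(neutral_mass, charge):
    """Hypothetical helper: m/z = (M + z * m_proton) / z."""
    neutral_mass = np.asarray(neutral_mass, dtype=np.float64)
    charge = np.asarray(charge, dtype=np.float64)
    return neutral_mass / charge + PROTON_MASS


# The same neutral mass (about 799.36 Da) observed at charge 1 and charge 2.
mz = neutral_mass_to_mz([799.35996, 799.35996], [1, 2])
print(mz)
```

Working on whole arrays rather than row by row is what makes a batched, in-place update of an `mz` column cheap.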

### Peptide Hashing and Identification

Functions for generating hash codes for fast peptide lookup, deduplication, and comparison operations.

```python { .api }
def get_mod_seq_hash(
    sequence: List[str],
    mod_names: List[List[str]],
    mod_sites: List[List[int]],
    seed: int = 42,
) -> np.ndarray:
    """
    Generate hash codes for modified peptide sequences.

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence
    - seed: Random seed for reproducible hashing

    Returns:
    Array of hash codes for each modified sequence
    """

def get_mod_seq_charge_hash(
    sequence: List[str],
    mod_names: List[List[str]],
    mod_sites: List[List[int]],
    charge: List[int],
    seed: int = 42,
) -> np.ndarray:
    """
    Generate hash codes for precursors (sequence + charge).

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence
    - charge: List of precursor charges
    - seed: Random seed for reproducible hashing

    Returns:
    Array of hash codes for each precursor
    """

def hash_mod_seq_df(
    precursor_df: pd.DataFrame,
    seed: int = 42,
) -> pd.Series:
    """
    Generate sequence hash codes for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, mod_sites
    - seed: Random seed for hashing

    Returns:
    Series with hash codes indexed by DataFrame index
    """

def hash_mod_seq_charge_df(
    precursor_df: pd.DataFrame,
    seed: int = 42,
) -> pd.Series:
    """
    Generate precursor hash codes including charge state.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, mod_sites, charge
    - seed: Random seed for hashing

    Returns:
    Series with precursor hash codes
    """

def hash_precursor_df(
    precursor_df: pd.DataFrame,
    seed: int = 42,
) -> None:
    """
    Add hash columns to precursor DataFrame in-place.

    Parameters:
    - precursor_df: DataFrame to modify
    - seed: Random seed for hashing

    Adds 'seq_hash' and 'prec_hash' columns to DataFrame
    """
```
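
The idea behind seeded peptide hashing can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the library's algorithm: the function names are hypothetical, and keyed BLAKE2b is swapped in as a convenient way to get a reproducible 64-bit hash that changes with the seed.

```python
import hashlib


def sketch_mod_seq_hash(sequence: str, mods: str, mod_sites: str, seed: int = 42) -> int:
    """Hypothetical helper: seeded 64-bit hash of a modified sequence."""
    key = seed.to_bytes(8, "little")  # the seed acts as the BLAKE2b key
    payload = f"{sequence}|{mods}|{mod_sites}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def sketch_precursor_hash(sequence: str, mods: str, mod_sites: str,
                          charge: int, seed: int = 42) -> int:
    """Hypothetical helper: fold the charge state into the hashed payload."""
    return sketch_mod_seq_hash(sequence, mods, f"{mod_sites}|z={charge}", seed)


seq_hash = sketch_mod_seq_hash("EXAMPLE", "Oxidation (M)", "4")
prec_hash_2 = sketch_precursor_hash("EXAMPLE", "Oxidation (M)", "4", charge=2)
prec_hash_3 = sketch_precursor_hash("EXAMPLE", "Oxidation (M)", "4", charge=3)
print(seq_hash, prec_hash_2, prec_hash_3)
```

Keeping a sequence-level hash separate from a precursor-level hash lets the same peptide be grouped across charge states while still distinguishing individual precursors.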

### Isotope Pattern Calculations

Functions for calculating isotope patterns, intensities, and distributions for precursors.

```python { .api }
def calc_precursor_isotope_info(
    precursor_df: pd.DataFrame,
    max_isotope: int = 6,
) -> None:
    """
    Calculate isotope envelope information for precursors.

    Parameters:
    - precursor_df: DataFrame with peptide sequences and modifications
    - max_isotope: Maximum number of isotope peaks to calculate

    Adds isotope-related columns to precursor_df in-place
    """

def calc_precursor_isotope_info_mp(
    precursor_df: pd.DataFrame,
    max_isotope: int = 6,
    n_jobs: int = 8,
) -> None:
    """
    Multiprocessing isotope information calculation.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks to calculate
    - n_jobs: Number of parallel processes

    Adds isotope information using parallel processing
    """

def calc_precursor_isotope_intensity(
    precursor_df: pd.DataFrame,
    max_isotope: int = 6,
) -> None:
    """
    Calculate detailed isotope pattern intensities.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks for intensity calculation

    Adds isotope intensity columns to DataFrame
    """

def calc_precursor_isotope_intensity_mp(
    precursor_df: pd.DataFrame,
    max_isotope: int = 6,
    n_jobs: int = 8,
) -> None:
    """
    Multiprocessing isotope intensity calculation.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks
    - n_jobs: Number of parallel processes

    Parallel calculation of isotope intensities
    """

def get_mod_seq_formula(
    sequence: List[str],
    mod_names: List[List[str]],
    mod_sites: List[List[int]],
) -> List[str]:
    """
    Generate chemical formulas for modified peptide sequences.

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence

    Returns:
    List of chemical formula strings for each modified sequence
    """
```
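
An isotope envelope can be derived from a chemical formula by repeatedly convolving per-element isotope distributions, which is one common way to obtain the first `max_isotope` peaks. The sketch below assumes approximate natural abundances and a hypothetical helper `isotope_envelope`; it is not the library's implementation, and rare isotopes such as 36S are ignored.

```python
import numpy as np

# Approximate natural isotope abundances, indexed by nominal mass offset (+0, +1, +2).
ISOTOPE_ABUNDANCES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def isotope_envelope(composition: dict, max_isotope: int = 6) -> np.ndarray:
    """Hypothetical helper: relative isotope intensities, scaled so the tallest peak is 1."""
    envelope = np.array([1.0])
    for element, count in composition.items():
        abundances = np.array(ISOTOPE_ABUNDANCES[element])
        for _ in range(count):
            # Truncating after each step is safe: peak k only depends on peaks <= k.
            envelope = np.convolve(envelope, abundances)[:max_isotope]
    return envelope / envelope.max()


# Elemental composition of the unmodified peptide PEPTIDE (C34 H53 N7 O15).
envelope = isotope_envelope({"C": 34, "H": 53, "N": 7, "O": 15}, max_isotope=6)
print(np.round(envelope, 4))
```

For a peptide of this size the monoisotopic peak is the tallest; for larger precursors the apex moves to a later isotope, which is why the apex position is useful to store alongside the intensities.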

### Advanced Mass Calculations

Mass calculation functions optimized for batch processing and high-throughput analysis.

```python { .api }
def calc_b_y_and_peptide_masses_for_same_len_seqs(
    sequences: List[str],
    mod_names: Optional[List[List[str]]] = None,
    mod_sites: Optional[List[List[int]]] = None,
    aa_mass_diffs: Optional[List[List[float]]] = None,
    aa_mass_diff_sites: Optional[List[List[int]]] = None,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Batch calculate b/y fragments and peptide masses for equal-length sequences.

    Parameters:
    - sequences: List of equal-length peptide sequences
    - mod_names: Optional modification names for each sequence
    - mod_sites: Optional modification sites for each sequence
    - aa_mass_diffs: Optional amino acid mass differences
    - aa_mass_diff_sites: Optional sites for mass differences

    Returns:
    Tuple of (b_ion_masses, y_ion_masses, peptide_masses)
    All arrays have optimized memory layout for equal-length sequences
    """

def calc_peptide_masses_for_same_len_seqs(
    sequences: List[str],
    mod_list: Optional[List[tuple]] = None,
    mod_diff_list: Optional[List[tuple]] = None,
) -> np.ndarray:
    """
    Calculate peptide masses for equal-length sequences efficiently.

    Parameters:
    - sequences: List of equal-length peptide sequences
    - mod_list: List of (mod_names, mod_sites) tuples
    - mod_diff_list: List of (mass_diffs, mass_diff_sites) tuples

    Returns:
    1D array of peptide masses with optimized computation
    """

def calc_diff_modification_mass(
    pep_len: int,
    mass_diffs: List[float],
    mass_diff_sites: List[int],
) -> np.ndarray:
    """
    Calculate mass differences for open search workflows.

    Parameters:
    - pep_len: Peptide sequence length
    - mass_diffs: List of mass differences to apply
    - mass_diff_sites: List of sites where mass differences occur

    Returns:
    2D array with mass differences by position
    """

def calc_mod_diff_masses_for_same_len_seqs(
    nAA: int,
    aa_mass_diffs_list: List[List[float]],
    mod_sites_list: List[List[int]],
) -> np.ndarray:
    """
    Batch calculation of modification mass differences.

    Parameters:
    - nAA: Number of amino acids (sequence length)
    - aa_mass_diffs_list: List of mass difference arrays
    - mod_sites_list: List of modification site arrays

    Returns:
    3D array with mass differences for batch processing
    """
```
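
Restricting a batch to equal-length sequences pays off because the residue masses then form a rectangular array, so all b and y masses follow from a single cumulative sum. The sketch below illustrates that idea for unmodified peptides only. The helper name, the small residue-mass table, and the choice to return neutral fragment masses are assumptions made for the example.

```python
import numpy as np

# Monoisotopic residue masses in Da for the residues used below (assumed subset).
RESIDUE_MASS = {
    "P": 97.052764,
    "E": 129.042593,
    "T": 101.047679,
    "I": 113.084064,
    "D": 115.026943,
}
H2O_MASS = 18.010565


def sketch_b_y_and_peptide_masses(sequences):
    """Hypothetical helper: neutral b/y masses and peptide masses, no modifications."""
    residue = np.array([[RESIDUE_MASS[aa] for aa in seq] for seq in sequences])
    prefix = np.cumsum(residue, axis=1)      # shape (n_peptides, n_residues)
    peptide = prefix[:, -1] + H2O_MASS       # full residue sum plus water
    b = prefix[:, :-1]                       # b1 .. b(n-1)
    y = peptide[:, None] - b                 # complementary y fragment of each b
    return b, y, peptide


b_masses, y_masses, peptide_masses = sketch_b_y_and_peptide_masses(["PEPTIDE", "DEPTIPE"])
print(b_masses.shape, y_masses.shape, np.round(peptide_masses, 4))
```

Column `i` of `y_masses` holds the y fragment complementary to column `i` of `b_masses`, so each pair sums to the peptide mass.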

### Ion Mobility and CCS Calculations

Functions for collision cross section (CCS) and ion mobility calculations across different instrument platforms.

```python { .api }
def get_reduced_mass(
    precursor_mzs: np.ndarray,
    charges: np.ndarray,
) -> np.ndarray:
    """
    Calculate reduced mass for ion mobility calculations.

    Parameters:
    - precursor_mzs: Array of precursor m/z values
    - charges: Array of precursor charges

    Returns:
    Array of reduced masses
    """

def ccs_to_mobility_bruker(
    ccs: np.ndarray,
    mz: np.ndarray,
    charge: np.ndarray,
    mass_gas: float = 28.014,
    temp: float = 273.15,
    t_diff: float = 0.0,
) -> np.ndarray:
    """
    Convert collision cross section to ion mobility (Bruker platform).

    Parameters:
    - ccs: Array of CCS values (Ų)
    - mz: Array of m/z values
    - charge: Array of charge states
    - mass_gas: Mass of drift gas (default: N2)
    - temp: Temperature in Kelvin
    - t_diff: Temperature difference correction

    Returns:
    Array of ion mobility values (1/K0)
    """

def mobility_to_ccs_bruker(
    mobility: np.ndarray,
    mz: np.ndarray,
    charge: np.ndarray,
    mass_gas: float = 28.014,
    temp: float = 273.15,
    t_diff: float = 0.0,
) -> np.ndarray:
    """
    Convert ion mobility to collision cross section (Bruker platform).

    Parameters:
    - mobility: Array of ion mobility values (1/K0)
    - mz: Array of m/z values
    - charge: Array of charge states
    - mass_gas: Mass of drift gas (default: N2)
    - temp: Temperature in Kelvin
    - t_diff: Temperature difference correction

    Returns:
    Array of CCS values (Ų)
    """

def ccs_to_mobility_waters(
    ccs: np.ndarray,
    mz: np.ndarray,
    charge: np.ndarray,
    **kwargs,
) -> np.ndarray:
    """
    Convert CCS to ion mobility for Waters instruments.

    Parameters:
    - ccs: Array of CCS values
    - mz: Array of m/z values
    - charge: Array of charge states
    - **kwargs: Platform-specific parameters

    Returns:
    Array of ion mobility values
    """

def mobility_to_ccs_waters(
    mobility: np.ndarray,
    mz: np.ndarray,
    charge: np.ndarray,
    **kwargs,
) -> np.ndarray:
    """
    Convert ion mobility to CCS for Waters instruments.

    Parameters:
    - mobility: Array of ion mobility values
    - mz: Array of m/z values
    - charge: Array of charge states
    - **kwargs: Platform-specific parameters

    Returns:
    Array of CCS values
    """

def ccs_to_mobility_for_df(
    precursor_df: pd.DataFrame,
    vendor_type: str = 'bruker',
) -> None:
    """
    Convert CCS to mobility values for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with ccs, mz, charge columns
    - vendor_type: Instrument vendor ('bruker', 'waters', 'agilent')

    Adds 'mobility' column to DataFrame in-place
    """

def mobility_to_ccs_for_df(
    precursor_df: pd.DataFrame,
    vendor_type: str = 'bruker',
) -> None:
    """
    Convert mobility to CCS values for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with mobility, mz, charge columns
    - vendor_type: Instrument vendor ('bruker', 'waters', 'agilent')

    Adds 'ccs' column to DataFrame in-place
    """
```
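
CCS and reduced ion mobility (1/K0) are usually related through the Mason-Schamp equation, in which CCS is proportional to `z / (K0 * sqrt(mu * T))`, with `mu` the reduced mass of the ion and drift gas. The sketch below shows that relation and its inverse. The function names are hypothetical, the lumped coefficient is a commonly quoted value that should be treated as an assumption, and the ion mass is approximated as m/z times charge; vendor-specific corrections such as `t_diff` are not modeled.

```python
import numpy as np

# Assumed lumped Mason-Schamp coefficient (CCS in Å², 1/K0 in V·s/cm², masses in Da, T in K).
MASON_SCHAMP_COEF = 18509.8632


def sketch_reduced_mass(mz, charge, mass_gas=28.014):
    """Reduced mass of ion and drift gas; ion mass approximated as m/z * charge."""
    ion_mass = np.asarray(mz, dtype=float) * np.asarray(charge, dtype=float)
    return ion_mass * mass_gas / (ion_mass + mass_gas)


def sketch_ccs_to_inverse_k0(ccs, mz, charge, mass_gas=28.014, temp=273.15):
    """Hypothetical helper: CCS (Å²) to reduced mobility 1/K0."""
    mu = sketch_reduced_mass(mz, charge, mass_gas)
    return np.asarray(ccs) * np.sqrt(mu * temp) / (MASON_SCHAMP_COEF * np.asarray(charge))


def sketch_inverse_k0_to_ccs(inv_k0, mz, charge, mass_gas=28.014, temp=273.15):
    """Hypothetical helper: exact inverse of sketch_ccs_to_inverse_k0."""
    mu = sketch_reduced_mass(mz, charge, mass_gas)
    return np.asarray(inv_k0) * MASON_SCHAMP_COEF * np.asarray(charge) / np.sqrt(mu * temp)


ccs = np.array([350.0, 450.0])
mz = np.array([500.0, 600.0])
charge = np.array([2, 3])

inv_k0 = sketch_ccs_to_inverse_k0(ccs, mz, charge)
ccs_back = sketch_inverse_k0_to_ccs(inv_k0, mz, charge)
print(inv_k0, ccs_back)
```

Because the two conversions are algebraic inverses, a round trip through mobility should reproduce the input CCS values up to floating-point error, which makes a convenient sanity check.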

### Batch Processing and Optimization

Functions for high-throughput peptide processing with memory efficiency and parallel computation.

```python { .api }
def process_precursors_in_batches(
    precursor_df: pd.DataFrame,
    processing_func: Callable,
    batch_size: int = 100000,
    n_jobs: int = 1,
    **kwargs,
) -> pd.DataFrame:
    """
    Process large precursor DataFrames in memory-efficient batches.

    Parameters:
    - precursor_df: Large precursor DataFrame
    - processing_func: Function to apply to each batch
    - batch_size: Number of precursors per batch
    - n_jobs: Number of parallel processes
    - **kwargs: Additional arguments for processing function

    Returns:
    Processed DataFrame with results from all batches
    """

def optimize_precursor_memory_layout(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """
    Optimize DataFrame memory layout for computational efficiency.

    Parameters:
    - precursor_df: Input precursor DataFrame

    Returns:
    DataFrame with optimized memory layout and data types
    """

def validate_precursor_data_integrity(precursor_df: pd.DataFrame) -> dict:
    """
    Validate precursor data for completeness and consistency.

    Parameters:
    - precursor_df: Precursor DataFrame to validate

    Returns:
    Dictionary with validation results and any issues found
    """

def create_precursor_index_mapping(precursor_df: pd.DataFrame) -> dict:
    """
    Create efficient index mappings for fast precursor lookup.

    Parameters:
    - precursor_df: Precursor DataFrame

    Returns:
    Dictionary with various index mappings for optimized access
    """
```
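
The batching pattern itself is simple to express in plain pandas: slice the DataFrame into fixed-size row ranges, apply the function to each slice, and concatenate the results. The sketch below is a single-process illustration with a hypothetical helper name; it does not reproduce the `n_jobs` parallelism described above.

```python
import pandas as pd


def sketch_apply_in_batches(df: pd.DataFrame, func, batch_size: int = 100000, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: apply func to consecutive row slices and concatenate."""
    results = [
        func(df.iloc[start:start + batch_size], **kwargs)
        for start in range(0, len(df), batch_size)
    ]
    # pd.concat needs at least one frame, so handle the empty case explicitly.
    return pd.concat(results) if results else df.copy()


def add_length(batch: pd.DataFrame) -> pd.DataFrame:
    """Example batch function: add the peptide length as an 'nAA' column."""
    return batch.assign(nAA=batch["sequence"].str.len())


demo_df = pd.DataFrame({"sequence": ["PEPTIDE", "SEQUENCE", "EXAMPLE"] * 4})
batched_df = sketch_apply_in_batches(demo_df, add_length, batch_size=5)
print(batched_df.head())
```

Having the batch function return a new DataFrame, rather than mutate its input slice, avoids pandas chained-assignment pitfalls and keeps the original index intact after concatenation.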

### Statistical and Analysis Functions

Functions for statistical analysis and quality assessment of peptide-level data.

```python { .api }
def calculate_precursor_statistics(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate comprehensive statistics for precursor data.

    Parameters:
    - precursor_df: Precursor DataFrame

    Returns:
    DataFrame with statistical summaries
    """

def detect_precursor_outliers(
    precursor_df: pd.DataFrame,
    method: str = 'zscore',
    threshold: float = 3.0,
) -> pd.Series:
    """
    Detect outlier precursors based on various metrics.

    Parameters:
    - precursor_df: Precursor DataFrame
    - method: Outlier detection method ('zscore', 'iqr', 'isolation_forest')
    - threshold: Threshold for outlier detection

    Returns:
    Boolean Series indicating outliers
    """

def analyze_modification_patterns(precursor_df: pd.DataFrame) -> dict:
    """
    Analyze patterns in peptide modifications.

    Parameters:
    - precursor_df: DataFrame with modification information

    Returns:
    Dictionary with modification analysis results
    """

def assess_sequence_coverage(
    precursor_df: pd.DataFrame,
    protein_sequences: dict,
) -> pd.DataFrame:
    """
    Assess protein sequence coverage from identified precursors.

    Parameters:
    - precursor_df: DataFrame with precursor sequences and proteins
    - protein_sequences: Dictionary mapping protein IDs to sequences

    Returns:
    DataFrame with coverage statistics per protein
    """
```
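
As an illustration of the `'zscore'` method with a `threshold`, the sketch below flags values whose absolute z-score exceeds the threshold in a single numeric column. It is a simplified stand-in under stated assumptions: the helper name is hypothetical, and which metrics the documented function actually inspects is not specified here.

```python
import numpy as np
import pandas as pd


def sketch_zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Hypothetical helper: True where |z-score| exceeds the threshold."""
    std = values.std(ddof=0)
    if std == 0 or np.isnan(std):
        # A constant or empty column has no outliers by this definition.
        return pd.Series(False, index=values.index)
    z_scores = (values - values.mean()) / std
    return z_scores.abs() > threshold


# Twenty plausible m/z values plus one obviously wrong entry.
mz_values = pd.Series([500.0 + i for i in range(20)] + [5000.0])
outlier_mask = sketch_zscore_outliers(mz_values, threshold=3.0)
print(mz_values[outlier_mask])
```

A boolean Series aligned to the input index can be used directly as a filter, for example `mz_values[~outlier_mask]`.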

## Usage Examples

### Basic Precursor Processing

```python
import pandas as pd

from alphabase.peptide.precursor import (
    update_precursor_mz, refine_precursor_df, hash_precursor_df
)

# Create precursor DataFrame.
# Modification sites are 1-based residue positions, chosen here to match
# the modified residue (S1 of SEQUENCE, M4 of EXAMPLE).
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE', 'EXAMPLE'],
    'mods': ['', 'Phospho (STY)@1', 'Oxidation (M)@4'],
    'charge': [2, 3, 2],
    'proteins': ['P12345', 'P67890', 'P11111']
})

# Refine DataFrame structure
refined_df = refine_precursor_df(precursor_df, ensure_data_validity=True)

# Calculate m/z values (adds the 'mz' column in-place)
update_precursor_mz(refined_df)
print(f"Added m/z values: {refined_df['mz'].tolist()}")

# Add hash codes for fast lookup (adds 'seq_hash' and 'prec_hash' in-place)
hash_precursor_df(refined_df)
print(f"Added hash columns: {refined_df.columns.tolist()}")
```

### Isotope Pattern Calculations

```python
from alphabase.peptide.precursor import (
    calc_precursor_isotope_info,
    calc_precursor_isotope_info_mp,
    calc_precursor_isotope_intensity,
)

# Continues from the previous example: refined_df is the refined precursor DataFrame.

# Calculate isotope envelope information
calc_precursor_isotope_info(refined_df, max_isotope=6)
print(f"Isotope columns: {[col for col in refined_df.columns if 'isotope' in col]}")

# Calculate detailed isotope intensities
calc_precursor_isotope_intensity(refined_df, max_isotope=6)
print(f"Isotope intensity patterns calculated for {len(refined_df)} precursors")

# For large datasets, use multiprocessing.
# large_precursor_df stands in for your own large precursor DataFrame;
# the small example is reused here so the snippet has no undefined names.
large_precursor_df = refined_df.copy()
calc_precursor_isotope_info_mp(large_precursor_df, max_isotope=6, n_jobs=8)
```

### Advanced Mass Calculations

```python
from alphabase.peptide.mass_calc import (
    calc_b_y_and_peptide_masses_for_same_len_seqs,
    calc_peptide_masses_for_same_len_seqs
)

# Efficient batch processing for same-length sequences
same_len_sequences = ['PEPTIDE', 'EXAMPLE', 'TESTPEP']  # All length 7
mod_names = [[], ['Oxidation (M)'], []]
mod_sites = [[], [4], []]

# Calculate b/y fragments and peptide masses
b_masses, y_masses, peptide_masses = calc_b_y_and_peptide_masses_for_same_len_seqs(
    sequences=same_len_sequences,
    mod_names=mod_names,
    mod_sites=mod_sites
)

print(f"B-ion masses shape: {b_masses.shape}")
print(f"Y-ion masses shape: {y_masses.shape}")
print(f"Peptide masses: {peptide_masses}")

# For peptide masses only
peptide_masses_only = calc_peptide_masses_for_same_len_seqs(
    sequences=same_len_sequences,
    mod_list=list(zip(mod_names, mod_sites))
)
print(f"Peptide masses: {peptide_masses_only}")
```

### Ion Mobility and CCS Calculations

```python
import numpy as np

from alphabase.peptide.mobility import (
    ccs_to_mobility_for_df, mobility_to_ccs_for_df,
    ccs_to_mobility_bruker, mobility_to_ccs_bruker
)

# Continues from the basic example: refined_df already has 'mz' and 'charge'.
# Add CCS values to DataFrame (example values)
mobility_df = refined_df.copy()
mobility_df['ccs'] = [150.5, 180.2, 165.8]  # Example CCS values

# Convert CCS to mobility for Bruker platform
ccs_to_mobility_for_df(mobility_df, vendor_type='bruker')
print(f"Added mobility values: {mobility_df['mobility'].tolist()}")

# Convert back to CCS to verify
test_df = mobility_df[['mobility', 'mz', 'charge']].copy()
mobility_to_ccs_for_df(test_df, vendor_type='bruker')
print(f"Verified CCS values: {test_df['ccs'].tolist()}")

# Direct array calculations
ccs_values = np.array([150.5, 180.2, 165.8])
mz_values = mobility_df['mz'].values
charge_values = mobility_df['charge'].values

mobility_values = ccs_to_mobility_bruker(ccs_values, mz_values, charge_values)
print(f"Direct mobility calculation: {mobility_values}")
```

### High-Throughput Batch Processing

```python
import numpy as np
import pandas as pd

from alphabase.peptide.precursor import process_precursors_in_batches

# Create large dataset for demonstration
np.random.seed(42)
large_df = pd.DataFrame({
    'sequence': ['PEPTIDE'] * 100000 + ['EXAMPLE'] * 100000,
    'charge': np.random.choice([2, 3, 4], 200000),
    'proteins': [f'P{i:05d}' for i in range(200000)]
})

# Define processing function
def add_theoretical_rt(batch_df):
    """Add theoretical retention time based on sequence properties."""
    batch_df = batch_df.copy()
    # Simple hydrophobicity-based RT prediction (example)
    hydrophobic_aas = {'A', 'I', 'L', 'F', 'W', 'Y', 'V'}
    batch_df['theoretical_rt'] = [
        sum(1 for aa in seq if aa in hydrophobic_aas) * 2.5 + 10
        for seq in batch_df['sequence']
    ]
    return batch_df

# Process in batches
processed_df = process_precursors_in_batches(
    large_df,
    processing_func=add_theoretical_rt,
    batch_size=50000,
    n_jobs=4
)

print(f"Processed {len(processed_df)} precursors with theoretical RT")
print(f"RT range: {processed_df['theoretical_rt'].min():.1f} - {processed_df['theoretical_rt'].max():.1f}")
```

### Data Quality Assessment

```python
from alphabase.peptide.precursor import (
    validate_precursor_data_integrity,
    detect_precursor_outliers,
    calculate_precursor_statistics
)

# Continues from the batch example: processed_df is the processed precursor DataFrame.

# Validate data integrity
validation_results = validate_precursor_data_integrity(processed_df)
print("Validation results:")
for check, result in validation_results.items():
    print(f"  {check}: {result}")

# Calculate comprehensive statistics
stats_df = calculate_precursor_statistics(processed_df)
print("Precursor statistics:")
print(stats_df.head())

# Detect outliers
outliers = detect_precursor_outliers(
    processed_df,
    method='zscore',
    threshold=3.0
)
print(f"Detected {outliers.sum()} outlier precursors ({outliers.mean()*100:.1f}%)")

# Remove outliers
clean_df = processed_df[~outliers].copy()
print(f"Clean dataset: {len(clean_df)} precursors")
```

### Modification Pattern Analysis

```python
from alphabase.peptide.precursor import analyze_modification_patterns

# Continues from the basic example: refined_df has three precursors.
# Add more complex modifications for analysis.
# These strings only illustrate the format; the sites are not matched
# to the residues of the example sequences.
complex_df = refined_df.copy()
complex_df['mods'] = [
    'Oxidation (M)@3;Acetyl (Protein N-term)@0',
    'Phospho (STY)@2;Phospho (STY)@5',
    'Carbamidomethyl (C)@2;Oxidation (M)@6'
]

# Analyze modification patterns
mod_analysis = analyze_modification_patterns(complex_df)
print("Modification analysis:")
print(f"  Most common modifications: {mod_analysis['common_mods']}")
print(f"  Co-occurring modifications: {mod_analysis['cooccurrence']}")
print(f"  Site preferences: {mod_analysis['site_preferences']}")
```

### Memory Optimization

```python
from alphabase.peptide.precursor import optimize_precursor_memory_layout

# Check memory usage before optimization
print(f"Memory usage before optimization: {processed_df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Optimize memory layout
optimized_df = optimize_precursor_memory_layout(processed_df)
print(f"Memory usage after optimization: {optimized_df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Compare data types
print("Data type changes:")
for col in processed_df.columns:
    if col in optimized_df.columns:
        old_dtype = processed_df[col].dtype
        new_dtype = optimized_df[col].dtype
        if old_dtype != new_dtype:
            print(f"  {col}: {old_dtype} -> {new_dtype}")
```