# Advanced Peptide Operations

Peptide processing capabilities including precursor m/z calculation, peptide hashing, isotope pattern modeling, batch mass calculation, and ion mobility (CCS) conversion. Provides batch-oriented functions for large-scale peptide analysis on precursor DataFrames.

## Capabilities

### Precursor Processing and Calculations

Advanced functions for precursor-level calculations including m/z computation, hashing, and isotope pattern analysis.

```python { .api }
def update_precursor_mz(precursor_df: pd.DataFrame,
                        batch_size: int = 100000) -> None:
    """
    Calculate and update precursor m/z values in DataFrame.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, charge columns
    - batch_size: Batch size for memory-efficient processing

    Modifies precursor_df in-place by adding 'mz' column
    """

def calc_precursor_mz(precursor_df: pd.DataFrame,
                      batch_size: int = 100000) -> np.ndarray:
    """
    Calculate precursor m/z values from sequence and modifications.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - batch_size: Processing batch size

    Returns:
    Array of precursor m/z values
    """

def refine_precursor_df(precursor_df: pd.DataFrame,
                        drop_frag_idx: bool = True,
                        ensure_data_validity: bool = True) -> pd.DataFrame:
    """
    Optimize and validate precursor DataFrame structure.

    Parameters:
    - precursor_df: Input precursor DataFrame
    - drop_frag_idx: Whether to drop fragment indexing columns
    - ensure_data_validity: Perform data validation checks

    Returns:
    Refined and optimized precursor DataFrame
    """

def is_precursor_refined(precursor_df: pd.DataFrame) -> bool:
    """
    Check if precursor DataFrame has been refined/optimized.

    Parameters:
    - precursor_df: DataFrame to check

    Returns:
    True if DataFrame is in refined state
    """

def is_precursor_sorted(precursor_df: pd.DataFrame) -> bool:
    """
    Check if precursor DataFrame is properly sorted.

    Parameters:
    - precursor_df: DataFrame to check

    Returns:
    True if DataFrame is sorted by precursor index
    """
```
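
The m/z computation described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: `mz_from_neutral_mass` and `PROTON_MASS` are hypothetical names introduced here, and the sketch assumes positively charged ions whose neutral monoisotopic mass is already known.

```python
import numpy as np

# Proton mass in Da (rounded CODATA value).
PROTON_MASS = 1.007276466

def mz_from_neutral_mass(neutral_mass, charge):
    """Hypothetical helper: m/z = (M + z * m_proton) / z for positive ions."""
    neutral_mass = np.asarray(neutral_mass, dtype=float)
    charge = np.asarray(charge, dtype=float)
    return (neutral_mass + charge * PROTON_MASS) / charge

# Neutral monoisotopic mass of PEPTIDE, evaluated at charge 1 and charge 2.
masses = np.array([799.359965, 799.359965])
charges = np.array([1, 2])
print(mz_from_neutral_mass(masses, charges))
```

Vectorizing over arrays, as done here, is what makes batch-wise processing of large DataFrames cheap: each batch reduces to a few array operations.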

### Peptide Hashing and Identification

Functions for generating hash codes for fast peptide lookup, deduplication, and comparison operations.

```python { .api }
def get_mod_seq_hash(sequence: List[str],
                     mod_names: List[List[str]],
                     mod_sites: List[List[int]],
                     seed: int = 42) -> np.ndarray:
    """
    Generate hash codes for modified peptide sequences.

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence
    - seed: Random seed for reproducible hashing

    Returns:
    Array of hash codes for each modified sequence
    """

def get_mod_seq_charge_hash(sequence: List[str],
                            mod_names: List[List[str]],
                            mod_sites: List[List[int]],
                            charge: List[int],
                            seed: int = 42) -> np.ndarray:
    """
    Generate hash codes for precursors (sequence + charge).

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence
    - charge: List of precursor charges
    - seed: Random seed for reproducible hashing

    Returns:
    Array of hash codes for each precursor
    """

def hash_mod_seq_df(precursor_df: pd.DataFrame,
                    seed: int = 42) -> pd.Series:
    """
    Generate sequence hash codes for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, mod_sites
    - seed: Random seed for hashing

    Returns:
    Series with hash codes indexed by DataFrame index
    """

def hash_mod_seq_charge_df(precursor_df: pd.DataFrame,
                           seed: int = 42) -> pd.Series:
    """
    Generate precursor hash codes including charge state.

    Parameters:
    - precursor_df: DataFrame with sequence, mods, mod_sites, charge
    - seed: Random seed for hashing

    Returns:
    Series with precursor hash codes
    """

def hash_precursor_df(precursor_df: pd.DataFrame,
                      seed: int = 42) -> None:
    """
    Add hash columns to precursor DataFrame in-place.

    Parameters:
    - precursor_df: DataFrame to modify
    - seed: Random seed for hashing

    Adds 'seq_hash' and 'prec_hash' columns to DataFrame
    """
```
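
To show the idea behind seeded hashing of modified sequences, here is a small standard-library sketch. The hash function actually used by the functions above may differ; `mod_seq_hash` is a hypothetical helper, and the choice of BLAKE2b with an 8-byte digest is an assumption made only for illustration.

```python
import hashlib

def mod_seq_hash(sequence, mods, mod_sites, charge=None, seed=42):
    """Hypothetical helper: deterministic 64-bit hash of a modified sequence.

    Including `charge` turns the sequence-level hash into a precursor-level hash.
    """
    key = f"{sequence}|{mods}|{mod_sites}"
    if charge is not None:
        key += f"|{charge}"
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,                      # 8 bytes -> 64-bit integer
        salt=seed.to_bytes(8, "little"),    # the seed changes the whole hash space
    ).digest()
    return int.from_bytes(digest, "little")

seq_hash = mod_seq_hash("PEPTIDE", "", "")
prec_hash = mod_seq_hash("PEPTIDE", "", "", charge=2)
print(seq_hash, prec_hash)
```

Because the hash depends only on its inputs and the seed, two tables hashed with the same seed can be joined or deduplicated on the hash column instead of on several string columns.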

### Isotope Pattern Calculations

Advanced functions for calculating isotope patterns, intensities, and distributions for precursors.

```python { .api }
def calc_precursor_isotope_info(precursor_df: pd.DataFrame,
                                max_isotope: int = 6) -> None:
    """
    Calculate isotope envelope information for precursors.

    Parameters:
    - precursor_df: DataFrame with peptide sequences and modifications
    - max_isotope: Maximum number of isotope peaks to calculate

    Adds isotope-related columns to precursor_df in-place
    """

def calc_precursor_isotope_info_mp(precursor_df: pd.DataFrame,
                                   max_isotope: int = 6,
                                   n_jobs: int = 8) -> None:
    """
    Multiprocessing isotope information calculation.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks to calculate
    - n_jobs: Number of parallel processes

    Adds isotope information using parallel processing
    """

def calc_precursor_isotope_intensity(precursor_df: pd.DataFrame,
                                     max_isotope: int = 6) -> None:
    """
    Calculate detailed isotope pattern intensities.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks for intensity calculation

    Adds isotope intensity columns to DataFrame
    """

def calc_precursor_isotope_intensity_mp(precursor_df: pd.DataFrame,
                                        max_isotope: int = 6,
                                        n_jobs: int = 8) -> None:
    """
    Multiprocessing isotope intensity calculation.

    Parameters:
    - precursor_df: DataFrame with peptide information
    - max_isotope: Maximum isotope peaks
    - n_jobs: Number of parallel processes

    Parallel calculation of isotope intensities
    """

def get_mod_seq_formula(sequence: List[str],
                        mod_names: List[List[str]],
                        mod_sites: List[List[int]]) -> List[str]:
    """
    Generate chemical formulas for modified peptide sequences.

    Parameters:
    - sequence: List of peptide sequences
    - mod_names: List of modification names for each sequence
    - mod_sites: List of modification sites for each sequence

    Returns:
    List of chemical formula strings for each modified sequence
    """
```
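
An isotope envelope can be derived from a chemical formula by repeatedly convolving per-element isotope distributions. The sketch below illustrates that technique under stated assumptions: `isotope_envelope` and `ISOTOPE_ABUNDANCE` are hypothetical names, the abundances are approximate natural values, and the rare isotope 36S is omitted. It is not a description of how the functions above are implemented.

```python
import numpy as np

# Approximate natural abundances, indexed by nominal mass offset
# from the lightest isotope (index 0 = lightest, index 1 = +1 Da, ...).
ISOTOPE_ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}

def isotope_envelope(formula, max_isotope=6):
    """Hypothetical helper: relative abundance of the first `max_isotope` peaks."""
    envelope = np.zeros(max_isotope)
    envelope[0] = 1.0
    for element, count in formula.items():
        abundance = np.array(ISOTOPE_ABUNDANCE[element])
        for _ in range(count):
            # Adding one atom convolves the envelope with that atom's distribution;
            # peaks beyond max_isotope are truncated.
            envelope = np.convolve(envelope, abundance)[:max_isotope]
    return envelope

# PEPTIDE has the elemental composition C34 H53 N7 O15.
peptide_formula = {"C": 34, "H": 53, "N": 7, "O": 15}
print(isotope_envelope(peptide_formula, max_isotope=6))
```

The atom-by-atom loop keeps the sketch short; for large molecules, convolving by repeated squaring would reduce the number of convolutions.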

### Advanced Mass Calculations

Efficient mass calculation functions optimized for batch processing and high-throughput analysis.

```python { .api }
def calc_b_y_and_peptide_masses_for_same_len_seqs(sequences: List[str],
                                                  mod_names: List[List[str]] = None,
                                                  mod_sites: List[List[int]] = None,
                                                  aa_mass_diffs: List[List[float]] = None,
                                                  aa_mass_diff_sites: List[List[int]] = None) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Batch calculate b/y fragments and peptide masses for equal-length sequences.

    Parameters:
    - sequences: List of equal-length peptide sequences
    - mod_names: Optional modification names for each sequence
    - mod_sites: Optional modification sites for each sequence
    - aa_mass_diffs: Optional amino acid mass differences
    - aa_mass_diff_sites: Optional sites for mass differences

    Returns:
    Tuple of (b_ion_masses, y_ion_masses, peptide_masses)
    All arrays have optimized memory layout for equal-length sequences
    """

def calc_peptide_masses_for_same_len_seqs(sequences: List[str],
                                          mod_list: List[tuple] = None,
                                          mod_diff_list: List[tuple] = None) -> np.ndarray:
    """
    Calculate peptide masses for equal-length sequences efficiently.

    Parameters:
    - sequences: List of equal-length peptide sequences
    - mod_list: List of (mod_names, mod_sites) tuples
    - mod_diff_list: List of (mass_diffs, mass_diff_sites) tuples

    Returns:
    1D array of peptide masses with optimized computation
    """

def calc_diff_modification_mass(pep_len: int,
                                mass_diffs: List[float],
                                mass_diff_sites: List[int]) -> np.ndarray:
    """
    Calculate mass differences for open search workflows.

    Parameters:
    - pep_len: Peptide sequence length
    - mass_diffs: List of mass differences to apply
    - mass_diff_sites: List of sites where mass differences occur

    Returns:
    2D array with mass differences by position
    """

def calc_mod_diff_masses_for_same_len_seqs(nAA: int,
                                           aa_mass_diffs_list: List[List[float]],
                                           mod_sites_list: List[List[int]]) -> np.ndarray:
    """
    Batch calculation of modification mass differences.

    Parameters:
    - nAA: Number of amino acids (sequence length)
    - aa_mass_diffs_list: List of mass difference arrays
    - mod_sites_list: List of modification site arrays

    Returns:
    3D array with mass differences for batch processing
    """
```
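
Equal-length sequences allow the residue masses to be stored in one rectangular array, so fragment masses reduce to a cumulative sum. The following sketch illustrates this for unmodified peptides. It assumes neutral fragment masses (no proton added) with the water mass assigned to the y ions; `b_y_and_peptide_masses` is a hypothetical helper, and the conventions of the functions above may differ.

```python
import numpy as np

# Monoisotopic residue masses (Da) for the amino acids used in this example.
RESIDUE_MASS = {
    "P": 97.052764, "E": 129.042593, "T": 101.047679,
    "I": 113.084064, "D": 115.026943, "S": 87.032028,
}
WATER_MASS = 18.010565

def b_y_and_peptide_masses(sequences):
    """Hypothetical helper: neutral b/y masses for equal-length, unmodified sequences."""
    # Rectangular array of residue masses, shape (n_sequences, n_residues).
    residue_masses = np.array([[RESIDUE_MASS[aa] for aa in seq] for seq in sequences])
    cumulative = np.cumsum(residue_masses, axis=1)
    peptide_masses = cumulative[:, -1] + WATER_MASS
    # Column k holds b(k+1); the last cumulative sum is the full peptide, not a fragment.
    b_masses = cumulative[:, :-1]
    # Column k holds the complementary y ion, y(n-k-1).
    y_masses = peptide_masses[:, None] - b_masses
    return b_masses, y_masses, peptide_masses

b_masses, y_masses, peptide_masses = b_y_and_peptide_masses(["PEPTIDE", "TESTPEP"])
print(b_masses.shape, y_masses.shape, peptide_masses)
```

Grouping peptides by length before calling such a function is what makes the rectangular layout possible; sequences of mixed length would produce a ragged array.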

### Ion Mobility and CCS Calculations

Functions for collision cross section (CCS) and ion mobility calculations across different instrument platforms.

```python { .api }
def get_reduced_mass(precursor_mzs: np.ndarray,
                     charges: np.ndarray) -> np.ndarray:
    """
    Calculate reduced mass for ion mobility calculations.

    Parameters:
    - precursor_mzs: Array of precursor m/z values
    - charges: Array of precursor charges

    Returns:
    Array of reduced masses
    """

def ccs_to_mobility_bruker(ccs: np.ndarray,
                           mz: np.ndarray,
                           charge: np.ndarray,
                           mass_gas: float = 28.014,
                           temp: float = 273.15,
                           t_diff: float = 0.0) -> np.ndarray:
    """
    Convert collision cross section to ion mobility (Bruker platform).

    Parameters:
    - ccs: Array of CCS values (Ų)
    - mz: Array of m/z values
    - charge: Array of charge states
    - mass_gas: Mass of drift gas (default: N2)
    - temp: Temperature in Kelvin
    - t_diff: Temperature difference correction

    Returns:
    Array of ion mobility values (1/K0)
    """

def mobility_to_ccs_bruker(mobility: np.ndarray,
                           mz: np.ndarray,
                           charge: np.ndarray,
                           mass_gas: float = 28.014,
                           temp: float = 273.15,
                           t_diff: float = 0.0) -> np.ndarray:
    """
    Convert ion mobility to collision cross section (Bruker platform).

    Parameters:
    - mobility: Array of ion mobility values (1/K0)
    - mz: Array of m/z values
    - charge: Array of charge states
    - mass_gas: Mass of drift gas (default: N2)
    - temp: Temperature in Kelvin
    - t_diff: Temperature difference correction

    Returns:
    Array of CCS values (Ų)
    """

def ccs_to_mobility_waters(ccs: np.ndarray,
                           mz: np.ndarray,
                           charge: np.ndarray,
                           **kwargs) -> np.ndarray:
    """
    Convert CCS to ion mobility for Waters instruments.

    Parameters:
    - ccs: Array of CCS values
    - mz: Array of m/z values
    - charge: Array of charge states
    - **kwargs: Platform-specific parameters

    Returns:
    Array of ion mobility values
    """

def mobility_to_ccs_waters(mobility: np.ndarray,
                           mz: np.ndarray,
                           charge: np.ndarray,
                           **kwargs) -> np.ndarray:
    """
    Convert ion mobility to CCS for Waters instruments.

    Parameters:
    - mobility: Array of ion mobility values
    - mz: Array of m/z values
    - charge: Array of charge states
    - **kwargs: Platform-specific parameters

    Returns:
    Array of CCS values
    """

def ccs_to_mobility_for_df(precursor_df: pd.DataFrame,
                           vendor_type: str = 'bruker') -> None:
    """
    Convert CCS to mobility values for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with ccs, mz, charge columns
    - vendor_type: Instrument vendor ('bruker', 'waters', 'agilent')

    Adds 'mobility' column to DataFrame in-place
    """

def mobility_to_ccs_for_df(precursor_df: pd.DataFrame,
                           vendor_type: str = 'bruker') -> None:
    """
    Convert mobility to CCS values for precursor DataFrame.

    Parameters:
    - precursor_df: DataFrame with mobility, mz, charge columns
    - vendor_type: Instrument vendor ('bruker', 'waters', 'agilent')

    Adds 'ccs' column to DataFrame in-place
    """
```
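
The conversion between CCS and inverse reduced mobility (1/K0) follows the Mason-Schamp relation, in which CCS is proportional to charge and 1/K0 and inversely proportional to the square root of the reduced mass. The sketch below shows that structure only. `CCS_IM_COEF` is an assumed constant that folds temperature and physical constants into one number, the ion mass is approximated as m/z times charge, and all function names are hypothetical; calibrated values should come from the instrument or the library itself.

```python
import numpy as np

N2_MASS = 28.014           # drift gas mass, matching the default mass_gas above
CCS_IM_COEF = 1059.62245   # ASSUMED coefficient; replace with a calibrated value

def reduced_mass(mz, charge, mass_gas=N2_MASS):
    """Reduced mass of the ion / drift-gas pair."""
    ion_mass = mz * charge  # approximation: ignores the mass of the charge carriers
    return ion_mass * mass_gas / (ion_mass + mass_gas)

def ccs_to_mobility(ccs, mz, charge, coef=CCS_IM_COEF):
    """Hypothetical helper: CCS -> 1/K0 under the assumed coefficient."""
    return ccs * np.sqrt(reduced_mass(mz, charge)) / (charge * coef)

def mobility_to_ccs(mobility, mz, charge, coef=CCS_IM_COEF):
    """Hypothetical helper: exact inverse of ccs_to_mobility."""
    return mobility * charge * coef / np.sqrt(reduced_mass(mz, charge))

ccs = np.array([350.0, 420.0])
mz = np.array([400.687, 520.3])
charge = np.array([2, 3])
mobility = ccs_to_mobility(ccs, mz, charge)
print(mobility, mobility_to_ccs(mobility, mz, charge))
```

Writing the two directions as exact inverses of each other makes a round-trip check a cheap sanity test for any coefficient choice.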

### Batch Processing and Optimization

Functions optimized for high-throughput peptide processing with memory efficiency and parallel computation.

```python { .api }
def process_precursors_in_batches(precursor_df: pd.DataFrame,
                                  processing_func: callable,
                                  batch_size: int = 100000,
                                  n_jobs: int = 1,
                                  **kwargs) -> pd.DataFrame:
    """
    Process large precursor DataFrames in memory-efficient batches.

    Parameters:
    - precursor_df: Large precursor DataFrame
    - processing_func: Function to apply to each batch
    - batch_size: Number of precursors per batch
    - n_jobs: Number of parallel processes
    - **kwargs: Additional arguments for processing function

    Returns:
    Processed DataFrame with results from all batches
    """

def optimize_precursor_memory_layout(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """
    Optimize DataFrame memory layout for computational efficiency.

    Parameters:
    - precursor_df: Input precursor DataFrame

    Returns:
    DataFrame with optimized memory layout and data types
    """

def validate_precursor_data_integrity(precursor_df: pd.DataFrame) -> dict:
    """
    Validate precursor data for completeness and consistency.

    Parameters:
    - precursor_df: Precursor DataFrame to validate

    Returns:
    Dictionary with validation results and any issues found
    """

def create_precursor_index_mapping(precursor_df: pd.DataFrame) -> dict:
    """
    Create efficient index mappings for fast precursor lookup.

    Parameters:
    - precursor_df: Precursor DataFrame

    Returns:
    Dictionary with various index mappings for optimized access
    """
```
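
The batching pattern itself needs only pandas. The sketch below shows a sequential version: the DataFrame is sliced into row ranges, a function is applied to each slice, and the results are concatenated. `apply_in_batches` and `add_length` are hypothetical names, and parallel execution (the `n_jobs` parameter above) is deliberately left out.

```python
import pandas as pd

def apply_in_batches(df, func, batch_size=100000, **kwargs):
    """Hypothetical helper: apply `func` to consecutive row slices and concatenate."""
    results = [
        func(df.iloc[start:start + batch_size], **kwargs)
        for start in range(0, len(df), batch_size)
    ]
    # An empty input yields no batches; return an empty copy in that case.
    return pd.concat(results) if results else df.copy()

def add_length(batch):
    """Example batch function: add the sequence length as column 'nAA'."""
    batch = batch.copy()  # avoid writing into a slice of the caller's DataFrame
    batch["nAA"] = batch["sequence"].str.len()
    return batch

df = pd.DataFrame({"sequence": ["PEPTIDE", "TESTPEP", "PEPTIDEK"]})
out = apply_in_batches(df, add_length, batch_size=2)
print(out)
```

Only one batch of intermediate results is built at a time inside `func`, which bounds peak memory when the per-row computation allocates large temporary arrays.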

### Statistical and Analysis Functions

Functions for statistical analysis and quality assessment of peptide-level data.

```python { .api }
def calculate_precursor_statistics(precursor_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate comprehensive statistics for precursor data.

    Parameters:
    - precursor_df: Precursor DataFrame

    Returns:
    DataFrame with statistical summaries
    """

def detect_precursor_outliers(precursor_df: pd.DataFrame,
                              method: str = 'zscore',
                              threshold: float = 3.0) -> pd.Series:
    """
    Detect outlier precursors based on various metrics.

    Parameters:
    - precursor_df: Precursor DataFrame
    - method: Outlier detection method ('zscore', 'iqr', 'isolation_forest')
    - threshold: Threshold for outlier detection

    Returns:
    Boolean Series indicating outliers
    """

def analyze_modification_patterns(precursor_df: pd.DataFrame) -> dict:
    """
    Analyze patterns in peptide modifications.

    Parameters:
    - precursor_df: DataFrame with modification information

    Returns:
    Dictionary with modification analysis results
    """

def assess_sequence_coverage(precursor_df: pd.DataFrame,
                             protein_sequences: dict) -> pd.DataFrame:
    """
    Assess protein sequence coverage from identified precursors.

    Parameters:
    - precursor_df: DataFrame with precursor sequences and proteins
    - protein_sequences: Dictionary mapping protein IDs to sequences

    Returns:
    DataFrame with coverage statistics per protein
    """
```
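
For the `'zscore'` method with a threshold, the underlying test can be sketched in a few lines of pandas. This illustrates the general technique on a single numeric column; `zscore_outliers` is a hypothetical helper, and which columns or metrics the function above evaluates is not specified here.

```python
import pandas as pd

def zscore_outliers(values, threshold=3.0):
    """Hypothetical helper: flag values more than `threshold` std devs from the mean."""
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # All values identical: nothing can be an outlier.
        return pd.Series(False, index=values.index)
    z = (values - values.mean()) / std
    return z.abs() > threshold

# Twenty identical m/z values and one far-off value.
mz = pd.Series([500.0] * 20 + [5000.0])
outliers = zscore_outliers(mz, threshold=3.0)
print(outliers.sum(), "outlier(s) flagged")
```

With a single extreme value among n points, the largest attainable z-score is the square root of n - 1, so very small datasets can never exceed a threshold of 3.0; the IQR method is less affected by this limit.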

## Usage Examples

### Basic Precursor Processing

```python
from alphabase.peptide.precursor import (
    update_precursor_mz, refine_precursor_df, hash_precursor_df
)
import pandas as pd

# Create precursor DataFrame
precursor_df = pd.DataFrame({
    'sequence': ['PEPTIDE', 'SEQUENCE', 'EXAMPLE'],
    'mods': ['', 'Phospho (STY)@2', 'Oxidation (M)@1'],
    'charge': [2, 3, 2],
    'proteins': ['P12345', 'P67890', 'P11111']
})

# Refine DataFrame structure
refined_df = refine_precursor_df(precursor_df, ensure_data_validity=True)

# Calculate m/z values
update_precursor_mz(refined_df)
print(f"Added m/z values: {refined_df['mz'].tolist()}")

# Add hash codes for fast lookup
hash_precursor_df(refined_df)
print(f"Added hash columns: {refined_df.columns.tolist()}")
```

### Isotope Pattern Calculations

```python
from alphabase.peptide.precursor import (
    calc_precursor_isotope_info,
    calc_precursor_isotope_info_mp,
    calc_precursor_isotope_intensity,
)

# refined_df is the DataFrame built in "Basic Precursor Processing"

# Calculate isotope envelope information
calc_precursor_isotope_info(refined_df, max_isotope=6)
print(f"Isotope columns: {[col for col in refined_df.columns if 'isotope' in col]}")

# Calculate detailed isotope intensities
calc_precursor_isotope_intensity(refined_df, max_isotope=6)
print(f"Isotope intensity patterns calculated for {len(refined_df)} precursors")

# For large datasets, use multiprocessing.
# large_precursor_df stands for any large DataFrame prepared like refined_df.
calc_precursor_isotope_info_mp(large_precursor_df, max_isotope=6, n_jobs=8)
```

### Advanced Mass Calculations

```python
from alphabase.peptide.mass_calc import (
    calc_b_y_and_peptide_masses_for_same_len_seqs,
    calc_peptide_masses_for_same_len_seqs
)

# Efficient batch processing for same-length sequences
same_len_sequences = ['PEPTIDE', 'EXAMPLE', 'TESTPEP']  # All length 7
mod_names = [[], ['Oxidation (M)'], []]
mod_sites = [[], [4], []]

# Calculate b/y fragments and peptide masses
b_masses, y_masses, peptide_masses = calc_b_y_and_peptide_masses_for_same_len_seqs(
    sequences=same_len_sequences,
    mod_names=mod_names,
    mod_sites=mod_sites
)

print(f"B-ion masses shape: {b_masses.shape}")
print(f"Y-ion masses shape: {y_masses.shape}")
print(f"Peptide masses: {peptide_masses}")

# For peptide masses only
peptide_masses_only = calc_peptide_masses_for_same_len_seqs(
    sequences=same_len_sequences,
    mod_list=list(zip(mod_names, mod_sites))
)
print(f"Peptide masses: {peptide_masses_only}")
```

### Ion Mobility and CCS Calculations

```python
import numpy as np

from alphabase.peptide.mobility import (
    ccs_to_mobility_for_df, mobility_to_ccs_for_df,
    ccs_to_mobility_bruker, mobility_to_ccs_bruker
)

# Add CCS values to DataFrame (example values)
mobility_df = refined_df.copy()
mobility_df['ccs'] = [150.5, 180.2, 165.8]  # Example CCS values

# Convert CCS to mobility for Bruker platform
ccs_to_mobility_for_df(mobility_df, vendor_type='bruker')
print(f"Added mobility values: {mobility_df['mobility'].tolist()}")

# Convert back to CCS to verify
test_df = mobility_df[['mobility', 'mz', 'charge']].copy()
mobility_to_ccs_for_df(test_df, vendor_type='bruker')
print(f"Verified CCS values: {test_df['ccs'].tolist()}")

# Direct array calculations
ccs_values = np.array([150.5, 180.2, 165.8])
mz_values = mobility_df['mz'].values
charge_values = mobility_df['charge'].values

mobility_values = ccs_to_mobility_bruker(ccs_values, mz_values, charge_values)
print(f"Direct mobility calculation: {mobility_values}")
```

### High-Throughput Batch Processing

```python
from alphabase.peptide.precursor import process_precursors_in_batches
import numpy as np
import pandas as pd

# Create large dataset for demonstration
np.random.seed(42)
large_df = pd.DataFrame({
    'sequence': ['PEPTIDE'] * 100000 + ['EXAMPLE'] * 100000,
    'charge': np.random.choice([2, 3, 4], 200000),
    'proteins': [f'P{i:05d}' for i in range(200000)]
})

# Define processing function
def add_theoretical_rt(batch_df):
    """Add theoretical retention time based on sequence properties."""
    batch_df = batch_df.copy()
    # Simple hydrophobicity-based RT prediction (example)
    hydrophobic_aas = ['A', 'I', 'L', 'F', 'W', 'Y', 'V']
    batch_df['theoretical_rt'] = [
        sum(1 for aa in seq if aa in hydrophobic_aas) * 2.5 + 10
        for seq in batch_df['sequence']
    ]
    return batch_df

# Process in batches
processed_df = process_precursors_in_batches(
    large_df,
    processing_func=add_theoretical_rt,
    batch_size=50000,
    n_jobs=4
)

print(f"Processed {len(processed_df)} precursors with theoretical RT")
print(f"RT range: {processed_df['theoretical_rt'].min():.1f} - {processed_df['theoretical_rt'].max():.1f}")
```

### Data Quality Assessment

```python
from alphabase.peptide.precursor import (
    validate_precursor_data_integrity,
    detect_precursor_outliers,
    calculate_precursor_statistics
)

# Validate data integrity
validation_results = validate_precursor_data_integrity(processed_df)
print("Validation results:")
for check, result in validation_results.items():
    print(f"  {check}: {result}")

# Calculate comprehensive statistics
stats_df = calculate_precursor_statistics(processed_df)
print("Precursor statistics:")
print(stats_df.head())

# Detect outliers
outliers = detect_precursor_outliers(
    processed_df,
    method='zscore',
    threshold=3.0
)
print(f"Detected {outliers.sum()} outlier precursors ({outliers.mean()*100:.1f}%)")

# Remove outliers
clean_df = processed_df[~outliers].copy()
print(f"Clean dataset: {len(clean_df)} precursors")
```

### Modification Pattern Analysis

```python
from alphabase.peptide.precursor import analyze_modification_patterns

# Add more complex modifications for analysis
complex_df = refined_df.copy()
complex_df['mods'] = [
    'Oxidation (M)@3;Acetyl (Protein N-term)@0',
    'Phospho (STY)@2;Phospho (STY)@5',
    'Carbamidomethyl (C)@2;Oxidation (M)@6'
]

# Analyze modification patterns
mod_analysis = analyze_modification_patterns(complex_df)
print("Modification analysis:")
print(f"  Most common modifications: {mod_analysis['common_mods']}")
print(f"  Co-occurring modifications: {mod_analysis['cooccurrence']}")
print(f"  Site preferences: {mod_analysis['site_preferences']}")
```

### Memory Optimization

```python
from alphabase.peptide.precursor import optimize_precursor_memory_layout

# Check memory usage before optimization
print(f"Memory usage before optimization: {processed_df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Optimize memory layout
optimized_df = optimize_precursor_memory_layout(processed_df)
print(f"Memory usage after optimization: {optimized_df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Compare data types
print("Data type changes:")
for col in processed_df.columns:
    if col in optimized_df.columns:
        old_dtype = processed_df[col].dtype
        new_dtype = optimized_df[col].dtype
        if old_dtype != new_dtype:
            print(f"  {col}: {old_dtype} -> {new_dtype}")
```