tessl/pypi-alphabase

An infrastructure Python package of the AlphaX ecosystem for MS proteomics

Chemical Constants and Calculations

Databases and calculation functions for amino acids, chemical elements, modifications, and isotopes. These components underlie the mass spectrometry calculations in AlphaBase and provide precomputed lookup tables and vectorized operations for proteomics workflows.

Capabilities

Amino Acid Constants and Calculations

Core amino acid database with masses, formulas, and properties, plus vectorized calculation functions for peptide sequences.

# Global constants
AA_ASCII_MASS: np.ndarray  # 128-length array indexed by ASCII code
AA_DF: pd.DataFrame       # Complete amino acid properties dataframe
AA_Composition: dict      # Amino acid formula compositions
aa_formula: pd.DataFrame  # Amino acid formulas and properties

# Mass calculation functions
def calc_AA_masses(sequences: List[str]) -> np.ndarray:
    """
    Calculate amino acid masses for peptide sequences.
    
    Parameters:
    - sequences: List of peptide sequences
    
    Returns:
    2D numpy array with masses for each AA position
    """

def calc_AA_masses_for_same_len_seqs(sequences: List[str]) -> np.ndarray:
    """
    Fast batch calculation for equal-length sequences.
    
    Parameters:
    - sequences: List of equal-length peptide sequences
    
    Returns:
    2D numpy array of per-residue masses, one row per sequence
    """

def calc_sequence_masses_for_same_len_seqs(sequences: List[str]) -> np.ndarray:
    """
    Calculate full sequence masses for equal-length sequences.
    
    Parameters:
    - sequences: List of equal-length peptide sequences
    
    Returns:
    1D numpy array with total masses
    """

# Database modification functions
def update_an_AA(aa_code: str, formula: dict, mass: float = None) -> None:
    """
    Update a single amino acid definition.
    
    Parameters:
    - aa_code: Single letter amino acid code
    - formula: Chemical formula as dict {'C': 6, 'H': 12, ...}
    - mass: Optional mass override
    """

def reset_AA_mass() -> None:
    """Recalculate amino acid masses after modifications."""

def reset_AA_df() -> None:
    """Reset amino acid DataFrame from formulas."""

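The idea behind an ASCII-indexed mass table can be sketched without AlphaBase itself. The block below is an illustration only: `ascii_mass` and `residue_masses_same_len` are hypothetical names, the table holds just a few standard monoisotopic residue masses, and the real `AA_ASCII_MASS` may be built differently.

```python
import numpy as np

# Monoisotopic residue masses (standard values) for a few amino acids only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "T": 101.04768,
    "I": 113.08406, "D": 115.02694, "E": 129.04259,
}

# Hypothetical stand-in for AA_ASCII_MASS: 128 slots, one per ASCII code.
ascii_mass = np.zeros(128)
for aa, mass in RESIDUE_MASS.items():
    ascii_mass[ord(aa)] = mass


def residue_masses_same_len(sequences):
    """Return an (n_sequences, length) array of per-residue masses."""
    # View the joined sequences as raw ASCII codes, then index the table once.
    codes = np.frombuffer("".join(sequences).encode("ascii"), dtype=np.uint8)
    return ascii_mass[codes].reshape(len(sequences), -1)


masses = residue_masses_same_len(["PEPTIDE", "GATEPID"])
print(masses.shape)        # (2, 7)
print(masses.sum(axis=1))  # summed residue masses, without terminal H2O
```

Because every sequence has the same length, a single fancy-indexing call replaces a Python loop over residues, which is presumably why equal-length batches get their own functions.
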
Chemical Elements and Atoms

Fundamental chemical constants and formula parsing capabilities with isotope information.

# Physical constants
MASS_PROTON: float = 1.00727646688
MASS_ISOTOPE: float = 1.0033  # approximate 13C - 12C mass difference
MAX_ISOTOPE_LEN: int = 8

# Element masses
MASS_H: float = 1.007825032
MASS_C: float = 12.0
MASS_O: float = 15.994914620
MASS_N: float = 14.003074004
MASS_H2O: float = 18.0105647
MASS_NH3: float = 17.026549101

# Chemical databases
CHEM_INFO_DICT: dict       # Element information dictionary
CHEM_MONO_MASS: dict       # Monoisotopic masses dictionary  
CHEM_ISOTOPE_DIST: dict    # Isotope distributions dictionary
CHEM_MONO_IDX: dict        # Monoisotopic index mappings
EMPTY_DIST: np.ndarray     # Default isotope distribution

# Formula parsing and mass calculation
def parse_formula(formula: str) -> dict:
    """
    Parse chemical formula string into composition dictionary.
    
    Parameters:
    - formula: Chemical formula like 'C6H12N2O'
    
    Returns:
    Dictionary with element counts {'C': 6, 'H': 12, 'N': 2, 'O': 1}
    """

def calc_mass_from_formula(formula: str) -> float:
    """
    Calculate monoisotopic mass from chemical formula.
    
    Parameters:
    - formula: Chemical formula string
    
    Returns:
    Monoisotopic mass as float
    """

class ChemicalCompositonFormula:
    """Handle chemical compositions and parse SMILES notation."""
    
    def __init__(self, formula: str = None):
        """
        Initialize with optional formula.
        
        Parameters:
        - formula: Chemical formula string or SMILES notation
        """
    
    def calc_mass(self) -> float:
        """Calculate monoisotopic mass of composition."""

# Database management
def update_atom_infos(atom_dict: dict) -> None:
    """Update atomic information from external data."""

def reset_elements() -> None:
    """Reset element data from default sources."""

def load_elem_yaml(yaml_path: str) -> None:
    """Load element definitions from YAML file."""

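As a rough illustration of what formula parsing and monoisotopic mass calculation involve, the sketch below parses plain formulas such as 'C6H12N2O' with a regular expression. The names `parse_plain_formula` and `mono_mass` are hypothetical, only four elements are covered, and the formula syntax accepted by AlphaBase's own `parse_formula` may differ.

```python
import re

# Monoisotopic element masses, taken from the constants listed above.
MONO_MASS = {"H": 1.007825032, "C": 12.0, "N": 14.003074004, "O": 15.994914620}

# An element symbol (capital + optional lowercase) followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_plain_formula(formula):
    """Parse a formula such as 'C6H12N2O' into {'C': 6, 'H': 12, 'N': 2, 'O': 1}."""
    composition = {}
    for element, count in _TOKEN.findall(formula):
        # A missing count means a single atom; repeated symbols are summed.
        composition[element] = composition.get(element, 0) + int(count or 1)
    return composition


def mono_mass(formula):
    """Sum element masses weighted by their counts."""
    return sum(MONO_MASS[el] * n for el, n in parse_plain_formula(formula).items())


print(parse_plain_formula("C6H12N2O"))
print(round(mono_mass("H2O"), 7))  # agrees with MASS_H2O above
```
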
Modifications Database and Calculations

Complete modification database with masses, formulas, and loss patterns, plus calculation functions for modified peptide sequences.

# Global modification constants
MOD_DF: pd.DataFrame           # Main modification database
MOD_INFO_DICT: dict           # Modification information
MOD_CHEM: dict                # Modification chemistry
MOD_MASS: dict                # Modification masses
MOD_LOSS_MASS: dict           # Modification loss masses
MOD_Composition: dict         # Modification compositions
MOD_LOSS_IMPORTANCE: dict     # Loss importance rankings

# Modification mass calculations
def calc_modification_mass(mod_sequences: List[str]) -> np.ndarray:
    """
    Calculate modification masses for peptide sequences.
    
    Parameters:
    - mod_sequences: List of modified sequences like 'PEPTIDE[Oxidation (M)]'
    
    Returns:
    2D numpy array with modification masses per position
    """

def calc_mod_masses_for_same_len_seqs(mod_sequences: List[str]) -> np.ndarray:
    """
    Batch modification mass calculation for equal-length sequences.
    
    Parameters:
    - mod_sequences: List of equal-length modified sequences
    
    Returns:
    2D numpy array of per-position modification masses, one row per sequence
    """

def calc_modification_mass_sum(mod_sequences: List[str]) -> np.ndarray:
    """
    Sum modification masses across peptide sequences.
    
    Parameters:
    - mod_sequences: List of modified sequences
    
    Returns:
    1D numpy array with total modification masses
    """

def calc_modloss_mass(mod_sequences: List[str]) -> np.ndarray:
    """
    Calculate modification loss masses.
    
    Parameters:
    - mod_sequences: List of modified sequences
    
    Returns:
    2D numpy array with loss masses
    """

def calc_modloss_mass_with_importance(mod_sequences: List[str], 
                                     importance_level: int = 1) -> np.ndarray:
    """
    Calculate modification losses filtered by importance.
    
    Parameters:
    - mod_sequences: List of modified sequences
    - importance_level: Minimum importance level (1-3)
    
    Returns:
    2D numpy array with filtered loss masses
    """

# Database management
def add_new_modifications(mod_df: pd.DataFrame) -> None:
    """
    Add custom modifications to global database.
    
    Parameters:
    - mod_df: DataFrame with new modifications
    """

def has_custom_mods() -> bool:
    """Check for presence of user-defined modifications."""

def load_mod_df(tsv_path: str) -> pd.DataFrame:
    """Load modifications from TSV file."""

def update_all_by_MOD_DF() -> None:
    """Update all modification globals from main DataFrame."""

def keep_modloss_by_importance(importance_level: int = 1) -> None:
    """Filter modification losses by importance ranking."""

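A per-position modification mass array can be pictured as follows. This is a standalone sketch, not AlphaBase code: `MOD_MASS_SKETCH`, `mod_mass_per_position`, and the (position, name) input format are assumptions made for illustration, and the three mass shifts are standard monoisotopic values.

```python
import numpy as np

# Monoisotopic mass shifts of three common modifications (standard values).
MOD_MASS_SKETCH = {
    "Oxidation": 15.994915,        # +O
    "Phospho": 79.966331,          # +HPO3
    "Carbamidomethyl": 57.021464,  # +C2H3NO
}


def mod_mass_per_position(seq_len, mods):
    """Build a length-`seq_len` array holding the mass shift at each residue.

    `mods` is a list of (zero_based_position, modification_name) pairs; this
    representation is hypothetical and chosen only to keep the sketch short.
    """
    masses = np.zeros(seq_len)
    for position, name in mods:
        masses[position] += MOD_MASS_SKETCH[name]
    return masses


# 'MPEPTSIDE' with oxidized M (position 0) and phosphorylated S (position 5)
per_position = mod_mass_per_position(9, [(0, "Oxidation"), (5, "Phospho")])
print(per_position)
print(per_position.sum())  # total mass added by the modifications
```

Keeping the shifts in an array aligned with the residues means they can be added element-wise to a per-residue mass array of the same length.
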
Isotope Calculations

Fast isotope pattern calculation with pre-built lookup tables and mathematical convolution functions.

class IsotopeDistribution:
    """Fast isotope distribution calculator with pre-built tables."""
    
    def __init__(self, max_mass: int = 2000, max_isotope_len: int = 8):
        """
        Initialize isotope calculator.
        
        Parameters:
        - max_mass: Maximum mass for pre-calculated tables
        - max_isotope_len: Maximum isotope pattern length
        """
    
    def calc_isotope_distribution(self, formula: str) -> np.ndarray:
        """
        Calculate isotope distribution for chemical formula.
        
        Parameters:
        - formula: Chemical formula string
        
        Returns:
        Numpy array with isotope intensities
        """

# Direct calculation functions
def formula_dist(formula: str) -> np.ndarray:
    """
    Generate isotope distribution for chemical formula.
    
    Parameters:
    - formula: Chemical formula string
    
    Returns:
    Numpy array with isotope pattern
    """

def one_element_dist(element: str, count: int) -> np.ndarray:
    """
    Calculate single element isotope distribution.
    
    Parameters:
    - element: Element symbol ('C', 'H', etc.)
    - count: Number of atoms
    
    Returns:
    Numpy array with isotope intensities
    """

def abundance_convolution(dist1: np.ndarray, dist2: np.ndarray) -> np.ndarray:
    """
    Convolve two isotope distributions.
    
    Parameters:
    - dist1: First isotope distribution
    - dist2: Second isotope distribution
    
    Returns:
    Convolved isotope distribution
    """

def truncate_isotope(distribution: np.ndarray, max_len: int = 8) -> np.ndarray:
    """
    Truncate isotope distribution to specified length.
    
    Parameters:
    - distribution: Input isotope distribution
    - max_len: Maximum length to keep
    
    Returns:
    Truncated distribution
    """

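The convolution approach to isotope patterns can be sketched with plain NumPy. The helper names `convolve_truncated` and `element_dist` are hypothetical, the abundances are rounded natural values for carbon and oxygen only, and AlphaBase's own functions may use precomputed tables rather than repeated convolution.

```python
import numpy as np

MAX_LEN = 8  # mirrors MAX_ISOTOPE_LEN above

# Natural abundances ordered by extra neutrons (+0, +1, +2), rounded values.
ABUNDANCE = {
    "C": np.array([0.9893, 0.0107]),
    "O": np.array([0.99757, 0.00038, 0.00205]),
}


def convolve_truncated(dist1, dist2, max_len=MAX_LEN):
    """Convolve two distributions and keep only the first `max_len` peaks."""
    return np.convolve(dist1, dist2)[:max_len]


def element_dist(element, count, max_len=MAX_LEN):
    """Distribution of `count` atoms of one element by repeated convolution."""
    dist = np.array([1.0])
    for _ in range(count):
        dist = convolve_truncated(dist, ABUNDANCE[element], max_len)
    return dist


# Isotope pattern of CO2: combine one carbon with two oxygens.
co2 = convolve_truncated(element_dist("C", 1), element_dist("O", 2))
print(co2)
```

Truncating after each convolution keeps every intermediate array at a fixed maximum length, at the cost of discarding the very small high-mass tail for large molecules.
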
Usage Examples

Basic Mass Calculations

from alphabase.constants.aa import calc_AA_masses
from alphabase.constants.modification import calc_modification_mass

# Calculate amino acid masses
sequences = ['PEPTIDE', 'SEQUENCE', 'EXAMPLE']
aa_masses = calc_AA_masses(sequences)
print(f"AA masses shape: {aa_masses.shape}")  # (3, 8) for longest sequence

# Calculate modification masses
mod_sequences = ['PEPTIDE[Oxidation (M)]', 'SEQUENCE[Phospho (STY)]']
mod_masses = calc_modification_mass(mod_sequences)
print(f"Modification masses: {mod_masses}")
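
To turn summed masses into a precursor m/z, the physical constants listed earlier are enough. The sketch below is a hedged illustration: `precursor_mz` is a hypothetical helper, and it assumes the residue and modification sums do not yet include the terminal water, which may not match how AlphaBase's own functions report masses.

```python
MASS_PROTON = 1.00727646688  # as listed under Physical constants
MASS_H2O = 18.0105647        # as listed under Element masses


def precursor_mz(residue_mass_sum, charge, mod_mass_sum=0.0):
    """Return the m/z of a peptide ion carrying `charge` protons."""
    # Neutral peptide mass = residues + modifications + one water (termini).
    neutral_mass = residue_mass_sum + mod_mass_sum + MASS_H2O
    return (neutral_mass + charge * MASS_PROTON) / charge


# 781.34938 is the summed monoisotopic residue mass of 'PEPTIDE'.
print(round(precursor_mz(781.34938, 2), 4))
```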

Chemical Formula Processing

from alphabase.constants.atom import parse_formula, calc_mass_from_formula

# Parse and calculate mass
formula = "C6H12N2O2"
composition = parse_formula(formula)
mass = calc_mass_from_formula(formula)
print(f"Formula {formula}: {composition}, Mass: {mass:.6f}")

Custom Modifications

import pandas as pd
from alphabase.constants.modification import add_new_modifications

# Add custom modification
custom_mods = pd.DataFrame({
    'mod_name': ['Custom_Mod'],
    'mass': [42.0106],
    'composition': ['C2H2O'],
    'aa': ['K'],
    'position': ['any']
})

add_new_modifications(custom_mods)

Isotope Pattern Calculation

from alphabase.constants.isotope import IsotopeDistribution

# Calculate isotope pattern
iso_calc = IsotopeDistribution()
pattern = iso_calc.calc_isotope_distribution("C50H80N14O10")
print(f"Isotope pattern: {pattern}")

Install with Tessl CLI

npx tessl i tessl/pypi-alphabase
