
tessl/pypi-padelpy

A Python wrapper for PaDEL-Descriptor software that enables molecular descriptor and fingerprint calculation from SMILES, MDL, and SDF inputs

PaDELPy

A Python wrapper for PaDEL-Descriptor software that enables molecular descriptor and fingerprint calculation from SMILES strings, MDL MolFiles, and SDF files. PaDELPy provides both high-level convenience functions and low-level command-line wrapper access to the bundled PaDEL-Descriptor tool.

Package Information

  • Package Name: padelpy
  • Language: Python
  • Installation: pip install padelpy
  • Requirements: Java JRE 6+ (PaDEL-Descriptor is bundled)
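
The bundled PaDEL-Descriptor runs on Java, so a quick preflight check can prevent a confusing failure later. The sketch below uses only the standard library. java_available is a hypothetical helper, not part of padelpy. It only confirms that "java -version" runs; it does not check that the installed version is 6 or newer.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is on PATH and `java -version` succeeds.

    Hypothetical helper, not part of padelpy. It does not check the Java version.
    """
    java = shutil.which("java")
    if java is None:
        return False
    try:
        # The version banner goes to stderr; only the exit status matters here.
        result = subprocess.run(
            [java, "-version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if java_available():
        print("Java found")
    else:
        print("Java not found; padeldescriptor would raise ReferenceError")
```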

Core Imports

from padelpy import from_smiles, from_mdl, from_sdf, padeldescriptor

Basic Usage

from padelpy import from_smiles, from_mdl, from_sdf

# Calculate descriptors from SMILES string
descriptors = from_smiles('CCC')  # propane
print(f"Number of descriptors: {len(descriptors)}")
print(f"Molecular weight: {descriptors['MW']}")

# Calculate descriptors from multiple SMILES
multi_descriptors = from_smiles(['CCC', 'CCCC'])  # propane and butane
print(f"Processed {len(multi_descriptors)} molecules")

# Calculate both descriptors and fingerprints
desc_fp = from_smiles('CCC', fingerprints=True)

# Process MDL file
mdl_descriptors = from_mdl('molecules.mdl')

# Process SDF file
sdf_descriptors = from_sdf('molecules.sdf')

# Save results to CSV
from_smiles('CCC', output_csv='descriptors.csv')
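
The convenience functions build their results from PaDEL-Descriptor's CSV output, so the values in each returned mapping are likely to be strings. Check type(descriptors['MW']) on your own results to confirm. Under that assumption, the hypothetical to_numeric helper below converts a result row to floats. It maps blank or non-numeric cells to NaN instead of raising an exception.

```python
import math
from typing import Dict, Mapping


def to_numeric(row: Mapping[str, str]) -> Dict[str, float]:
    """Hypothetical helper: convert one result row (name -> string) to floats.

    Blank or non-numeric cells become NaN instead of raising.
    """
    numeric: Dict[str, float] = {}
    for name, value in row.items():
        try:
            numeric[name] = float(value)
        except (TypeError, ValueError):
            numeric[name] = math.nan
    return numeric


# Usage with padelpy installed (requires Java):
# from padelpy import from_smiles
# values = to_numeric(from_smiles('CCC'))
# print(values['MW'])
```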

Capabilities

SMILES to Descriptors

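Converts a SMILES string, or a list of SMILES strings, to molecular descriptors and/or fingerprints. A 3D structure is generated automatically for each molecule before calculation. The keyword arguments control what is computed and how long PaDEL-Descriptor may run.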

def from_smiles(smiles, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> 'OrderedDict | list':
    """
    Convert SMILES string(s) to molecular descriptors/fingerprints.
    
    Args:
        smiles (str or list): SMILES string or list of SMILES strings
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule, forwarded to padeldescriptor and therefore in milliseconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)
    
    Returns:
        OrderedDict or list: Single OrderedDict for one molecule (str input),
                           list of OrderedDicts for multiple molecules (list input)
    
    Raises:
        RuntimeError: For input that is neither a str nor a list, invalid SMILES, or processing failures
    """

Usage Examples:

# Single SMILES
descriptors = from_smiles('CCC')

# Multiple SMILES  
descriptors = from_smiles(['CCC', 'CCCC'])

# Only fingerprints
fingerprints = from_smiles('CCC', fingerprints=True, descriptors=False)

# Control performance
descriptors = from_smiles(['CCC', 'CCCC'], threads=1, maxruntime=30)

# Save to file
from_smiles('CCC', output_csv='propane_descriptors.csv')
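
from_smiles returns a single mapping for str input and a list for list input. Code that accepts either input can normalize the result. The sketch below is hypothetical: descriptors_as_list is not a padelpy function. It takes the calculator as a parameter so it can be exercised without Java, and it defaults to padelpy.from_smiles.

```python
from typing import Callable, List, Mapping, Optional, Sequence, Union


def descriptors_as_list(
    smiles: Union[str, Sequence[str]],
    calculator: Optional[Callable] = None,
    **kwargs,
) -> List[Mapping[str, str]]:
    """Hypothetical helper: always return a list with one result row per SMILES.

    `calculator` defaults to padelpy.from_smiles; pass a stand-in for testing.
    """
    if calculator is None:
        from padelpy import from_smiles  # imported lazily; needs padelpy and Java
        calculator = from_smiles
    # from_smiles is documented to take a str or a list, so coerce tuples etc.
    arg = smiles if isinstance(smiles, str) else list(smiles)
    result = calculator(arg, **kwargs)
    # str input -> one mapping; list input -> list of mappings.
    return [result] if isinstance(arg, str) else list(result)


# Usage with padelpy installed:
# rows = descriptors_as_list('CCC', fingerprints=True)
# rows = descriptors_as_list(('CCC', 'CCCC'), threads=1)
```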

MDL File Processing

Processes an MDL MolFile containing one or more molecular structures and returns descriptors and/or fingerprints for each compound.

def from_mdl(mdl_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert MDL file to molecular descriptors/fingerprints.
    
    Args:
        mdl_file (str): Path to MDL file (must have .mdl extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule, forwarded to padeldescriptor and therefore in milliseconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)
    
    Returns:
        list: List of dicts, one per compound, in the order the compounds appear in the MDL file
    
    Raises:
        ValueError: For invalid file extension (.mdl required)
        RuntimeError: For processing failures
    """

Usage Examples:

# Process MDL file
descriptors = from_mdl('molecules.mdl')

# Include fingerprints
desc_fp = from_mdl('molecules.mdl', fingerprints=True)

# Single-threaded processing
descriptors = from_mdl('molecules.mdl', threads=1)

# Save results
from_mdl('molecules.mdl', output_csv='mdl_descriptors.csv')
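
When output_csv is given, the results are also written to disk, which is convenient for files with many compounds. The sketch below reads such a file back with the standard csv module. It assumes the saved CSV has a header row with a Name column followed by one column per descriptor; verify this against your own output. load_descriptor_csv is a hypothetical helper. It keeps file order, so duplicate molecule names are not merged.

```python
import csv
from typing import Dict, List, Tuple


def load_descriptor_csv(path: str) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: return (name, descriptors) pairs in file order.

    Assumes a header row with a 'Name' column; values are kept as strings.
    """
    rows: List[Tuple[str, Dict[str, str]]] = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = row.pop("Name", "")  # assumed identifier column
            rows.append((name, dict(row)))
    return rows


# Usage with padelpy installed (requires Java):
# from padelpy import from_mdl
# from_mdl('molecules.mdl', output_csv='mdl_descriptors.csv')
# for name, values in load_descriptor_csv('mdl_descriptors.csv'):
#     print(name, len(values))
```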

SDF File Processing

Processes structure-data files (SDF). Each file contains one or more molecular structures, and each structure may be followed by associated data fields.

def from_sdf(sdf_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert SDF file to molecular descriptors/fingerprints.
    
    Args:
        sdf_file (str): Path to SDF file (must have .sdf extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule, forwarded to padeldescriptor and therefore in milliseconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)
    
    Returns:
        list: List of dicts, one per compound, in the order the compounds appear in the SDF file
    
    Raises:
        ValueError: For invalid file extension (.sdf required)
        RuntimeError: For processing failures
    """

Usage Examples:

# Process SDF file
descriptors = from_sdf('molecules.sdf')

# Only fingerprints
fingerprints = from_sdf('molecules.sdf', fingerprints=True, descriptors=False)

# Control processing time
descriptors = from_sdf('molecules.sdf', maxruntime=120, timeout=300)
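
from_sdf returns one entry per compound, so a quick sanity check is to compare the result length with the number of records in the input. In the SDF format each record ends with a line containing $$$$. The hypothetical count_sdf_records helper below counts those delimiter lines. A final record that is missing its trailing $$$$ would not be counted.

```python
def count_sdf_records(path: str) -> int:
    """Hypothetical helper: count SDF records by counting '$$$$' delimiter lines."""
    count = 0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count


# Usage with padelpy installed (requires Java):
# from padelpy import from_sdf
# rows = from_sdf('molecules.sdf')
# if len(rows) != count_sdf_records('molecules.sdf'):
#     print('Result count differs from the number of SDF records')
```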

Command-Line Wrapper

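Direct access to PaDEL-Descriptor's command-line interface, with every option exposed as a keyword argument, for advanced use cases and batch processing. Unlike the convenience functions, it writes results to d_file and returns None.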

def padeldescriptor(maxruntime: int = -1, waitingjobs: int = -1, threads: int = -1, d_2d: bool = False, d_3d: bool = False, config: str = None, convert3d: bool = False, descriptortypes: str = None, detectaromaticity: bool = False, mol_dir: str = None, d_file: str = None, fingerprints: bool = False, log: bool = False, maxcpdperfile: int = 0, removesalt: bool = False, retain3d: bool = False, retainorder: bool = True, standardizenitro: bool = False, standardizetautomers: bool = False, tautomerlist: str = None, usefilenameasmolname: bool = False, sp_timeout: int = None, headless: bool = True) -> None:
    """
    Complete wrapper for PaDEL-Descriptor command-line interface.
    
    Args:
        maxruntime (int): Maximum running time per molecule in milliseconds (default: -1, unlimited)
        waitingjobs (int): Maximum jobs in queue for worker threads (default: -1, 50 * max threads)
        threads (int): Maximum number of threads to use (default: -1, equal to CPU cores)
        d_2d (bool): Calculate 2-D descriptors (default: False)
        d_3d (bool): Calculate 3-D descriptors (default: False)
        config (str): Path to configuration file (optional)
        convert3d (bool): Convert molecule to 3-D (default: False)
        descriptortypes (str): Path to descriptor types file (optional)
        detectaromaticity (bool): Auto-detect aromaticity before calculation (default: False)
        mol_dir (str): Path to directory/file containing structural files
        d_file (str): Path to save calculated descriptors
        fingerprints (bool): Calculate fingerprints (default: False)
        log (bool): Create log file (default: False)
        maxcpdperfile (int): Maximum compounds per descriptor file (default: 0, unlimited)
        removesalt (bool): Remove salt from molecules (default: False)
        retain3d (bool): Retain 3-D coordinates when standardizing (default: False)
        retainorder (bool): Retain molecule order in files (default: True)
        standardizenitro (bool): Standardize nitro groups to N(:O):O (default: False)
        standardizetautomers (bool): Standardize tautomers (default: False)
        tautomerlist (str): Path to SMIRKS tautomers file (optional)
        usefilenameasmolname (bool): Use filename as molecule name (default: False)
        sp_timeout (int): Subprocess timeout in seconds (optional)
        headless (bool): Prevent PaDEL splash image from loading (default: True)
    
    Returns:
        None
    
    Raises:
        ReferenceError: If Java JRE 6+ not found
        RuntimeError: For PaDEL-Descriptor processing errors
    """

Usage Examples:

from padelpy import padeldescriptor

# Basic usage with MDL input
padeldescriptor(mol_dir='molecules.mdl', d_file='descriptors.csv')

# SDF input with 2D and 3D descriptors
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    d_2d=True,
    d_3d=True
)

# Directory of structure files
padeldescriptor(mol_dir='/path/to/molecules/', d_file='descriptors.csv')

# SMILES file input
padeldescriptor(mol_dir='molecules.smi', d_file='descriptors.csv')

# Advanced configuration
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    fingerprints=True,
    convert3d=True,
    removesalt=True,
    standardizetautomers=True,
    threads=4,
    maxruntime=30000,  # 30 seconds per molecule
    log=True
)

# Configuration file
padeldescriptor(config='/path/to/config.xml')
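
padeldescriptor reads structures from files, so SMILES held in memory must first be written to disk. The sketch below writes a .smi file with one SMILES per line. It can optionally append a tab and a molecule name; that name column is an assumption about what PaDEL-Descriptor's SMILES reader accepts. write_smi is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Sequence


def write_smi(path: str, smiles: Sequence[str], names: Optional[Sequence[str]] = None) -> Path:
    """Hypothetical helper: write one SMILES per line, optionally tab + name.

    The name column is an assumption about PaDEL-Descriptor's SMILES reader.
    """
    if names is not None and len(names) != len(smiles):
        raise ValueError("names must have the same length as smiles")
    lines = [smi if names is None else f"{smi}\t{names[i]}" for i, smi in enumerate(smiles)]
    out = Path(path)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


# Usage with padelpy installed (requires Java):
# from padelpy import padeldescriptor
# write_smi('molecules.smi', ['CCC', 'CCCC'], names=['propane', 'butane'])
# padeldescriptor(mol_dir='molecules.smi', d_file='descriptors.csv', d_2d=True)
```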

Types

# Import required for return types
from collections import OrderedDict

Descriptor Information

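  • Total Descriptors: up to 1875 descriptors per molecule (PaDEL-Descriptor's 1D/2D and 3D sets); fingerprint bits are added as extra columns when fingerprints are requested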
  • 2D Descriptors: Molecular properties calculated from 2D structure
  • 3D Descriptors: Molecular properties requiring 3D coordinates
  • PubChem Fingerprints: 881-bit binary substructure fingerprints for molecular similarity
  • Output Format: CSV files with descriptor names as columns, molecules as rows
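
When fingerprints=True and descriptors=True, descriptors and fingerprint bits arrive in the same row. The sketch below separates them. It assumes that PaDEL names PubChem fingerprint columns with the prefix PubchemFP; check the header of your own output and adjust the prefix if needed. split_fingerprints is a hypothetical helper.

```python
from typing import Dict, Mapping, Tuple

FINGERPRINT_PREFIX = "PubchemFP"  # assumed column prefix; verify against your CSV header


def split_fingerprints(
    row: Mapping[str, str], prefix: str = FINGERPRINT_PREFIX
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Hypothetical helper: split one result row into (descriptors, fingerprint bits)."""
    descriptors: Dict[str, str] = {}
    bits: Dict[str, int] = {}
    for name, value in row.items():
        if name.startswith(prefix):
            # Accept both '1' and '1.0' style bit values.
            bits[name] = int(float(value))
        else:
            descriptors[name] = value
    return descriptors, bits
```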

Error Handling

The functions may raise, or handle internally, the following exceptions:

  • RuntimeError: Invalid molecular structures, PaDEL-Descriptor processing failures, timeout exceeded
  • ValueError: Invalid file extensions for MDL/SDF files
  • ReferenceError: Java JRE not found (required for PaDEL-Descriptor)
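  • KeyboardInterrupt: User interruption; temporary files are removed before the interrupt is re-raised
  • FileNotFoundError: Handled internally when a temporary file is already missing during cleanup; a warning is emitted instead of an exception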
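
With list input, a failure on any molecule can cause from_smiles to raise RuntimeError for the whole call. One way to keep the good results is to fall back to processing molecules one at a time after a batch failure. The sketch below shows this pattern. It is not padelpy behaviour, and the fallback starts one PaDEL run per molecule, so it is slow. The calculator is injectable for testing and defaults to padelpy.from_smiles; robust_from_smiles is a hypothetical name.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence, Tuple

Row = Mapping[str, str]


def robust_from_smiles(
    smiles: Sequence[str],
    calculator: Optional[Callable] = None,
    **kwargs,
) -> Tuple[Dict[str, Row], List[Tuple[str, str]]]:
    """Hypothetical helper: return ({smiles: row}, [(smiles, error message)]).

    Duplicate SMILES collapse to a single key in the result dict.
    """
    if calculator is None:
        from padelpy import from_smiles  # imported lazily; needs padelpy and Java
        calculator = from_smiles
    smiles = list(smiles)
    try:
        # Fast path: one PaDEL run for the whole batch.
        rows = calculator(smiles, **kwargs)
        return dict(zip(smiles, rows)), []
    except RuntimeError:
        pass
    # Slow path: run each molecule on its own to isolate the failures.
    results: Dict[str, Row] = {}
    failures: List[Tuple[str, str]] = []
    for smi in smiles:
        try:
            results[smi] = calculator(smi, **kwargs)
        except RuntimeError as exc:
            failures.append((smi, str(exc)))
    return results, failures
```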

Performance Considerations

  • Multi-threading: Use threads parameter to control parallel processing
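  • Timeouts: timeout limits each PaDEL-Descriptor subprocess run (in seconds); maxruntime limits the time spent on each molecule
  • Memory: Results for all molecules in a call are held in memory at once, so very large datasets may need to be processed in batches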
  • 3D Conversion: Automatic 3D structure generation in convenience functions
  • Retry Logic: Automatic retry (up to 3 attempts) for failed operations
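
For large SMILES lists, one option is to process fixed-size chunks so that each PaDEL run stays small. The sketch below shows this assumption-based pattern, which is not a padelpy feature. chunk_size and both function names are hypothetical, and the calculator is injectable for testing (defaulting to padelpy.from_smiles). Because from_smiles_chunked is a generator, rows can be written out as they arrive instead of being held in memory all at once.

```python
from typing import Callable, Iterator, List, Mapping, Optional, Sequence

Row = Mapping[str, str]


def iter_chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def from_smiles_chunked(
    smiles: Sequence[str],
    chunk_size: int = 500,
    calculator: Optional[Callable] = None,
    **kwargs,
) -> Iterator[Row]:
    """Hypothetical helper: yield one row per SMILES, one PaDEL call per chunk."""
    if calculator is None:
        from padelpy import from_smiles  # imported lazily; needs padelpy and Java
        calculator = from_smiles
    for chunk in iter_chunks(smiles, chunk_size):
        # A list input returns a list of rows, even for a one-element chunk.
        yield from calculator(chunk, **kwargs)


# Usage with padelpy installed (requires Java):
# for row in from_smiles_chunked(big_smiles_list, chunk_size=200, threads=4):
#     ...  # write each row to disk as it arrives
```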

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/padelpy@0.1.x