# PaDELPy

A Python wrapper for PaDEL-Descriptor software that enables molecular descriptor and fingerprint calculation from SMILES strings, MDL MolFiles, and SDF files. PaDELPy provides both high-level convenience functions and low-level command-line wrapper access to the bundled PaDEL-Descriptor tool.

## Package Information

- **Package Name**: padelpy
- **Language**: Python
- **Installation**: `pip install padelpy`
- **Requirements**: Java JRE 6+ (PaDEL-Descriptor is bundled)

## Core Imports

```python
from padelpy import from_smiles, from_mdl, from_sdf, padeldescriptor
```

## Basic Usage

```python
from padelpy import from_smiles, from_mdl, from_sdf

# Calculate descriptors from SMILES string
descriptors = from_smiles('CCC')  # propane
print(f"Number of descriptors: {len(descriptors)}")
print(f"Molecular weight: {descriptors['MW']}")

# Calculate descriptors from multiple SMILES
multi_descriptors = from_smiles(['CCC', 'CCCC'])  # propane and butane
print(f"Processed {len(multi_descriptors)} molecules")

# Calculate both descriptors and fingerprints
desc_fp = from_smiles('CCC', fingerprints=True)

# Process MDL file
mdl_descriptors = from_mdl('molecules.mdl')

# Process SDF file
sdf_descriptors = from_sdf('molecules.sdf')

# Save results to CSV
from_smiles('CCC', output_csv='descriptors.csv')
```

## Capabilities

### SMILES to Descriptors

Converts SMILES strings to molecular descriptors and fingerprints with automatic 3D structure generation and comprehensive parameter control.

```python { .api }
def from_smiles(smiles, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> 'OrderedDict | list':
    """
    Convert SMILES string(s) to molecular descriptors/fingerprints.

    Args:
        smiles (str or list): SMILES string or list of SMILES strings
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        OrderedDict or list: Single OrderedDict for one molecule (str input),
            list of OrderedDicts for multiple molecules (list input)

    Raises:
        RuntimeError: For invalid SMILES or processing failures
    """
```

**Usage Examples:**

```python
# Single SMILES
descriptors = from_smiles('CCC')

# Multiple SMILES
descriptors = from_smiles(['CCC', 'CCCC'])

# Only fingerprints
fingerprints = from_smiles('CCC', fingerprints=True, descriptors=False)

# Control performance
descriptors = from_smiles(['CCC', 'CCCC'], threads=1, maxruntime=30)

# Save to file
from_smiles('CCC', output_csv='propane_descriptors.csv')
```
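
Downstream numeric work usually needs the returned values as numbers. The following is a minimal sketch of a conversion step, assuming the values arrive as strings (the results are read from PaDEL-Descriptor's CSV output) and that a descriptor which could not be computed appears as an empty string. `to_numeric` is a hypothetical helper, not part of padelpy, and the sample record holds placeholder values rather than real PaDEL output.

```python
from collections import OrderedDict
from typing import Mapping, Optional


def to_numeric(record: Mapping[str, str]) -> "OrderedDict[str, Optional[float]]":
    """Convert one descriptor record to floats, mapping unparseable values to None."""
    converted = OrderedDict()
    for name, raw in record.items():
        try:
            converted[name] = float(raw)
        except (TypeError, ValueError):
            # Empty or non-numeric entries (assumed to mark failed descriptors)
            converted[name] = None
    return converted


# Placeholder record shaped like a from_smiles() result; keys and values are illustrative only.
sample = OrderedDict([("nAtom", "11"), ("MW", "44.06"), ("TDB1u", "")])
numeric = to_numeric(sample)
```

For a list input, the same helper can be applied to each record, for example `[to_numeric(r) for r in from_smiles(['CCC', 'CCCC'])]`.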

### MDL File Processing

Processes MDL MolFiles containing one or more molecular structures, extracting descriptors and fingerprints for each compound.

```python { .api }
def from_mdl(mdl_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert MDL file to molecular descriptors/fingerprints.

    Args:
        mdl_file (str): Path to MDL file (must have .mdl extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        list: List of dicts, each corresponding to a compound in the MDL file

    Raises:
        ValueError: For invalid file extension (.mdl required)
        RuntimeError: For processing failures
    """
```

**Usage Examples:**

```python
# Process MDL file
descriptors = from_mdl('molecules.mdl')

# Include fingerprints
desc_fp = from_mdl('molecules.mdl', fingerprints=True)

# Single-threaded processing
descriptors = from_mdl('molecules.mdl', threads=1)

# Save results
from_mdl('molecules.mdl', output_csv='mdl_descriptors.csv')
```
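
Because `from_mdl` raises `ValueError` for paths that do not end in `.mdl`, a batch script can sort its inputs up front instead of failing partway through. The sketch below illustrates that check; `split_by_extension` is a hypothetical helper, not a padelpy function, and the file names are made up.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_by_extension(paths: Iterable[str], ext: str = ".mdl") -> Tuple[List[str], List[str]]:
    """Partition paths into those whose suffix equals `ext` and the rest."""
    accepted, rejected = [], []
    for p in paths:
        # Case-sensitive comparison; whether padelpy accepts ".MDL" is not assumed here.
        (accepted if Path(p).suffix == ext else rejected).append(p)
    return accepted, rejected


ok, skipped = split_by_extension(["a.mdl", "b.sdf", "dir/c.mdl", "d.MDL"])
```

The same helper works for `from_sdf` inputs by passing `ext=".sdf"`.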

### SDF File Processing

Processes Structure Data Format (SDF) files containing molecular structures with optional associated data.

```python { .api }
def from_sdf(sdf_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert SDF file to molecular descriptors/fingerprints.

    Args:
        sdf_file (str): Path to SDF file (must have .sdf extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        list: List of dicts, each corresponding to a compound in the SDF file

    Raises:
        ValueError: For invalid file extension (.sdf required)
        RuntimeError: For processing failures
    """
```

**Usage Examples:**

```python
# Process SDF file
descriptors = from_sdf('molecules.sdf')

# Only fingerprints
fingerprints = from_sdf('molecules.sdf', fingerprints=True, descriptors=False)

# Control processing time
descriptors = from_sdf('molecules.sdf', maxruntime=120, timeout=300)
```

### Command-Line Wrapper

Direct access to PaDEL-Descriptor's command-line interface with full parameter control for advanced use cases and batch processing.

```python { .api }
def padeldescriptor(maxruntime: int = -1, waitingjobs: int = -1, threads: int = -1, d_2d: bool = False, d_3d: bool = False, config: str = None, convert3d: bool = False, descriptortypes: str = None, detectaromaticity: bool = False, mol_dir: str = None, d_file: str = None, fingerprints: bool = False, log: bool = False, maxcpdperfile: int = 0, removesalt: bool = False, retain3d: bool = False, retainorder: bool = True, standardizenitro: bool = False, standardizetautomers: bool = False, tautomerlist: str = None, usefilenameasmolname: bool = False, sp_timeout: int = None, headless: bool = True) -> None:
    """
    Complete wrapper for PaDEL-Descriptor command-line interface.

    Args:
        maxruntime (int): Maximum running time per molecule in milliseconds (default: -1, unlimited)
        waitingjobs (int): Maximum jobs in queue for worker threads (default: -1, 50 * max threads)
        threads (int): Maximum number of threads to use (default: -1, equal to CPU cores)
        d_2d (bool): Calculate 2-D descriptors (default: False)
        d_3d (bool): Calculate 3-D descriptors (default: False)
        config (str): Path to configuration file (optional)
        convert3d (bool): Convert molecule to 3-D (default: False)
        descriptortypes (str): Path to descriptor types file (optional)
        detectaromaticity (bool): Auto-detect aromaticity before calculation (default: False)
        mol_dir (str): Path to directory/file containing structural files
        d_file (str): Path to save calculated descriptors
        fingerprints (bool): Calculate fingerprints (default: False)
        log (bool): Create log file (default: False)
        maxcpdperfile (int): Maximum compounds per descriptor file (default: 0, unlimited)
        removesalt (bool): Remove salt from molecules (default: False)
        retain3d (bool): Retain 3-D coordinates when standardizing (default: False)
        retainorder (bool): Retain molecule order in files (default: True)
        standardizenitro (bool): Standardize nitro groups to N(:O):O (default: False)
        standardizetautomers (bool): Standardize tautomers (default: False)
        tautomerlist (str): Path to SMIRKS tautomers file (optional)
        usefilenameasmolname (bool): Use filename as molecule name (default: False)
        sp_timeout (int): Subprocess timeout in seconds (optional)
        headless (bool): Prevent PaDEL splash image from loading (default: True)

    Returns:
        None

    Raises:
        ReferenceError: If Java JRE 6+ not found
        RuntimeError: For PaDEL-Descriptor processing errors
    """
```

**Usage Examples:**

```python
from padelpy import padeldescriptor

# Basic usage with MDL input
padeldescriptor(mol_dir='molecules.mdl', d_file='descriptors.csv')

# SDF input with 2D and 3D descriptors
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    d_2d=True,
    d_3d=True
)

# Directory of structure files
padeldescriptor(mol_dir='/path/to/molecules/', d_file='descriptors.csv')

# SMILES file input
padeldescriptor(mol_dir='molecules.smi', d_file='descriptors.csv')

# Advanced configuration
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    fingerprints=True,
    convert3d=True,
    removesalt=True,
    standardizetautomers=True,
    threads=4,
    maxruntime=30000,  # 30 seconds per molecule
    log=True
)

# Configuration file
padeldescriptor(config='/path/to/config.xml')
```
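
Since `padeldescriptor` returns `None`, results have to be read back from the file passed as `d_file`. Here is a minimal sketch using the standard `csv` module, assuming the output has a header row with a `Name` column followed by descriptor columns; verify this against your own output. `read_descriptor_csv` is a hypothetical helper, and the in-memory sample stands in for a real file with made-up values.

```python
import csv
import io
from typing import Dict, TextIO


def read_descriptor_csv(handle: TextIO, name_column: str = "Name") -> Dict[str, Dict[str, str]]:
    """Map each molecule name to its descriptor columns (values kept as strings)."""
    table = {}
    for row in csv.DictReader(handle):
        name = row.pop(name_column)  # assumed identifier column
        table[name] = dict(row)
    return table


# Stand-in for open('descriptors.csv', newline=''); the values are placeholders.
sample_csv = io.StringIO('"Name","nAtom","MW"\n"mol_1","11","44.06"\n"mol_2","14","58.08"\n')
rows = read_descriptor_csv(sample_csv)
```

Values are left as strings here so that the caller decides how to treat empty or non-numeric cells.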

## Types

```python { .api }
# Import required for return types
from collections import OrderedDict
```

## Descriptor Information

- **Total Descriptors**: 1875 descriptors per molecule with the default settings; fingerprint bits are added when `fingerprints=True`
- **2D Descriptors**: Molecular properties calculated from 2D structure
- **3D Descriptors**: Molecular properties requiring 3D coordinates
- **PubChem Fingerprints**: Binary fingerprints for molecular similarity
- **Output Format**: CSV files with descriptor names as columns, molecules as rows

## Error Handling

The functions raise or internally handle the following exceptions:

- **RuntimeError**: Invalid molecular structures, PaDEL-Descriptor processing failures, timeout exceeded
- **ValueError**: Invalid file extensions for MDL/SDF files
- **ReferenceError**: Java JRE not found (required for PaDEL-Descriptor)
- **KeyboardInterrupt**: User interruption (handled with cleanup)
- **FileNotFoundError**: Missing input files (handled internally with warnings)
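
One way to apply the list above in a batch job is to process molecules one at a time and record `RuntimeError` failures instead of aborting. The sketch below shows that pattern with an injected callable so it runs without Java. `compute_each` and `fake_compute` are hypothetical names; `fake_compute` only imitates the documented `RuntimeError` behaviour of `from_smiles`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def compute_each(
    smiles_list: Iterable[str],
    compute: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Run `compute` per SMILES, collecting failures instead of stopping."""
    results, failures = {}, []
    for smi in smiles_list:
        try:
            results[smi] = compute(smi)
        except RuntimeError as exc:
            # Listed above for invalid structures, processing failures, and timeouts
            failures.append((smi, str(exc)))
    return results, failures


def fake_compute(smi: str) -> dict:
    """Stand-in for padelpy.from_smiles so the sketch runs without Java."""
    if "?" in smi:
        raise RuntimeError(f"could not process {smi}")
    return {"length": len(smi)}


results, failures = compute_each(["CCC", "C?C", "CCCC"], fake_compute)
```

With padelpy installed, `compute_each(smiles, from_smiles)` would follow the same pattern. `ReferenceError` is deliberately not caught, since a missing Java runtime affects every molecule.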

## Performance Considerations

- **Multi-threading**: Use the `threads` parameter to control parallel processing
- **Timeouts**: Set `timeout` for overall processing and `maxruntime` per molecule; `maxruntime` is documented in seconds for the convenience functions and in milliseconds for `padeldescriptor`
- **Memory**: Large molecular datasets may require batch processing
- **3D Conversion**: The convenience functions generate 3D structures automatically
- **Retry Logic**: Automatic retry (up to 3 attempts) for failed operations
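
The batch processing mentioned above can be sketched as a small chunking helper. `batched` is a hypothetical name and the batch size is an arbitrary choice for illustration; nothing here is part of padelpy.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


smiles = ["C", "CC", "CCC", "CCCC", "CCCCC"]
batches = list(batched(smiles, 2))
```

Each batch could then be passed to `from_smiles(batch)` and the returned lists concatenated, which keeps the size of any single PaDEL-Descriptor run bounded.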