tessl/pypi-padelpy

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/padelpy@0.1.x

To install, run

npx @tessl/cli install tessl/pypi-padelpy@0.1.0

# PaDELPy

A Python wrapper for PaDEL-Descriptor software that enables molecular descriptor and fingerprint calculation from SMILES strings, MDL MolFiles, and SDF files. PaDELPy provides both high-level convenience functions and low-level command-line wrapper access to the bundled PaDEL-Descriptor tool.

## Package Information

- **Package Name**: padelpy
- **Language**: Python
- **Installation**: `pip install padelpy`
- **Requirements**: Java JRE 6+ (PaDEL-Descriptor is bundled)
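
Since a Java runtime is the one requirement not installed by `pip`, it can help to check for it before calling into the package. The following is a minimal sketch using only the standard library; `java_available` is a hypothetical helper name, it only checks that a `java` launcher is on the `PATH`, and it does not verify the JRE version.

```python
import shutil


def java_available(executable: str = "java") -> bool:
    """Return True if the given launcher name resolves to a program on PATH."""
    # shutil.which returns the full path of the executable, or None if not found.
    return shutil.which(executable) is not None


if java_available():
    print("Java launcher found on PATH.")
else:
    print("No Java launcher on PATH; install a JRE before using padelpy.")
```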

## Core Imports

```python
from padelpy import from_smiles, from_mdl, from_sdf, padeldescriptor
```

## Basic Usage

```python
from padelpy import from_smiles, from_mdl, from_sdf

# Calculate descriptors from SMILES string
descriptors = from_smiles('CCC')  # propane
print(f"Number of descriptors: {len(descriptors)}")
print(f"Molecular weight: {descriptors['MW']}")

# Calculate descriptors from multiple SMILES
multi_descriptors = from_smiles(['CCC', 'CCCC'])  # propane and butane
print(f"Processed {len(multi_descriptors)} molecules")

# Calculate both descriptors and fingerprints
desc_fp = from_smiles('CCC', fingerprints=True)

# Process MDL file
mdl_descriptors = from_mdl('molecules.mdl')

# Process SDF file
sdf_descriptors = from_sdf('molecules.sdf')

# Save results to CSV
from_smiles('CCC', output_csv='descriptors.csv')
```

## Capabilities

### SMILES to Descriptors

Converts SMILES strings to molecular descriptors and fingerprints with automatic 3D structure generation and comprehensive parameter control.

```python { .api }
def from_smiles(smiles, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> 'OrderedDict | list':
    """
    Convert SMILES string(s) to molecular descriptors/fingerprints.

    Args:
        smiles (str or list): SMILES string or list of SMILES strings
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        OrderedDict or list: Single OrderedDict for one molecule (str input),
            list of OrderedDicts for multiple molecules (list input)

    Raises:
        RuntimeError: For invalid SMILES or processing failures
    """
```

**Usage Examples:**

```python
# Single SMILES
descriptors = from_smiles('CCC')

# Multiple SMILES
descriptors = from_smiles(['CCC', 'CCCC'])

# Only fingerprints
fingerprints = from_smiles('CCC', fingerprints=True, descriptors=False)

# Control performance
descriptors = from_smiles(['CCC', 'CCCC'], threads=1, maxruntime=30)

# Save to file
from_smiles('CCC', output_csv='propane_descriptors.csv')
```
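
Because `from_smiles` is documented to return a single `OrderedDict` for a string input and a list for a list input, downstream code may want one uniform shape. The sketch below shows one way to normalize both cases; `as_record_list` is a hypothetical helper, and the sample values are placeholders shaped like the documented return types, not real descriptor output.

```python
from collections import OrderedDict


def as_record_list(result):
    """Return a list of mappings for either documented return shape."""
    # OrderedDict is a dict subclass, so a single isinstance check is enough.
    if isinstance(result, dict):
        return [result]
    return list(result)


# Placeholder stand-ins for from_smiles('...') and from_smiles(['...', '...']).
single = OrderedDict([("MW", "placeholder")])
many = [OrderedDict([("MW", "placeholder-1")]), OrderedDict([("MW", "placeholder-2")])]

for record in as_record_list(single) + as_record_list(many):
    print(record["MW"])
```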

### MDL File Processing

Processes MDL MolFiles containing one or more molecular structures, extracting descriptors and fingerprints for each compound.

```python { .api }
def from_mdl(mdl_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert MDL file to molecular descriptors/fingerprints.

    Args:
        mdl_file (str): Path to MDL file (must have .mdl extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        list: List of dicts, each corresponding to a compound in the MDL file

    Raises:
        ValueError: For invalid file extension (.mdl required)
        RuntimeError: For processing failures
    """
```

**Usage Examples:**

```python
# Process MDL file
descriptors = from_mdl('molecules.mdl')

# Include fingerprints
desc_fp = from_mdl('molecules.mdl', fingerprints=True)

# Single-threaded processing
descriptors = from_mdl('molecules.mdl', threads=1)

# Save results
from_mdl('molecules.mdl', output_csv='mdl_descriptors.csv')
```
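
The `.mdl` extension requirement can be checked up front, before any processing starts, to produce a clearer error message. This is a minimal sketch, not padelpy's own validation: `check_extension` is a hypothetical helper, and the case-insensitive comparison is an assumption of the sketch.

```python
from pathlib import Path


def check_extension(path: str, expected: str) -> str:
    """Return path unchanged if its suffix matches expected, else raise ValueError."""
    suffix = Path(path).suffix
    # Assumption: extensions are compared case-insensitively.
    if suffix.lower() != expected.lower():
        found = suffix or "no extension"
        raise ValueError(f"{path!r} must end in {expected} (found {found})")
    return path


print(check_extension("molecules.mdl", ".mdl"))
```

The same helper could be called with `".sdf"` before `from_sdf`.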

### SDF File Processing

Processes Structure Data Format (SDF) files containing molecular structures with optional associated data.

```python { .api }
def from_sdf(sdf_file: str, output_csv: str = None, descriptors: bool = True, fingerprints: bool = False, timeout: int = 60, maxruntime: int = -1, threads: int = -1) -> list:
    """
    Convert SDF file to molecular descriptors/fingerprints.

    Args:
        sdf_file (str): Path to SDF file (must have .sdf extension)
        output_csv (str, optional): CSV file path to save descriptors
        descriptors (bool): Calculate descriptors if True (default: True)
        fingerprints (bool): Calculate fingerprints if True (default: False)
        timeout (int): Maximum conversion time in seconds (default: 60)
        maxruntime (int): Maximum running time per molecule in seconds (default: -1, unlimited)
        threads (int): Number of threads to use (default: -1, max available)

    Returns:
        list: List of dicts, each corresponding to a compound in the SDF file

    Raises:
        ValueError: For invalid file extension (.sdf required)
        RuntimeError: For processing failures
    """
```

**Usage Examples:**

```python
# Process SDF file
descriptors = from_sdf('molecules.sdf')

# Only fingerprints
fingerprints = from_sdf('molecules.sdf', fingerprints=True, descriptors=False)

# Control processing time
descriptors = from_sdf('molecules.sdf', maxruntime=120, timeout=300)
```
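
Since `from_sdf` returns one entry per compound, comparing the length of the result with the number of records in the input file can serve as a quick sanity check. The sketch below counts records by the `$$$$` line that terminates each record in the SDF format. `count_sdf_records` is a hypothetical helper, and the sample text is a skeletal placeholder rather than a valid structure file. How padelpy reports compounds that fail to process is not covered here.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count SDF records by their '$$$$' terminator lines."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Skeletal placeholder with two record terminators (not real molfile blocks).
sample = "mol_1\n(structure block)\nM  END\n$$$$\nmol_2\n(structure block)\nM  END\n$$$$\n"
print(count_sdf_records(sample))
```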

### Command-Line Wrapper

Direct access to PaDEL-Descriptor's command-line interface with full parameter control for advanced use cases and batch processing.

```python { .api }
def padeldescriptor(maxruntime: int = -1, waitingjobs: int = -1, threads: int = -1, d_2d: bool = False, d_3d: bool = False, config: str = None, convert3d: bool = False, descriptortypes: str = None, detectaromaticity: bool = False, mol_dir: str = None, d_file: str = None, fingerprints: bool = False, log: bool = False, maxcpdperfile: int = 0, removesalt: bool = False, retain3d: bool = False, retainorder: bool = True, standardizenitro: bool = False, standardizetautomers: bool = False, tautomerlist: str = None, usefilenameasmolname: bool = False, sp_timeout: int = None, headless: bool = True) -> None:
    """
    Complete wrapper for PaDEL-Descriptor command-line interface.

    Args:
        maxruntime (int): Maximum running time per molecule in milliseconds (default: -1, unlimited)
        waitingjobs (int): Maximum jobs in queue for worker threads (default: -1, 50 * max threads)
        threads (int): Maximum number of threads to use (default: -1, equal to CPU cores)
        d_2d (bool): Calculate 2-D descriptors (default: False)
        d_3d (bool): Calculate 3-D descriptors (default: False)
        config (str): Path to configuration file (optional)
        convert3d (bool): Convert molecule to 3-D (default: False)
        descriptortypes (str): Path to descriptor types file (optional)
        detectaromaticity (bool): Auto-detect aromaticity before calculation (default: False)
        mol_dir (str): Path to directory/file containing structural files
        d_file (str): Path to save calculated descriptors
        fingerprints (bool): Calculate fingerprints (default: False)
        log (bool): Create log file (default: False)
        maxcpdperfile (int): Maximum compounds per descriptor file (default: 0, unlimited)
        removesalt (bool): Remove salt from molecules (default: False)
        retain3d (bool): Retain 3-D coordinates when standardizing (default: False)
        retainorder (bool): Retain molecule order in files (default: True)
        standardizenitro (bool): Standardize nitro groups to N(:O):O (default: False)
        standardizetautomers (bool): Standardize tautomers (default: False)
        tautomerlist (str): Path to SMIRKS tautomers file (optional)
        usefilenameasmolname (bool): Use filename as molecule name (default: False)
        sp_timeout (int): Subprocess timeout in seconds (optional)
        headless (bool): Prevent PaDEL splash image from loading (default: True)

    Returns:
        None

    Raises:
        ReferenceError: If Java JRE 6+ not found
        RuntimeError: For PaDEL-Descriptor processing errors
    """
```

**Usage Examples:**

```python
from padelpy import padeldescriptor

# Basic usage with MDL input
padeldescriptor(mol_dir='molecules.mdl', d_file='descriptors.csv')

# SDF input with 2D and 3D descriptors
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    d_2d=True,
    d_3d=True
)

# Directory of structure files
padeldescriptor(mol_dir='/path/to/molecules/', d_file='descriptors.csv')

# SMILES file input
padeldescriptor(mol_dir='molecules.smi', d_file='descriptors.csv')

# Advanced configuration
padeldescriptor(
    mol_dir='molecules.sdf',
    d_file='descriptors.csv',
    fingerprints=True,
    convert3d=True,
    removesalt=True,
    standardizetautomers=True,
    threads=4,
    maxruntime=30000,  # 30 seconds per molecule
    log=True
)

# Configuration file
padeldescriptor(config='/path/to/config.xml')
```
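
Because `padeldescriptor` returns `None` and writes its results to `d_file`, the values have to be read back from that CSV. A minimal sketch with the standard `csv` module follows; `load_descriptor_csv` is a hypothetical helper, and the path in the usage note is simply the `d_file` value from the examples above.

```python
import csv


def load_descriptor_csv(path: str) -> list:
    """Read a descriptor CSV into a list of row dicts keyed by column name."""
    # newline="" is the mode the csv module recommends for reading files.
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

After a run such as `padeldescriptor(mol_dir='molecules.sdf', d_file='descriptors.csv')`, calling `load_descriptor_csv('descriptors.csv')` would give one dict per molecule row, with every value still a string.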

## Types

```python { .api }
# Import required for return types
from collections import OrderedDict
```

## Descriptor Information

- **Total Descriptors**: 1875 descriptors and fingerprints per molecule
- **2D Descriptors**: Molecular properties calculated from 2D structure
- **3D Descriptors**: Molecular properties requiring 3D coordinates
- **PubChem Fingerprints**: Binary fingerprints for molecular similarity
- **Output Format**: CSV files with descriptor names as columns, molecules as rows
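
When descriptors and fingerprints are calculated together, it can be convenient to separate the two groups by column name. The sketch below splits one result mapping on a key prefix. The prefix `PubchemFP` is an assumption about how the fingerprint columns are named and should be checked against real output; `split_fingerprints` and the sample record are hypothetical.

```python
def split_fingerprints(record, prefix="PubchemFP"):
    """Split one result mapping into (descriptors, fingerprints) by key prefix."""
    descriptors = {k: v for k, v in record.items() if not k.startswith(prefix)}
    fingerprints = {k: v for k, v in record.items() if k.startswith(prefix)}
    return descriptors, fingerprints


# Placeholder record; the keys and values are illustrative only.
record = {"MW": "placeholder", "PubchemFP0": "1", "PubchemFP1": "0"}
descriptors, fingerprints = split_fingerprints(record)
print(sorted(descriptors), sorted(fingerprints))
```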

## Error Handling

The functions may raise the following exceptions:

- **RuntimeError**: Invalid molecular structures, PaDEL-Descriptor processing failures, timeout exceeded
- **ValueError**: Invalid file extensions for MDL/SDF files
- **ReferenceError**: Java JRE not found (required for PaDEL-Descriptor)
- **KeyboardInterrupt**: User interruption (handled with cleanup)
- **FileNotFoundError**: Missing input files (handled internally with warnings)
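
The exceptions listed above can be caught around a single call so that one bad input does not stop a longer run. This is a minimal sketch of that pattern, not part of padelpy: `describe_safely` is a hypothetical helper that takes the function to call as an argument, so any of the convenience functions can be passed in.

```python
def describe_safely(compute, item):
    """Call compute(item); return (result, None) on success or (None, message) on failure."""
    try:
        return compute(item), None
    except (RuntimeError, ValueError, ReferenceError) as exc:
        # Keep the exception type in the message so callers can log what failed.
        return None, f"{type(exc).__name__}: {exc}"
```

For example, `result, error = describe_safely(from_smiles, 'CCC')` leaves `error` as `None` when the call succeeds. `KeyboardInterrupt` is deliberately not caught, so a user interruption still propagates.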

## Performance Considerations

- **Multi-threading**: Use `threads` parameter to control parallel processing
- **Timeouts**: Set `timeout` for overall processing and `maxruntime` per molecule
- **Memory**: Large molecular datasets may require batch processing
- **3D Conversion**: Automatic 3D structure generation in convenience functions
- **Retry Logic**: Automatic retry (up to 3 attempts) for failed operations
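
The batch processing mentioned above can be sketched as a small helper that feeds a long list of SMILES to a function in fixed-size chunks. Both `batched` and `compute_in_batches` are hypothetical names, the default batch size is arbitrary, and `compute` stands for any function that takes a list and returns a list of results, such as `from_smiles`.

```python
def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def compute_in_batches(compute, smiles, batch_size=100):
    """Apply compute to each batch and concatenate the per-batch result lists."""
    results = []
    for chunk in batched(smiles, batch_size):
        results.extend(compute(chunk))
    return results
```

A call such as `compute_in_batches(from_smiles, smiles_list, batch_size=50)` would then process 50 molecules per invocation while keeping the results in input order.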