# External Tool Integration

Scanpy's external module provides integration with popular external single-cell analysis tools and methods through a unified interface. This extends scanpy's capabilities with specialized algorithms for dimensionality reduction, trajectory inference, batch correction, imputation, and more.

## Capabilities

### External Analysis Tools

Advanced analysis methods from specialized single-cell packages.

```python { .api }
def phate(adata, n_components=2, knn=5, decay=40, n_landmark=2000, t='auto', gamma=1, n_pca=100, solver='exact', seed=None, n_jobs=1, random_state=None, copy=False, **kwargs):
    """
    PHATE (Potential of Heat-diffusion for Affinity-based Embedding) dimensionality reduction.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of dimensions for embedding
    - knn (int): Number of nearest neighbors
    - decay (int): Alpha decay parameter
    - n_landmark (int): Number of landmark points
    - t (str or int): Time parameter for diffusion
    - gamma (float): Informational distance parameter
    - n_pca (int): Number of PCA components for preprocessing
    - solver (str): Solver for eigenvalue decomposition
    - seed (int, optional): Random seed
    - n_jobs (int): Number of parallel jobs
    - random_state (int, optional): Random state
    - copy (bool): Return copy
    - **kwargs: Additional PHATE parameters

    Returns:
    AnnData or None: Object with PHATE embedding (if copy=True)
    """

def palantir(adata, start_cell=None, num_waypoints=1200, terminal_states=None, copy=False, **kwargs):
    """
    Palantir trajectory inference algorithm.

    Parameters:
    - adata (AnnData): Annotated data object
    - start_cell (str, optional): Starting cell for trajectory
    - num_waypoints (int): Number of waypoints for trajectory
    - terminal_states (list, optional): Terminal cell states
    - copy (bool): Return copy
    - **kwargs: Additional Palantir parameters

    Returns:
    AnnData or None: Object with trajectory results (if copy=True)
    """

def palantir_results(adata, early_cell=None, ms_data=None, copy=False):
    """
    Process Palantir trajectory inference results.

    Parameters:
    - adata (AnnData): Annotated data object with Palantir results
    - early_cell (str, optional): Early cell identifier
    - ms_data (AnnData, optional): Multiscale diffusion-space data computed by Palantir
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with processed results (if copy=True)
    """

def phenograph(adata, clustering_algo='leiden', k=30, directed=False, prune=False, min_cluster_size=10, jaccard=True, primary_metric='euclidean', n_jobs=-1, q_tol=1e-3, louvain_time_limit=2000, nn_method='kdtree', copy=False, **kwargs):
    """
    PhenoGraph clustering algorithm.

    Parameters:
    - adata (AnnData): Annotated data object
    - clustering_algo (str): Clustering algorithm ('leiden' or 'louvain')
    - k (int): Number of nearest neighbors
    - directed (bool): Use directed graph
    - prune (bool): Prune graph
    - min_cluster_size (int): Minimum cluster size
    - jaccard (bool): Use Jaccard coefficient
    - primary_metric (str): Distance metric
    - n_jobs (int): Number of parallel jobs
    - q_tol (float): Quality tolerance for clustering
    - louvain_time_limit (int): Time limit for Louvain algorithm
    - nn_method (str): Nearest neighbor method
    - copy (bool): Return copy
    - **kwargs: Additional parameters

    Returns:
    AnnData or None: Object with clustering results (if copy=True)
    """

def trimap(adata, n_inliers=10, n_outliers=5, n_random=5, lr=1000.0, n_iters=400, copy=False, **kwargs):
    """
    TriMap dimensionality reduction.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_inliers (int): Number of inlier points
    - n_outliers (int): Number of outlier points
    - n_random (int): Number of random triplets
    - lr (float): Learning rate
    - n_iters (int): Number of iterations
    - copy (bool): Return copy
    - **kwargs: Additional TriMap parameters

    Returns:
    AnnData or None: Object with TriMap embedding (if copy=True)
    """

def wishbone(adata, start_cell=None, copy=False, **kwargs):
    """
    Wishbone trajectory inference algorithm.

    Parameters:
    - adata (AnnData): Annotated data object
    - start_cell (str, optional): Starting cell for trajectory
    - copy (bool): Return copy
    - **kwargs: Additional Wishbone parameters

    Returns:
    AnnData or None: Object with trajectory results (if copy=True)
    """

def sam(adata, max_iter=10, num_norm_avg=50, k=20, distance='correlation', copy=False, **kwargs):
    """
    SAM (Self-Assembling Manifolds) for iterative clustering.

    Parameters:
    - adata (AnnData): Annotated data object
    - max_iter (int): Maximum number of iterations
    - num_norm_avg (int): Number of averages for normalization
    - k (int): Number of nearest neighbors
    - distance (str): Distance metric
    - copy (bool): Return copy
    - **kwargs: Additional SAM parameters

    Returns:
    AnnData or None: Object with SAM results (if copy=True)
    """

def harmony_timeseries(adata_list, tp=None, copy=False, **kwargs):
    """
    Harmony integration for time series data.

    Parameters:
    - adata_list (list): List of AnnData objects from different time points
    - tp (list, optional): Time point labels
    - copy (bool): Return copy
    - **kwargs: Additional Harmony parameters

    Returns:
    AnnData or None: Integrated dataset (if copy=True)
    """
```
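
Most of the functions above document the same `copy` convention: with `copy=False` the input object is modified in place and `None` is returned, while `copy=True` leaves the input untouched and returns a modified copy. The following is a minimal sketch of that pattern only. It uses a plain dict as a stand-in for an AnnData object, and `add_embedding` and the `'X_demo'` key are hypothetical names, not part of scanpy.

```python
import copy as _copy


def add_embedding(data, copy=False):
    """Toy stand-in for an external tool wrapper (hypothetical, not scanpy API)."""
    # Work on a deep copy when requested, otherwise on the caller's object
    target = _copy.deepcopy(data) if copy else data
    target["X_demo"] = [[0.0, 1.0], [1.0, 0.0]]  # pretend embedding
    return target if copy else None


# copy=False: the input is modified in place and None is returned
data = {"X": [[1, 2], [3, 4]]}
result = add_embedding(data)
in_place_returned_none = result is None and "X_demo" in data

# copy=True: the input is left untouched and a new object is returned
original = {"X": [[1, 2], [3, 4]]}
modified = add_embedding(original, copy=True)
copy_left_input_alone = "X_demo" not in original and "X_demo" in modified
```

In practice this means a call with `copy=False` should not be assigned back to the same variable, since that would replace the data object with `None`.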

### Cell Cycle Analysis

Specialized tools for cell cycle phase analysis.

```python { .api }
def cyclone(adata, species='human', copy=False, **kwargs):
    """
    Cyclone cell cycle phase assignment.

    Parameters:
    - adata (AnnData): Annotated data object
    - species (str): Species for marker genes ('human' or 'mouse')
    - copy (bool): Return copy
    - **kwargs: Additional parameters

    Returns:
    AnnData or None: Object with cell cycle scores (if copy=True)
    """

def sandbag(adata, fraction=0.5, copy=False, **kwargs):
    """
    Sandbag cell cycle gene identification.

    Parameters:
    - adata (AnnData): Annotated data object
    - fraction (float): Fraction threshold for gene selection
    - copy (bool): Return copy
    - **kwargs: Additional parameters

    Returns:
    AnnData or None: Object with cell cycle gene markers (if copy=True)
    """
```

### External Preprocessing

Batch correction and integration methods from external packages.

```python { .api }
def bbknn(adata, batch_key='batch', neighbors_within_batch=3, n_pcs=50, trim=None, copy=False, **kwargs):
    """
    BBKNN (Batch Balanced k-Nearest Neighbors) batch correction.

    Parameters:
    - adata (AnnData): Annotated data object
    - batch_key (str): Key in obs containing batch information
    - neighbors_within_batch (int): Neighbors within each batch
    - n_pcs (int): Number of principal components to use
    - trim (int, optional): Trim neighbors per batch
    - copy (bool): Return copy
    - **kwargs: Additional BBKNN parameters

    Returns:
    AnnData or None: Object with corrected neighborhood graph (if copy=True)
    """

def dca(adata, mode='denoise', ae_type='nb-conddisp', normalize_per_cell=True, scale=True, log1p=True, hidden_size=(64, 32, 64), hidden_dropout=0.0, batchnorm=True, activation='relu', init='glorot_uniform', network_kwds={}, epochs=300, reduce_lr=10, early_stop=15, batch_size=32, optimizer='rmsprop', learning_rate=None, random_state=0, threads=None, verbose=False, training_kwds={}, return_model=False, return_info=False, copy=False):
    """
    Deep Count Autoencoder (DCA) for denoising and batch correction.

    Parameters:
    - adata (AnnData): Annotated data object
    - mode (str): Mode of operation ('denoise', 'latent')
    - ae_type (str): Autoencoder type
    - normalize_per_cell (bool): Normalize per cell
    - scale (bool): Scale features
    - log1p (bool): Log transform
    - hidden_size (tuple): Hidden layer sizes
    - hidden_dropout (float): Dropout rate
    - batchnorm (bool): Use batch normalization
    - activation (str): Activation function
    - init (str): Weight initialization
    - network_kwds (dict): Additional network parameters
    - epochs (int): Number of training epochs
    - reduce_lr (int): Learning rate reduction patience
    - early_stop (int): Early stopping patience
    - batch_size (int): Training batch size
    - optimizer (str): Optimizer
    - learning_rate (float, optional): Learning rate
    - random_state (int): Random seed
    - threads (int, optional): Number of threads
    - verbose (bool): Verbose output
    - training_kwds (dict): Additional training parameters
    - return_model (bool): Return trained model
    - return_info (bool): Return training information
    - copy (bool): Return copy

    Returns:
    AnnData or tuple: Denoised data and optionally model/info
    """

def harmony_integrate(adata, key, basis='X_pca', adjusted_basis='X_pca_harmony', copy=False, **kwargs):
    """
    Harmony batch integration.

    Parameters:
    - adata (AnnData): Annotated data object
    - key (str): Key in obs for batch variable
    - basis (str): Basis to integrate
    - adjusted_basis (str): Key for integrated embedding
    - copy (bool): Return copy
    - **kwargs: Additional Harmony parameters

    Returns:
    AnnData or None: Object with integrated embedding (if copy=True)
    """

def hashsolo(adata, priors=[0.01, 0.8, 0.19], pre_existing_clusters=None, number_of_noise_barcodes=None, copy=False, **kwargs):
    """
    HashSolo for demultiplexing cell hashing data and doublet detection.

    Parameters:
    - adata (AnnData): Annotated data object with hashtag data
    - priors (list): Prior probabilities [negative, singlet, doublet]
    - pre_existing_clusters (str, optional): Key for existing clusters
    - number_of_noise_barcodes (int, optional): Number of noise barcodes
    - copy (bool): Return copy
    - **kwargs: Additional HashSolo parameters

    Returns:
    AnnData or None: Object with demultiplexing results (if copy=True)
    """

def magic(adata, name_list=None, knn=10, decay=1, knn_max=None, t=3, n_pca=20, solver='exact', knn_dist='euclidean', random_state=None, n_jobs=None, copy=False, **kwargs):
    """
    MAGIC (Markov Affinity-based Graph Imputation of Cells) imputation.

    Parameters:
    - adata (AnnData): Annotated data object
    - name_list (list, optional): Genes to impute (None for all)
    - knn (int): Number of nearest neighbors
    - decay (int): Alpha decay parameter
    - knn_max (int, optional): Maximum number of neighbors
    - t (int): Number of diffusion steps
    - n_pca (int): Number of PCA components
    - solver (str): Solver for eigenvalue decomposition
    - knn_dist (str): Distance metric for KNN
    - random_state (int, optional): Random seed
    - n_jobs (int, optional): Number of parallel jobs
    - copy (bool): Return copy
    - **kwargs: Additional MAGIC parameters

    Returns:
    AnnData or None: Object with imputed data (if copy=True)
    """

def mnn_correct(adata_list, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=0.1, cos_norm_in=True, cos_norm_out=True, svd_dim=0, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs):
    """
    MNN (Mutual Nearest Neighbors) batch correction.

    Parameters:
    - adata_list (list): List of AnnData objects to correct
    - var_subset (list, optional): Subset of variables for correction
    - batch_key (str): Key for batch information
    - index_unique (str): Separator for making indices unique
    - batch_categories (list, optional): Batch category order
    - k (int): Number of nearest neighbors
    - sigma (float): Gaussian smoothing parameter
    - cos_norm_in (bool): Cosine normalization before correction
    - cos_norm_out (bool): Cosine normalization after correction
    - svd_dim (int): Number of SVD dimensions (0 for no SVD)
    - var_adj (bool): Adjust variance
    - compute_angle (bool): Compute angle between batches
    - mnn_order (list, optional): Order for MNN correction
    - svd_mode (str): SVD computation mode
    - do_concatenate (bool): Concatenate results
    - save_raw (bool): Save uncorrected data
    - n_jobs (int, optional): Number of parallel jobs
    - **kwargs: Additional parameters

    Returns:
    AnnData or list: Corrected data
    """

def scanorama_integrate(adata_list, key=None, basis='X_pca', adjusted_basis='X_scanorama', copy=False, **kwargs):
    """
    Scanorama integration for batch correction.

    Parameters:
    - adata_list (list): List of AnnData objects to integrate
    - key (str, optional): Key for batch information
    - basis (str): Basis for integration
    - adjusted_basis (str): Key for integrated embedding
    - copy (bool): Return copy
    - **kwargs: Additional Scanorama parameters

    Returns:
    AnnData or list: Integrated datasets
    """
```

### Export Functions

Export scanpy results to other software platforms.

```python { .api }
def cellbrowser(adata, outdir, name, **kwargs):
    """
    Export to UCSC Cell Browser format.

    Parameters:
    - adata (AnnData): Annotated data object
    - outdir (str): Output directory
    - name (str): Dataset name
    - **kwargs: Additional export parameters

    Returns:
    None: Creates Cell Browser files
    """

def spring_project(adata, project_dir, **kwargs):
    """
    Export to SPRING visualization tool.

    Parameters:
    - adata (AnnData): Annotated data object
    - project_dir (str): Project directory
    - **kwargs: Additional export parameters

    Returns:
    None: Creates SPRING project files
    """
```

## Usage Examples

### Dimensionality Reduction with PHATE

```python
import matplotlib.pyplot as plt
import scanpy as sc

# Assumes `adata` already has 'leiden' clusters and a UMAP embedding

# PHATE embedding
sc.external.tl.phate(adata, n_components=2, knn=15, t=20)
sc.pl.embedding(adata, basis='X_phate', color='leiden')

# Compare with UMAP
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sc.pl.umap(adata, color='leiden', ax=axes[0], show=False, frameon=False)
sc.pl.embedding(adata, basis='X_phate', color='leiden', ax=axes[1], show=False, frameon=False)
axes[0].set_title('UMAP')
axes[1].set_title('PHATE')
plt.show()
```

### Trajectory Inference with Palantir

```python
# Run Palantir from a chosen start cell
sc.external.tl.palantir(adata, start_cell='ATGCCAGAACGACT-1')

# Plot pseudotime and entropy
sc.pl.umap(adata, color=['palantir_pseudotime', 'palantir_entropy'])

# Plot differentiation potential
sc.pl.umap(adata, color='palantir_diff_potential')
```

### Batch Correction with Harmony

```python
# Harmony integration
sc.external.pp.harmony_integrate(adata, 'batch')

# Compare before and after
sc.pl.umap(adata, color='batch', title='Before Harmony')
sc.pl.embedding(adata, basis='X_pca_harmony', color='batch', title='After Harmony')

# Recompute neighbors on integrated data
sc.pp.neighbors(adata, use_rep='X_pca_harmony')
sc.tl.umap(adata)
```

### Imputation with MAGIC

```python
# MAGIC imputation for specific genes
genes_to_impute = ['CD34', 'GATA1', 'GATA2']
# copy=True returns the imputed data as a new AnnData and leaves `adata` unchanged
adata_magic = sc.external.pp.magic(adata, name_list=genes_to_impute, t=3, copy=True)

# Compare before and after imputation
# Before MAGIC (requires adata.raw to be set)
sc.pl.violin(adata, genes_to_impute, groupby='leiden',
             use_raw=True)
# After MAGIC (use_raw=False so the imputed values are plotted)
sc.pl.violin(adata_magic, genes_to_impute, groupby='leiden',
             use_raw=False)
```

### Cell Cycle Analysis

```python
# Cell cycle scoring with Cyclone
sc.external.tl.cyclone(adata, species='human')

# Plot cell cycle phases
sc.pl.umap(adata, color=['cyclone_G1', 'cyclone_S', 'cyclone_G2M'])

# Custom marker identification with Sandbag
sc.external.tl.sandbag(adata)
```

### Advanced Clustering with PhenoGraph

```python
# PhenoGraph clustering
sc.external.tl.phenograph(adata, k=30, clustering_algo='leiden')

# Compare with Leiden
# PhenoGraph labels are stored under 'pheno_leiden' when clustering_algo='leiden'
sc.pl.umap(adata, color=['leiden', 'pheno_leiden'], ncols=2)
```

### Batch Correction with BBKNN

```python
# BBKNN for batch-balanced neighbors
sc.external.pp.bbknn(adata, batch_key='batch', n_pcs=50)

# Recompute UMAP with corrected neighbors
sc.tl.umap(adata)
sc.pl.umap(adata, color='batch')
```

### Export to Other Tools

```python
# Export to UCSC Cell Browser
sc.external.exporting.cellbrowser(
    adata,
    outdir='cellbrowser_output',
    name='my_dataset'
)

# Export to SPRING
sc.external.exporting.spring_project(
    adata,
    project_dir='spring_output'
)
```

## Integration Notes

### Installation Requirements

Many external tools require additional dependencies:

```bash
# For PHATE
pip install phate

# For Palantir
pip install palantir

# For Harmony
pip install harmonypy

# For MAGIC
pip install magic-impute

# For BBKNN
pip install bbknn

# For DCA
pip install dca
```
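
Before running an analysis it can help to check which of these optional packages are actually available. The sketch below uses only the standard library. The import names in `OPTIONAL_MODULES` are assumptions based on the packages listed above (an import name can differ from the pip name, as with `magic-impute` and `magic`), and `missing_tools` is a hypothetical helper, not part of scanpy.

```python
from importlib.util import find_spec

# Assumed import name for each tool; verify against each package's documentation
OPTIONAL_MODULES = {
    "PHATE": "phate",
    "Palantir": "palantir",
    "Harmony": "harmonypy",
    "MAGIC": "magic",
    "BBKNN": "bbknn",
    "DCA": "dca",
}


def missing_tools(modules=OPTIONAL_MODULES):
    """Return the sorted tool names whose module cannot be found in this environment."""
    return sorted(name for name, module in modules.items() if find_spec(module) is None)


print("Missing optional tools:", ", ".join(missing_tools()) or "none")
```

`find_spec` only locates a top-level module and does not import it, so the check stays cheap even for heavy dependencies.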

### Memory and Performance

- External tools may have different memory requirements
- Some tools (such as DCA, which trains a neural network) run faster with GPU support
- Consider data size when choosing parameters
- Many tools support parallel processing via the `n_jobs` parameter
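
The points about data size and `n_jobs` can be made concrete with a little arithmetic. The sketch below estimates how much memory a dense matrix would need and picks a conservative worker count. `dense_matrix_gib` and `pick_n_jobs` are hypothetical helpers, not scanpy functions, and float32 storage and leaving one core free are assumptions made for illustration.

```python
import os


def dense_matrix_gib(n_cells, n_genes, bytes_per_value=4):
    """Approximate size in GiB of a dense n_cells x n_genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1024**3


def pick_n_jobs(reserve=1):
    """Use all detected CPU cores except `reserve`, but always at least one."""
    return max(1, (os.cpu_count() or 1) - reserve)


# 100,000 cells x 20,000 genes stored as float32
print(f"{dense_matrix_gib(100_000, 20_000):.2f} GiB")  # → 7.45 GiB
print("n_jobs =", pick_n_jobs())
```

An estimate like this is most relevant for tools that return dense output, where a sparse count matrix that fits comfortably in memory may not fit once densified.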

### Reproducibility

- Set random seeds (`random_state` or `seed`, where a tool exposes one) for reproducible results
- External tool versions may affect results
- Document tool versions used in analysis
- Some tools may not be fully deterministic
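
One way to follow these points is to record the seed and the installed package versions alongside the results. The sketch below uses only the standard library. `tool_versions` is a hypothetical helper, the package list is an example taken from the installation section, and the seed value is arbitrary.

```python
import random
import sys
from importlib.metadata import PackageNotFoundError, version


def tool_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


SEED = 0  # arbitrary value chosen for this sketch
random.seed(SEED)  # seeds only Python's built-in random module

report = {
    "python": sys.version.split()[0],
    "seed": SEED,
    "packages": tool_versions(["scanpy", "phate", "harmonypy", "magic-impute", "bbknn"]),
}
print(report)
```

Seeding Python's `random` module does not seed NumPy or the external tools themselves, so the same seed should also be passed through each tool's `random_state` or `seed` parameter where one is available.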