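# External Tool Integration

Scanpy's external module (`scanpy.external`, conventionally imported as `sce`) wraps popular third-party single-cell analysis tools behind scanpy's AnnData-based interface. It extends scanpy with specialized algorithms for dimensionality reduction, trajectory inference, clustering, batch correction, imputation, and export to external viewers. Functions are grouped like scanpy's own API: preprocessing in `sce.pp`, tools in `sce.tl`, plotting in `sce.pl`, and export in `sce.exporting`. Each wrapper requires its underlying package to be installed separately.

## Capabilities

### External Analysis Tools

Advanced analysis methods from specialized single-cell packages.
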
```python { .api }
def phate(adata, n_components=2, k=5, a=15, n_landmark=2000, t='auto', gamma=1.0, n_pca=100, knn_dist='euclidean', mds_dist='euclidean', mds='metric', n_jobs=None, random_state=None, verbose=None, copy=False, **kwargs):
    """
    PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding) dimensionality reduction.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of dimensions of the embedding
    - k (int): Number of nearest neighbors used to build the kernel
    - a (int): Decay rate of the alpha-decaying kernel tails
    - n_landmark (int): Number of landmark points used to speed up computation
    - t (str or int): Diffusion time; 'auto' selects it from the Von Neumann entropy
    - gamma (float): Informational distance constant (1 = log potential, 0 = square-root potential)
    - n_pca (int): Number of principal components computed before building the kernel
    - knn_dist (str): Distance metric for the kNN search
    - mds_dist (str): Distance metric for MDS
    - mds (str): MDS algorithm ('classic', 'metric', or 'nonmetric')
    - n_jobs (int, optional): Number of parallel jobs
    - random_state (int, optional): Random seed
    - verbose (bool or int, optional): Verbosity
    - copy (bool): Return copy
    - **kwargs: Additional arguments passed to phate.PHATE

    Returns:
    AnnData or None: Object with the embedding in obsm['X_phate'] (returned if copy=True)
    """

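def palantir(adata, n_components=10, knn=30, alpha=0, n_eigs=None, impute_data=True, n_steps=3, copy=False):
    """
    Palantir preprocessing: diffusion maps, multiscale space, and optional MAGIC imputation.
    Run this before palantir_results, which performs the trajectory inference itself.

    Parameters:
    - adata (AnnData): Annotated data object (uses the PCA representation)
    - n_components (int): Number of diffusion components
    - knn (int): Number of nearest neighbors for the diffusion graph
    - alpha (float): Normalization parameter for the diffusion operator
    - n_eigs (int, optional): Number of eigenvectors for the multiscale space (chosen by eigen-gap if None)
    - impute_data (bool): Impute expression with MAGIC
    - n_steps (int): Number of MAGIC diffusion steps
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with diffusion components in obsm['X_palantir_diff_comp'] and the multiscale space in obsm['X_palantir_multiscale'] (returned if copy=True)
    """
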
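def palantir_results(adata, early_cell, ms_data='X_palantir_multiscale', terminal_states=None, knn=30, num_waypoints=1200, n_jobs=-1, scale_components=True, use_early_cell_as_start=False, max_iterations=25):
    """
    Palantir trajectory inference: pseudotime, entropy, and branch probabilities.

    Parameters:
    - adata (AnnData): Annotated data object processed with palantir
    - early_cell (str): Name (entry of obs_names) of a cell in the starting population
    - ms_data (str): Key in obsm of the multiscale diffusion space
    - terminal_states (list, optional): User-defined terminal cells
    - knn (int): Number of nearest neighbors for graph construction
    - num_waypoints (int): Number of waypoints to sample
    - n_jobs (int): Number of parallel jobs
    - scale_components (bool): Scale the multiscale components
    - use_early_cell_as_start (bool): Use early_cell itself as the start cell
    - max_iterations (int): Maximum iterations for pseudotime refinement

    Returns:
    PResults: Palantir results with pseudotime, entropy, branch_probs, and waypoints
    """
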
def phenograph(adata, clustering_algo='louvain', k=30, directed=False, prune=False, min_cluster_size=10, jaccard=True, primary_metric='euclidean', n_jobs=-1, q_tol=1e-3, louvain_time_limit=2000, nn_method='kdtree', copy=False, **kwargs):
    """
    PhenoGraph clustering: community detection on a shared-neighbor kNN graph.

    Parameters:
    - adata (AnnData): Annotated data object
    - clustering_algo (str): Community detection algorithm ('louvain' or 'leiden')
    - k (int): Number of nearest neighbors
    - directed (bool): Keep the kNN graph directed instead of symmetrizing it
    - prune (bool): Symmetrize by the product (True) rather than the average (False) of the graph and its transpose
    - min_cluster_size (int): Cells in smaller clusters are labeled as outliers (-1)
    - jaccard (bool): Weight edges by the Jaccard index of neighbor sets; if False, use a Gaussian kernel
    - primary_metric (str): Distance metric for the kNN search
    - n_jobs (int): Number of parallel jobs (-1 uses all cores)
    - q_tol (float): Modularity tolerance for the Louvain optimization
    - louvain_time_limit (int): Maximum time in seconds for the Louvain optimization
    - nn_method (str): Nearest neighbor method ('kdtree' or 'brute')
    - copy (bool): Return copy
    - **kwargs: Additional parameters (e.g. Leiden options)

    Returns:
    AnnData or None: Object with cluster labels in obs['pheno_louvain'] or obs['pheno_leiden'] (returned if copy=True)
    """

def trimap(adata, n_components=2, n_inliers=10, n_outliers=5, n_random=5, metric='euclidean', lr=1000.0, n_iters=400, copy=False, **kwargs):
    """
    TriMap dimensionality reduction based on triplet constraints.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of dimensions of the embedding
    - n_inliers (int): Number of nearest neighbors used to form each point's nearest-neighbor triplets
    - n_outliers (int): Number of outliers sampled for each inlier
    - n_random (int): Number of random triplets per point
    - metric (str): Distance metric
    - lr (float): Learning rate
    - n_iters (int): Number of iterations
    - copy (bool): Return copy
    - **kwargs: Additional TriMap parameters

    Returns:
    AnnData or None: Object with the embedding in obsm['X_trimap'] (returned if copy=True)
    """

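def wishbone(adata, start_cell, branch=True, k=15, components=[1, 2, 3], num_waypoints=250):
    """
    Wishbone trajectory inference for bifurcating trajectories.

    Parameters:
    - adata (AnnData): Annotated data object with diffusion components in obsm['X_diffmap']
    - start_cell (str): Name of the starting cell (required)
    - branch (bool): Detect a bifurcation; if False, only the trajectory is computed
    - k (int): Number of nearest neighbors for graph construction
    - components (list): Diffusion components to use
    - num_waypoints (int): Number of waypoints to sample

    Returns:
    None: Adds obs['trajectory_wishbone'] and obs['branch_wishbone']
    """
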
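def sam(adata, max_iter=10, num_norm_avg=50, k=20, distance='correlation', standardization='StandardScaler', weight_pcs=False, sparse_pca=False, n_pcs=150, n_genes=3000, projection='umap', inplace=True, verbose=True):
    """
    SAM (Self-Assembling Manifolds): iteratively reweights genes by their spatial dispersion on a kNN graph and rebuilds the graph.

    Parameters:
    - adata (AnnData): Annotated data object
    - max_iter (int): Maximum number of iterations
    - num_norm_avg (int): Number of top gene dispersions averaged to normalize the weights
    - k (int): Number of nearest neighbors
    - distance (str): Distance metric
    - standardization (str): Expression standardization method
    - weight_pcs (bool): Weight principal components by their eigenvalues
    - sparse_pca (bool): Use sparse PCA
    - n_pcs (int): Number of principal components
    - n_genes (int): Number of top-weighted genes used for PCA
    - projection (str): 2D projection to compute (e.g. 'umap')
    - inplace (bool): Modify adata in place
    - verbose (bool): Print progress

    Returns:
    SAM or tuple: The SAM object, plus the modified AnnData if inplace=False
    """
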
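def harmony_timeseries(adata, tp, n_neighbors=30, n_components=1000, n_jobs=-2, copy=False):
    """
    Harmony for time-series data: builds an augmented affinity graph connecting cells across sampled time points.
    This is a different algorithm from the Harmony batch integration in harmony_integrate.

    Parameters:
    - adata (AnnData): Annotated data object containing cells from all time points
    - tp (str): Key in obs holding each cell's (categorical) time point
    - n_neighbors (int): Number of nearest neighbors for graph construction
    - n_components (int, optional): Number of principal components used for the graph
    - n_jobs (int): Number of parallel jobs
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with the augmented affinity matrices in obsp (returned if copy=True)
    """
```
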
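TriMap's `n_inliers`, `n_outliers`, and `n_random` parameters set how many triplets are sampled per point. The NumPy sketch below illustrates that sampling scheme under simplifying assumptions (exact distances, uniform outlier sampling, no triplet weights, no optimization); it is not the trimap package's code. Each triplet (i, j, k) encodes "i should be closer to j than to k", and the embedding is later optimized to respect these constraints.

```python
import numpy as np


def sample_triplets(X, n_inliers=10, n_outliers=5, n_random=5, seed=0):
    """Simplified TriMap-style triplet sampling (illustrative only).

    Each returned row (i, j, k) means "point i should be closer to j than to k".
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Exact pairwise Euclidean distances (fine for small toy data)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    triplets = []
    for i in range(n):
        # Inliers: the n_inliers nearest neighbors, skipping i itself
        inliers = np.argsort(dist[i])[1 : n_inliers + 1]
        for j in inliers:
            # Outliers: sampled among points farther from i than the inlier j
            farther = np.flatnonzero(dist[i] > dist[i, j])
            for k in rng.choice(farther, size=n_outliers, replace=True):
                triplets.append((i, j, k))
        # Random triplets: two other points, ordered so that j is the closer one
        others = np.delete(np.arange(n), i)
        for _ in range(n_random):
            j, k = rng.choice(others, size=2, replace=False)
            if dist[i, j] > dist[i, k]:
                j, k = k, j
            triplets.append((i, j, k))
    return np.array(triplets)


X = np.random.default_rng(1).normal(size=(30, 5))
T = sample_triplets(X)
print(T.shape)  # (1650, 3): 30 points x (10 * 5 + 5) triplets each
```

The total number of triplets per point is `n_inliers * n_outliers + n_random`, which is why raising `n_inliers` or `n_outliers` increases run time much faster than raising `n_random`.
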
### Cell Cycle Analysis

Specialized tools for cell cycle phase analysis, based on the marker-pair approach implemented in the pypairs package.

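```python { .api }
def cyclone(adata, marker_pairs=None, iterations=1000, min_iter=100, min_pairs=50):
    """
    Cyclone cell cycle phase scoring (via pypairs).

    Parameters:
    - adata (AnnData): Annotated data object
    - marker_pairs (dict, optional): Phase-specific marker gene pairs, e.g. from sandbag; if None, pypairs' default cell cycle marker pairs are used
    - iterations (int): Number of randomization iterations used to compute the scores
    - min_iter (int): Minimum number of iterations for a valid score
    - min_pairs (int): Minimum number of informative pairs for a valid score

    Returns:
    pandas.DataFrame: Per-cell scores for each phase (e.g. G1, S, G2M) plus the highest-scoring phase
    """
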
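def sandbag(adata, annotation=None, fraction=0.5, filter_genes=None, filter_samples=None):
    """
    Sandbag identification of phase-specific marker gene pairs (via pypairs).

    Parameters:
    - adata (AnnData): Annotated data object with training cells of known phase
    - annotation (dict, optional): Mapping from each phase to the cells annotated with it
    - fraction (float): Minimum fraction of cells in which a pair's expression ordering must hold
    - filter_genes (list, optional): Restrict the search to these genes
    - filter_samples (list, optional): Restrict training to these cells

    Returns:
    dict: Marker gene pairs per phase, usable as marker_pairs in cyclone
    """
```
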
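The marker-pair idea behind sandbag can be sketched in a few lines. This is a simplified NumPy illustration under our own assumptions, not the pypairs implementation: a gene pair (g1, g2) is kept as a marker for a phase if g1 is expressed above g2 in at least `fraction` of that phase's cells and below g2 in at least `fraction` of the cells of every other phase.

```python
import numpy as np


def find_marker_pairs(X, phases, fraction=0.5):
    """Simplified sandbag-style marker pair search (illustrative only).

    X: cells x genes expression matrix; phases: per-cell phase labels.
    Returns {phase: [(g1, g2), ...]} using gene column indices.
    """
    phases = np.asarray(phases)
    labels = np.unique(phases)
    n_genes = X.shape[1]
    pairs = {p: [] for p in labels}
    for g1 in range(n_genes):
        for g2 in range(n_genes):
            if g1 == g2:
                continue
            above = X[:, g1] > X[:, g2]
            below = X[:, g1] < X[:, g2]
            for p in labels:
                # The ordering must hold within phase p ...
                ok_here = above[phases == p].mean() >= fraction
                # ... and be reversed within every other phase
                ok_elsewhere = all(
                    below[phases == q].mean() >= fraction for q in labels if q != p
                )
                if ok_here and ok_elsewhere:
                    pairs[p].append((g1, g2))
    return pairs


X = np.array([
    [5.0, 1.0, 2.0],  # G1
    [4.0, 0.0, 2.0],  # G1
    [0.0, 3.0, 2.0],  # S
    [1.0, 4.0, 2.0],  # S
])
pairs = find_marker_pairs(X, ["G1", "G1", "S", "S"])
print(pairs["G1"], pairs["S"])  # [(0, 1), (0, 2), (2, 1)] [(1, 0), (1, 2), (2, 0)]
```

Because pairs compare two genes within the same cell, they are insensitive to per-cell scaling, which is the main appeal of this approach.
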
### External Preprocessing

Batch correction, integration, denoising, imputation, and demultiplexing methods from external packages.

```python { .api }
def bbknn(adata, batch_key='batch', neighbors_within_batch=3, n_pcs=50, trim=None, copy=False, **kwargs):
    """
    BBKNN (Batch Balanced k-Nearest Neighbors) batch correction; use it in place of sc.pp.neighbors.

    Parameters:
    - adata (AnnData): Annotated data object with PCA in obsm['X_pca']
    - batch_key (str): Key in obs containing batch information
    - neighbors_within_batch (int): Neighbors taken from each batch for every cell
    - n_pcs (int): Number of principal components to use
    - trim (int, optional): Keep only this many top connectivities per cell (set automatically if None; 0 disables trimming)
    - copy (bool): Return copy
    - **kwargs: Additional BBKNN parameters

    Returns:
    AnnData or None: Object with the batch-balanced graph in obsp['distances'] and obsp['connectivities'] (returned if copy=True)
    """

def dca(adata, mode='denoise', ae_type='nb-conddisp', normalize_per_cell=True, scale=True, log1p=True, hidden_size=(64, 32, 64), hidden_dropout=0.0, batchnorm=True, activation='relu', init='glorot_uniform', network_kwds={}, epochs=300, reduce_lr=10, early_stop=15, batch_size=32, optimizer='rmsprop', learning_rate=None, random_state=0, threads=None, verbose=False, training_kwds={}, return_model=False, return_info=False, copy=False):
    """
    Deep Count Autoencoder (DCA) for denoising count data.

    Parameters:
    - adata (AnnData): Annotated data object with raw counts
    - mode (str): 'denoise' replaces adata.X with denoised values; 'latent' stores the bottleneck representation in obsm['X_dca']
    - ae_type (str): Autoencoder type ('zinb-conddisp', 'zinb', 'nb-conddisp', or 'nb')
    - normalize_per_cell (bool): Normalize counts per cell before training
    - scale (bool): Scale genes to zero mean and unit variance
    - log1p (bool): Log-transform the input
    - hidden_size (tuple): Hidden layer sizes (the middle layer is the bottleneck)
    - hidden_dropout (float): Dropout rate
    - batchnorm (bool): Use batch normalization
    - activation (str): Activation function
    - init (str): Weight initialization
    - network_kwds (dict): Additional network parameters
    - epochs (int): Number of training epochs
    - reduce_lr (int): Epochs without improvement before the learning rate is reduced
    - early_stop (int): Epochs without improvement before training stops
    - batch_size (int): Training batch size
    - optimizer (str): Optimizer
    - learning_rate (float, optional): Learning rate (optimizer default if None)
    - random_state (int): Random seed
    - threads (int, optional): Number of threads
    - verbose (bool): Verbose output
    - training_kwds (dict): Additional training parameters
    - return_model (bool): Also return the trained model
    - return_info (bool): Store additional outputs (e.g. dropout probabilities, dispersion) in obsm
    - copy (bool): Return copy

    Returns:
    AnnData, None, or tuple: Denoised data (copy returned if copy=True); the trained model is also returned when return_model=True
    """

def harmony_integrate(adata, key, basis='X_pca', adjusted_basis='X_pca_harmony', **kwargs):
    """
    Harmony batch integration (via harmonypy) of a low-dimensional embedding.

    Parameters:
    - adata (AnnData): Annotated data object
    - key (str): Key in obs with the batch variable to remove
    - basis (str): Key in obsm of the embedding to correct
    - adjusted_basis (str): Key in obsm where the corrected embedding is stored
    - **kwargs: Additional arguments passed to harmonypy.run_harmony

    Returns:
    None: Adds the corrected embedding to adata.obsm[adjusted_basis]
    """

def hashsolo(adata, cell_hashing_columns, priors=[0.01, 0.8, 0.19], pre_existing_clusters=None, number_of_noise_barcodes=None, inplace=True):
    """
    HashSolo demultiplexing of cell hashing data with doublet detection.

    Parameters:
    - adata (AnnData): Annotated data object with hashtag counts stored in obs
    - cell_hashing_columns (list): Columns of obs containing the hashtag counts
    - priors (list): Prior probabilities [negative, singlet, doublet]
    - pre_existing_clusters (str, optional): Key in obs with cluster labels; demultiplexing is performed separately per cluster
    - number_of_noise_barcodes (int, optional): Number of barcodes used to build the noise distribution
    - inplace (bool): Write results to adata instead of returning a copy

    Returns:
    AnnData or None: Adds hypothesis probabilities and a classification per cell to obs (copy returned if inplace=False)
    """

def magic(adata, name_list=None, knn=10, decay=1, knn_max=None, t=3, n_pca=20, solver='exact', knn_dist='euclidean', random_state=None, n_jobs=None, copy=None, **kwargs):
    """
    MAGIC (Markov Affinity-based Graph Imputation of Cells) imputation.

    Parameters:
    - adata (AnnData): Annotated data object
    - name_list (str or list, optional): Genes to return: 'all_genes' (or None), 'pca_only', or a list of gene names
    - knn (int): Number of nearest neighbors used for the kernel bandwidth
    - decay (int): Decay rate of the alpha-decaying kernel
    - knn_max (int, optional): Maximum number of neighbors per cell
    - t (int or 'auto'): Number of diffusion steps
    - n_pca (int): Number of principal components used to build the graph
    - solver (str): 'exact' or 'approximate' (approximate imputes in PCA space)
    - knn_dist (str): Distance metric for the kNN search
    - random_state (int, optional): Random seed
    - n_jobs (int, optional): Number of parallel jobs
    - copy (bool, optional): Return a copy; if None, a copy is returned whenever name_list is a gene list
    - **kwargs: Additional MAGIC parameters

    Returns:
    AnnData or None: With 'all_genes', adata.X is replaced by imputed values; with 'pca_only', the imputed PCA is stored in obsm['X_magic']; with a gene list, a new AnnData containing only those genes is returned
    """

def mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs):
    """
    MNN (Mutual Nearest Neighbors) batch correction (via mnnpy).

    Parameters:
    - *datas (AnnData or array): Batches to correct, passed as separate arguments
    - var_index (list, optional): Gene names of the batches when arrays are passed
    - var_subset (list, optional): Genes used to compute the correction (e.g. highly variable genes)
    - batch_key (str): Key in obs for the batch labels of the concatenated output
    - index_unique (str): Separator for making cell names unique when concatenating
    - batch_categories (list, optional): Labels for the batches
    - k (int): Number of nearest neighbors used to find mutual pairs
    - sigma (float): Bandwidth of the Gaussian kernel that smooths the correction vectors
    - cos_norm_in (bool): Cosine-normalize the input before computing distances
    - cos_norm_out (bool): Cosine-normalize the data before applying the correction
    - svd_dim (int, optional): Dimensions used to remove biological components from the correction vectors (None skips this)
    - var_adj (bool): Adjust the variance of the correction vectors
    - compute_angle (bool): Compute the angle between the biological subspace and the correction vectors
    - mnn_order (list, optional): Order in which batches are merged
    - svd_mode (str): SVD computation mode
    - do_concatenate (bool): Concatenate the corrected batches into one object
    - save_raw (bool): Store the uncorrected data in .raw
    - n_jobs (int, optional): Number of parallel jobs
    - **kwargs: Additional parameters

    Returns:
    tuple: (corrected data, list of MNN pairs, list of angles); the corrected data is a single AnnData if do_concatenate=True
    """

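def scanorama_integrate(adata, key, basis='X_pca', adjusted_basis='X_scanorama', knn=20, sigma=15, approx=True, alpha=0.10, batch_size=5000, **kwargs):
    """
    Scanorama integration of a low-dimensional embedding.

    Parameters:
    - adata (AnnData): Annotated data object; cells of the same batch must be stored contiguously
    - key (str): Key in obs distinguishing the batches
    - basis (str): Key in obsm of the embedding to integrate
    - adjusted_basis (str): Key in obsm where the integrated embedding is stored
    - knn (int): Number of nearest neighbors used for matching
    - sigma (float): Correction smoothing parameter
    - approx (bool): Use approximate nearest neighbor search
    - alpha (float): Alignment score cutoff
    - batch_size (int): Batch size for the correction step
    - **kwargs: Additional Scanorama parameters

    Returns:
    None: Adds the integrated embedding to adata.obsm[adjusted_basis]
    """
```
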
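MNN correction starts from mutual nearest neighbor pairs: a cell a in batch A and a cell b in batch B form a pair if b is among a's k nearest neighbors in B and a is among b's k nearest neighbors in A. A minimal NumPy sketch of that pairing step only; the correction vectors, Gaussian smoothing with `sigma`, and cosine normalization performed by mnnpy are deliberately omitted.

```python
import numpy as np


def mutual_nearest_neighbors(A, B, k=20):
    """Return (i, j) pairs where A[i] and B[j] are mutual k-nearest neighbors."""
    # Pairwise Euclidean distances between cells of the two batches
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    k_a = min(k, B.shape[0])
    k_b = min(k, A.shape[0])
    # For each cell in A, its k nearest cells in B (and vice versa)
    nn_a_to_b = np.argsort(dist, axis=1)[:, :k_a]
    nn_b_to_a = np.argsort(dist, axis=0)[:k_b, :].T
    pairs = []
    for i in range(A.shape[0]):
        for j in nn_a_to_b[i]:
            if i in nn_b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs


A = np.array([[0.0, 0.0], [10.0, 10.0]])
B = np.array([[0.5, 0.0], [10.0, 9.5], [50.0, 50.0]])
print(mutual_nearest_neighbors(A, B, k=1))  # [(0, 0), (1, 1)]
```

B's third cell has no counterpart in A and therefore never forms a pair; requiring mutuality is what keeps batch-specific populations from being forced together.
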
### Export Functions

Export scanpy results to other software platforms.

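```python { .api }
def cellbrowser(adata, data_dir, data_name, embedding_keys=None, annot_keys=('louvain', 'percent_mito', 'n_genes', 'n_counts'), cluster_field='louvain', nb_marker=50, skip_matrix=False, html_dir=None, port=None, do_debug=False):
    """
    Export to UCSC Cell Browser format (requires the cellbrowser package).

    Parameters:
    - adata (AnnData): Annotated data object
    - data_dir (str): Output directory for the exported files
    - data_name (str): Dataset name
    - embedding_keys (list, optional): Embeddings in obsm to export (all if None)
    - annot_keys (list): Annotations in obs to export
    - cluster_field (str): Key in obs with the cluster labels
    - nb_marker (int): Number of marker genes exported per cluster
    - skip_matrix (bool): Do not export the expression matrix
    - html_dir (str, optional): If set, also build the HTML viewer in this directory
    - port (int, optional): If set, serve the viewer on this port
    - do_debug (bool): Verbose debugging output

    Returns:
    None: Creates Cell Browser files
    """
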
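def spring_project(adata, project_dir, embedding_method, subplot_name=None, cell_groupings=None, custom_color_tracks=None, total_counts_key='n_counts', neighbors_key=None, overwrite=False):
    """
    Export to the SPRING visualization tool.

    Parameters:
    - adata (AnnData): Annotated data object with a neighbor graph
    - project_dir (str): Project directory
    - embedding_method (str): Key in obsm of the 2D embedding to export (e.g. 'X_umap')
    - subplot_name (str, optional): Name of the subplot within the project
    - cell_groupings (str or list, optional): Categorical obs keys to export
    - custom_color_tracks (str or list, optional): Continuous obs keys to export
    - total_counts_key (str): Key in obs with total counts per cell
    - neighbors_key (str, optional): Key of the neighbor graph to export
    - overwrite (bool): Overwrite existing files

    Returns:
    None: Creates SPRING project files
    """
```
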
## Usage Examples

### Dimensionality Reduction with PHATE

```python
import matplotlib.pyplot as plt
import scanpy as sc

# Assumes `adata` is preprocessed (PCA, neighbors, UMAP) with clusters in obs['leiden']
# The scanpy wrapper names the neighbor count `k`; the embedding is stored in obsm['X_phate']
sc.external.tl.phate(adata, n_components=2, k=15, t=20)
sc.pl.embedding(adata, basis='X_phate', color='leiden')

# Compare side by side with UMAP
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sc.pl.umap(adata, color='leiden', ax=axes[0], show=False, frameon=False)
sc.pl.embedding(adata, basis='X_phate', color='leiden', ax=axes[1], show=False, frameon=False)
axes[0].set_title('UMAP')
axes[1].set_title('PHATE')
plt.show()
```

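The `k` and `a` parameters define PHATE's alpha-decaying kernel: each cell's bandwidth is the distance to its k-th nearest neighbor, and affinities fall off as exp(-(d / bandwidth)^a), so a larger `a` gives sharper tails. The NumPy sketch below builds that kernel, row-normalizes it into a diffusion operator, and raises it to the power `t`. It stops there (no landmarks, no potential transform controlled by `gamma`, no MDS), so it illustrates only the first steps and is not the phate package itself.

```python
import numpy as np


def alpha_decay_diffusion(X, k=5, a=15, t=10):
    """Sketch of PHATE's first steps: alpha-decay kernel -> diffusion operator -> P^t."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Adaptive bandwidth: distance to the k-th nearest neighbor (column 0 is the cell itself)
    eps = np.sort(dist, axis=1)[:, k]
    # Symmetric alpha-decaying kernel, averaging the two cells' bandwidths
    K = 0.5 * np.exp(-((dist / eps[:, None]) ** a)) + 0.5 * np.exp(-((dist / eps[None, :]) ** a))
    # Row-normalize into a Markov (diffusion) operator and take t steps
    P = K / K.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, t)


X = np.random.default_rng(0).normal(size=(50, 3))
Pt = alpha_decay_diffusion(X, k=5, a=15, t=10)
print(np.allclose(Pt.sum(axis=1), 1.0))  # True: each row is a probability distribution
```

Because the dense distance matrix grows quadratically with the number of cells, this is only practical for toy data; PHATE's `n_landmark` exists precisely to avoid that cost.
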
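### Trajectory Inference with Palantir

```python
import scanpy as sc

# Step 1: diffusion maps, multiscale space, and imputation (assumes PCA has been run)
sc.external.tl.palantir(adata, n_components=5, knn=30)

# Step 2: trajectory inference starting from a cell in the early population
pr_res = sc.external.tl.palantir_results(
    adata, early_cell='ATGCCAGAACGACT-1', ms_data='X_palantir_multiscale', num_waypoints=500
)

# Store pseudotime and entropy (differentiation potential) in obs for plotting
adata.obs['palantir_pseudotime'] = pr_res.pseudotime
adata.obs['palantir_entropy'] = pr_res.entropy
sc.pl.umap(adata, color=['palantir_pseudotime', 'palantir_entropy'])
```
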
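Palantir's differentiation potential is the Shannon entropy of each cell's branch probabilities: cells that can still reach several terminal states have high entropy, while committed cells have low entropy. Assuming `branch_probs` is a cells × terminal-states table of probabilities (as in `pr_res.branch_probs`), the quantity can be computed as below. This NumPy sketch mirrors the definition only; the natural logarithm is our choice of base.

```python
import numpy as np


def branch_entropy(branch_probs):
    """Shannon entropy (natural log) of each row of a cells x branches probability matrix."""
    P = np.asarray(branch_probs, dtype=float)
    P = P / P.sum(axis=1, keepdims=True)  # guard against rows that do not sum exactly to 1
    # 0 * log(0) is treated as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    return -terms.sum(axis=1)


probs = np.array([
    [1.0, 0.0, 0.0],    # committed to branch 1
    [0.5, 0.5, 0.0],    # undecided between two fates
    [1/3, 1/3, 1/3],    # multipotent
])
print(branch_entropy(probs))  # approx. [0, ln 2, ln 3] = [0.0, 0.693, 1.099]
```

With three terminal states the maximum possible entropy is ln 3, so values are easiest to compare within one analysis rather than across datasets with different numbers of branches.
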
### Batch Correction with Harmony

```python
import scanpy as sc

# Before integration: UMAP computed from the uncorrected PCA
sc.pl.umap(adata, color='batch', title='Before Harmony')

# Harmony corrects obsm['X_pca'] for obs['batch'] and stores the result in obsm['X_pca_harmony']
sc.external.pp.harmony_integrate(adata, 'batch')

# Recompute neighbors and UMAP on the integrated embedding
sc.pp.neighbors(adata, use_rep='X_pca_harmony')
sc.tl.umap(adata)
sc.pl.umap(adata, color='batch', title='After Harmony')
```

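To compare "before" and "after" numerically rather than only visually, one informal diagnostic is the fraction of each cell's k nearest neighbors that come from its own batch: values near 1 mean batches remain separated, values near the batch's overall share mean they are well mixed. The helper below is our own simple check, not part of scanpy or harmonypy; with real data you would pass `adata.obsm['X_pca']` and `adata.obsm['X_pca_harmony']` together with `adata.obs['batch']`.

```python
import numpy as np


def same_batch_fraction(embedding, batches, k=15):
    """Mean fraction of each cell's k nearest neighbors that share its batch label."""
    X = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each cell from its own neighbors
    nn = np.argsort(dist, axis=1)[:, :k]
    return float((batches[nn] == batches[:, None]).mean())


rng = np.random.default_rng(0)
batches = np.repeat(["a", "b"], 50)
# "Before": batch b is shifted far away; "after": both batches overlap
before = rng.normal(size=(100, 2)) + np.where(batches == "b", 20.0, 0.0)[:, None]
after = rng.normal(size=(100, 2))
print(same_batch_fraction(before, batches), same_batch_fraction(after, batches))
```

The dense distance matrix makes this suitable only for a few thousand cells; subsample larger datasets. Good mixing alone does not prove biology was preserved, so check cell-type labels as well.
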
### Imputation with MAGIC

```python
import scanpy as sc

# With a gene list, MAGIC returns a new AnnData containing only those genes; adata is unchanged
genes_to_impute = ['CD34', 'GATA1', 'GATA2']
adata_magic = sc.external.pp.magic(adata, name_list=genes_to_impute, t=3, copy=True)

# Before: unimputed values from adata.raw (requires adata.raw to be set)
sc.pl.violin(adata, genes_to_impute, groupby='leiden', use_raw=True)
# After: imputed values
sc.pl.violin(adata_magic, genes_to_impute, groupby='leiden')
```

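MAGIC's `knn`, `decay`, and `t` parameters describe a diffusion on a cell-cell graph: an adaptive-bandwidth kernel with exponent `decay` is row-normalized into a Markov matrix, which is raised to the power `t` and multiplied with the expression matrix, so each cell's values become a weighted average over its neighborhood. The NumPy sketch below shows only that core idea; the magic-impute package additionally builds the graph in PCA space (`n_pca`) and differs in other details.

```python
import numpy as np


def magic_sketch(X, knn=5, decay=1, t=3):
    """Diffusion-based smoothing in the spirit of MAGIC (illustrative only)."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Adaptive bandwidth: distance to the knn-th neighbor (index 0 is the cell itself);
    # the floor avoids division by zero for duplicated cells
    eps = np.maximum(np.sort(dist, axis=1)[:, knn], 1e-12)
    K = np.exp(-((dist / eps[:, None]) ** decay))
    K = (K + K.T) / 2  # symmetric affinities
    P = K / K.sum(axis=1, keepdims=True)  # row-stochastic Markov matrix
    # t diffusion steps: every cell becomes a weighted average of its neighborhood
    return np.linalg.matrix_power(P, t) @ X


rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(40, 6))
X_imputed = magic_sketch(X, knn=5, decay=1, t=3)
# The second value should be smaller: diffusion shrinks per-gene variance
print(X.var(axis=0).mean(), X_imputed.var(axis=0).mean())
```

Larger `t` smooths more aggressively; this is why the "after" violins typically look much tighter than the raw ones, and why too many diffusion steps can blur genuine differences between clusters.
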
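### Cell Cycle Analysis

```python
import scanpy as sc

# Cyclone returns a DataFrame of per-cell phase scores instead of writing to adata
scores = sc.external.tl.cyclone(adata)
for phase in ['G1', 'S', 'G2M']:
    adata.obs[f'cyclone_{phase}'] = scores[phase]

# Plot cell cycle phase scores
sc.pl.umap(adata, color=['cyclone_G1', 'cyclone_S', 'cyclone_G2M'])

# Sandbag learns custom marker pairs from cells of known phase; cyclone can reuse them
marker_pairs = sc.external.tl.sandbag(adata)
scores_custom = sc.external.tl.cyclone(adata, marker_pairs=marker_pairs)
```
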
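Given marker pairs, cyclone scores a cell for a phase by the fraction of that phase's pairs whose expression ordering holds in the cell (g1 above g2). The real method additionally compares this fraction with random permutations (`iterations`, `min_iter`) to turn it into a score. The NumPy sketch below, using hypothetical gene indices, computes only the raw fractions and picks the highest-scoring phase.

```python
import numpy as np


def pair_scores(x, marker_pairs):
    """Fraction of each phase's marker pairs (g1, g2) with x[g1] > x[g2] in one cell."""
    scores = {}
    for phase, pairs in marker_pairs.items():
        # Pairs where both genes are equal carry no information and are skipped
        informative = [(g1, g2) for g1, g2 in pairs if x[g1] != x[g2]]
        hits = sum(x[g1] > x[g2] for g1, g2 in informative)
        scores[phase] = hits / len(informative) if informative else np.nan
    return scores


# Hypothetical marker pairs (gene column indices) for two phases
marker_pairs = {"G1": [(0, 1), (0, 2), (2, 1)], "S": [(1, 0), (1, 2), (2, 0)]}
cell = np.array([4.0, 1.0, 2.0])  # gene 0 high, gene 1 low
scores = pair_scores(cell, marker_pairs)
print(scores, max(scores, key=scores.get))  # G1 scores 1.0, S scores 0.0 -> 'G1'
```
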
### Advanced Clustering with PhenoGraph

```python
import scanpy as sc

# PhenoGraph with Leiden community detection
sc.external.tl.phenograph(adata, k=30, clustering_algo='leiden')

# Labels are stored in obs['pheno_leiden']; compare with scanpy's own Leiden clustering
sc.pl.umap(adata, color=['leiden', 'pheno_leiden'], ncols=2)
```

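With `jaccard=True`, PhenoGraph weights the edge between two cells by the Jaccard index of their k-nearest-neighbor sets (shared neighbors divided by the union), so edges inside dense regions get more weight; Louvain or Leiden then runs on this weighted graph. A small NumPy sketch of the weighting step only, assuming each cell's neighbor set includes the cell itself (a choice made for this illustration):

```python
import numpy as np


def jaccard_knn_graph(X, k=3):
    """Jaccard-weighted kNN graph in the style of PhenoGraph (illustrative only)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each cell's neighborhood: itself plus its k nearest neighbors
    neighborhoods = [set(np.argsort(dist[i])[: k + 1].tolist()) for i in range(n)]
    W = np.zeros((n, n))
    for i in range(n):
        for j in neighborhoods[i]:
            if i != j:
                shared = len(neighborhoods[i] & neighborhoods[j])
                union = len(neighborhoods[i] | neighborhoods[j])
                W[i, j] = W[j, i] = shared / union
    return W


# Two tight groups of three points, far apart
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
W = jaccard_knn_graph(X, k=2)
print(W[0, 1], W[0, 3])  # 1.0 within a group, 0.0 across groups
```
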
### Batch Correction with BBKNN

```python
import scanpy as sc

# BBKNN replaces sc.pp.neighbors: it builds a batch-balanced graph from obsm['X_pca']
sc.external.pp.bbknn(adata, batch_key='batch', n_pcs=50)

# Recompute UMAP (and clustering, if needed) on the corrected graph
sc.tl.umap(adata)
sc.pl.umap(adata, color='batch')
```

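BBKNN's key idea is that every cell takes its `neighbors_within_batch` nearest neighbors from each batch separately, so each cell ends up with neighbors_within_batch × (number of batches) neighbors regardless of batch sizes. A brute-force NumPy sketch of that neighbor search; the real package uses approximate search, converts distances into connectivities, and can trim them via `trim`.

```python
import numpy as np


def batch_balanced_knn(X, batches, neighbors_within_batch=3):
    """For each cell, indices of its nearest neighbors taken separately from every batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    result = []
    for i in range(X.shape[0]):
        neighbors = []
        for b in np.unique(batches):
            idx = np.flatnonzero(batches == b)
            idx = idx[idx != i]  # a cell is not its own neighbor
            d = np.linalg.norm(X[idx] - X[i], axis=1)
            neighbors.extend(idx[np.argsort(d)[:neighbors_within_batch]].tolist())
        result.append(neighbors)
    return result


rng = np.random.default_rng(0)
batches = np.array(["a"] * 20 + ["b"] * 5)  # deliberately unbalanced batches
X = rng.normal(size=(25, 4))
nn = batch_balanced_knn(X, batches, neighbors_within_batch=3)
print(len(nn[0]))  # 6 = 3 neighbors x 2 batches
```

Even cells from the small batch "b" receive exactly three neighbors from batch "a", which is what forces the graph to connect batches.
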
### Export to Other Tools

```python
import scanpy as sc

# Export to UCSC Cell Browser (requires the cellbrowser package)
sc.external.exporting.cellbrowser(
    adata,
    data_dir='cellbrowser_output',
    data_name='my_dataset',
)

# Export to SPRING, using the UMAP coordinates as the 2D layout
sc.external.exporting.spring_project(
    adata,
    project_dir='spring_output',
    embedding_method='X_umap',
)
```

## Integration Notes

### Installation Requirements

Each wrapper depends on an external package that is not installed with scanpy and must be installed separately:

```bash
# For PHATE
pip install phate

# For Palantir
pip install palantir-sc

# For Harmony
pip install harmonypy

# For MAGIC
pip install magic-impute

# For BBKNN
pip install bbknn

# For DCA
pip install dca
```

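Because these packages are optional, it can help to check which ones are importable before running a pipeline. A small standard-library sketch using `importlib.util.find_spec`, which locates a module without importing it; the import names in the mapping are our assumptions about each package (e.g. magic-impute is imported as `magic`), so adjust them to your environment.

```python
import importlib.util

# pip package name -> assumed import name
OPTIONAL_PACKAGES = {
    "phate": "phate",
    "palantir-sc": "palantir",
    "harmonypy": "harmonypy",
    "magic-impute": "magic",
    "bbknn": "bbknn",
    "dca": "dca",
}


def check_optional_packages(packages=OPTIONAL_PACKAGES):
    """Map each pip package name to True if its module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


for name, available in check_optional_packages().items():
    print(f"{name:>14}: {'installed' if available else 'missing'}")
```
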
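### Memory and Performance

- External tools can have very different memory requirements from scanpy's built-in methods; imputation and denoising (e.g. MAGIC on all genes, DCA) can produce dense matrices far larger than the sparse input
- DCA trains a neural network and runs considerably faster with GPU support
- Consider dataset size when choosing parameters such as the number of landmarks, waypoints, or principal components
- Many tools support parallel processing via an `n_jobs` parameter
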
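To make "consider dataset size" concrete: when a tool returns a dense cells × genes matrix, the memory needed is simply n_cells × n_genes × bytes per value, independent of how sparse the input was. A small helper for that arithmetic:

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=4):
    """Memory (GiB) of a dense n_cells x n_genes matrix; 4 bytes = float32, 8 = float64."""
    return n_cells * n_genes * bytes_per_value / 1024**3


# 100,000 cells x 20,000 genes
print(f"{dense_matrix_gib(100_000, 20_000):.1f} GiB as float32")     # 7.5 GiB
print(f"{dense_matrix_gib(100_000, 20_000, 8):.1f} GiB as float64")  # 14.9 GiB
```

Restricting imputation to a gene list (as in the MAGIC example) shrinks this proportionally to the number of genes kept.
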
### Reproducibility

- Set random seeds (e.g. `random_state`) for reproducible results
- Results can change between versions of the external tools
- Record the versions of all tools used in an analysis
- Some tools may not be fully deterministic, even with a fixed seed

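One lightweight way to document tool versions is to read them from the installed package metadata and save them alongside your results. A standard-library sketch using `importlib.metadata`; the distribution names passed in are examples and should match what you actually installed.

```python
import json
from importlib.metadata import PackageNotFoundError, version


def tool_versions(distributions):
    """Return {distribution: version string, or None if not installed}."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = version(dist)
        except PackageNotFoundError:
            versions[dist] = None
    return versions


# Example distributions; save the result next to your analysis outputs
record = tool_versions(["scanpy", "anndata", "harmonypy", "bbknn", "phate"])
print(json.dumps(record, indent=2))
```
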