Tessl Tile for pypi/scanpy@1.11.0

# External Tool Integration

Scanpy's `external` module (`scanpy.external`) wraps popular third-party single-cell analysis tools behind a unified interface. It extends scanpy with specialized algorithms for dimensionality reduction, trajectory inference, batch correction, imputation, and more.

## Capabilities

### External Analysis Tools

Advanced analysis methods from specialized single-cell packages.

```python { .api }
def phate(adata, n_components=2, *, k=5, a=15, n_landmark=2000, t='auto', gamma=1.0, n_pca=100, knn_dist='euclidean', mds_dist='euclidean', mds='metric', n_jobs=None, random_state=None, verbose=None, copy=False, **kwargs):
    """
    PHATE (Potential of Heat-diffusion for Affinity-based Embedding) dimensionality reduction.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of dimensions for embedding
    - k (int): Number of nearest neighbors
    - a (int, optional): Alpha decay parameter of the kernel
    - n_landmark (int): Number of landmark points
    - t (str or int): Diffusion time; 'auto' selects it from the data
    - gamma (float): Informational distance parameter
    - n_pca (int): Number of PCA components for preprocessing
    - knn_dist (str): Distance metric for the nearest-neighbor graph
    - mds_dist (str): Distance metric for MDS
    - mds (str): MDS variant ('classic', 'metric', or 'nonmetric')
    - n_jobs (int, optional): Number of parallel jobs
    - random_state (int, optional): Random seed
    - verbose (bool or int, optional): Verbosity level
    - copy (bool): Return copy
    - **kwargs: Additional PHATE parameters

    Returns:
    AnnData or None: Copy with the embedding if copy=True; otherwise None, with the embedding stored in adata.obsm['X_phate']
    """
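
def palantir(adata, *, n_components=10, knn=30, alpha=0, use_adjacency_matrix=False, distances_key=None, n_eigs=None, impute_data=True, n_steps=3, copy=False):
    """
    Palantir preprocessing: diffusion maps, multiscale space, and optional MAGIC imputation.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of diffusion components
    - knn (int): Number of nearest neighbors for graph construction
    - alpha (float): Normalization parameter for the diffusion operator
    - use_adjacency_matrix (bool): Use a precomputed adjacency matrix
    - distances_key (str, optional): Key of the precomputed distances, used with use_adjacency_matrix=True
    - n_eigs (int, optional): Number of eigenvectors used for the multiscale space
    - impute_data (bool): Impute expression with MAGIC
    - n_steps (int): Number of diffusion steps for imputation
    - copy (bool): Return copy

    Returns:
    AnnData or None: Copy with Palantir fields if copy=True; otherwise None, with results such as adata.obsm['X_palantir_multiscale'] added in place
    """

def palantir_results(adata, early_cell, *, ms_data='X_palantir_multiscale', terminal_states=None, knn=30, num_waypoints=1200, n_jobs=-1, scale_components=True, use_early_cell_as_start=False, max_iterations=25):
    """
    Run Palantir trajectory inference on an AnnData object prepared with palantir().

    Parameters:
    - adata (AnnData): Annotated data object with Palantir preprocessing results
    - early_cell (str): Name of the start cell for pseudotime construction
    - ms_data (str): Key in adata.obsm holding the multiscale data
    - terminal_states (list, optional): Known terminal cell states
    - knn (int): Number of nearest neighbors for graph construction
    - num_waypoints (int): Number of waypoints to sample
    - n_jobs (int): Number of parallel jobs
    - scale_components (bool): Scale the multiscale components
    - use_early_cell_as_start (bool): Use early_cell directly as the start cell
    - max_iterations (int): Maximum number of pseudotime refinement iterations

    Returns:
    PResults: Palantir results object exposing pseudotime, entropy, and branch probabilities
    """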

def phenograph(adata, clustering_algo='louvain', *, k=30, directed=False, prune=False, min_cluster_size=10, jaccard=True, primary_metric='euclidean', n_jobs=-1, q_tol=1e-3, louvain_time_limit=2000, nn_method='kdtree', copy=False, **kwargs):
    """
    PhenoGraph clustering algorithm.

    Parameters:
    - adata (AnnData): Annotated data object
    - clustering_algo (str): Community detection algorithm ('louvain' or 'leiden')
    - k (int): Number of nearest neighbors
    - directed (bool): Build a directed graph
    - prune (bool): Prune the graph to reciprocal neighbors
    - min_cluster_size (int): Minimum cluster size; smaller clusters are labeled -1
    - jaccard (bool): Weight edges with the Jaccard coefficient
    - primary_metric (str): Distance metric for the nearest-neighbor search
    - n_jobs (int): Number of parallel jobs
    - q_tol (float): Tolerance for changes in modularity (Louvain)
    - louvain_time_limit (int): Time limit in seconds for Louvain iterations
    - nn_method (str): Nearest-neighbor search method
    - copy (bool): Return copy
    - **kwargs: Additional parameters

    Returns:
    AnnData or None: Copy with clustering results if copy=True; otherwise None, with labels stored in adata.obs['pheno_louvain'] or adata.obs['pheno_leiden']
    """

def trimap(adata, n_components=2, *, n_inliers=10, n_outliers=5, n_random=5, metric='euclidean', weight_adj=500.0, lr=1000.0, n_iters=400, verbose=None, copy=False):
    """
    TriMap dimensionality reduction.

    Parameters:
    - adata (AnnData): Annotated data object
    - n_components (int): Number of dimensions for embedding
    - n_inliers (int): Number of inlier points used to form triplets
    - n_outliers (int): Number of outlier points used to form triplets
    - n_random (int): Number of random triplets per point
    - metric (str): Distance metric
    - weight_adj (float): Scaling factor applied to the triplet weights
    - lr (float): Learning rate
    - n_iters (int): Number of iterations
    - verbose (bool or int, optional): Verbosity level
    - copy (bool): Return copy

    Returns:
    AnnData or None: Copy with the embedding if copy=True; otherwise None, with the embedding stored in adata.obsm['X_trimap']
    """

def wishbone(adata, start_cell, *, branch=True, k=15, components=(1, 2, 3), num_waypoints=250):
    """
    Wishbone trajectory inference algorithm.

    Parameters:
    - adata (AnnData): Annotated data object
    - start_cell (str): Starting cell for trajectory
    - branch (bool): Detect a branch point in addition to ordering cells
    - k (int): Number of nearest neighbors for graph construction
    - components (iterable of int): Diffusion components to use
    - num_waypoints (int): Number of waypoints to sample

    Returns:
    None: Adds the trajectory and branch assignments to adata.obs
    """
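
def sam(adata, *, max_iter=10, num_norm_avg=50, k=20, distance='correlation', standardization='StandardScaler', weight_pcs=False, sparse_pca=False, n_pcs=150, n_genes=3000, projection='umap', inplace=True, verbose=True):
    """
    SAM (Self-Assembling Manifolds) for iterative feature weighting and manifold construction.

    Parameters:
    - adata (AnnData): Annotated data object
    - max_iter (int): Maximum number of iterations
    - num_norm_avg (int): Number of top dispersions averaged for normalization
    - k (int): Number of nearest neighbors
    - distance (str): Distance metric
    - standardization (str): Standardization applied before PCA
    - weight_pcs (bool): Weight principal components by their eigenvalues
    - sparse_pca (bool): Use a sparse PCA implementation
    - n_pcs (int): Number of principal components
    - n_genes (int): Number of genes used in each iteration
    - projection (str): Embedding to compute ('umap', 'tsne', or 'none')
    - inplace (bool): Write results into adata
    - verbose (bool): Verbose output

    Returns:
    SAM or tuple: SAM object if inplace=True; otherwise the SAM object together with a new AnnData
    """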
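
def harmony_timeseries(adata, tp, *, n_neighbors=30, n_components=1000, n_jobs=-2, copy=False):
    """
    Harmony augmented affinity matrix and layout for time series data.

    Parameters:
    - adata (AnnData): Annotated data object containing all time points
    - tp (str): Key in adata.obs holding the time point labels
    - n_neighbors (int): Number of nearest neighbors for graph construction
    - n_components (int): Number of principal components used; None uses adata.X directly
    - n_jobs (int): Number of parallel jobs
    - copy (bool): Return copy

    Returns:
    AnnData or None: Copy with Harmony results if copy=True; otherwise None, with the layout stored in adata.obsm['X_harmony']
    """
```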

### Cell Cycle Analysis

Specialized tools for cell cycle phase analysis.

```python { .api }
def cyclone(adata, marker_pairs=None, *, iterations=1000, min_iter=100, min_pairs=50):
    """
    Cyclone cell cycle phase assignment based on marker pairs.

    Parameters:
    - adata (AnnData): Annotated data object
    - marker_pairs (dict, optional): Mapping from category to a list of marker gene pairs, as produced by sandbag
    - iterations (int): Number of iterations for random sampling
    - min_iter (int): Minimum number of iterations required to score a sample
    - min_pairs (int): Minimum number of marker pairs required to score a sample

    Returns:
    DataFrame: Per-sample scores for each category plus the highest-scoring category
    """

def sandbag(adata, annotation=None, *, fraction=0.65, filter_genes=None, filter_samples=None):
    """
    Sandbag marker pair identification for cell cycle phases.

    Parameters:
    - adata (AnnData): Annotated data object
    - annotation (dict, optional): Mapping from category to its members in the training data
    - fraction (float): Fraction of cells per category in which a marker pair must hold
    - filter_genes (list, optional): Restrict the analysis to these genes
    - filter_samples (list, optional): Restrict the analysis to these samples

    Returns:
    dict: Mapping from category to a list of marker gene pairs
    """
```

### External Preprocessing

Batch correction and integration methods from external packages.

```python { .api }
def bbknn(adata, batch_key='batch', neighbors_within_batch=3, n_pcs=50, trim=None, copy=False, **kwargs):
    """
    BBKNN (Batch Balanced k-Nearest Neighbors) batch correction.

    Parameters:
    - adata (AnnData): Annotated data object
    - batch_key (str): Key in obs containing batch information
    - neighbors_within_batch (int): Neighbors within each batch
    - n_pcs (int): Number of principal components to use
    - trim (int, optional): Trim neighbors per batch
    - copy (bool): Return copy
    - **kwargs: Additional BBKNN parameters

    Returns:
    AnnData or None: Object with corrected neighborhood graph (if copy=True)
    """

def dca(adata, mode='denoise', ae_type='nb-conddisp', normalize_per_cell=True, scale=True, log1p=True, hidden_size=(64, 32, 64), hidden_dropout=0.0, batchnorm=True, activation='relu', init='glorot_uniform', network_kwds={}, epochs=300, reduce_lr=10, early_stop=15, batch_size=32, optimizer='rmsprop', learning_rate=None, random_state=0, threads=None, verbose=False, training_kwds={}, return_model=False, return_info=False, copy=False):
    """
    Deep Count Autoencoder (DCA) for denoising and batch correction.

    Parameters:
    - adata (AnnData): Annotated data object
    - mode (str): Mode of operation ('denoise', 'latent')
    - ae_type (str): Autoencoder type
    - normalize_per_cell (bool): Normalize per cell
    - scale (bool): Scale features
    - log1p (bool): Log transform
    - hidden_size (tuple): Hidden layer sizes
    - hidden_dropout (float): Dropout rate
    - batchnorm (bool): Use batch normalization
    - activation (str): Activation function
    - init (str): Weight initialization
    - network_kwds (dict): Additional network parameters
    - epochs (int): Number of training epochs
    - reduce_lr (int): Learning rate reduction patience
    - early_stop (int): Early stopping patience
    - batch_size (int): Training batch size
    - optimizer (str): Optimizer
    - learning_rate (float, optional): Learning rate
    - random_state (int): Random seed
    - threads (int, optional): Number of threads
    - verbose (bool): Verbose output
    - training_kwds (dict): Additional training parameters
    - return_model (bool): Return trained model
    - return_info (bool): Return training information
    - copy (bool): Return copy

    Returns:
    AnnData or tuple: Denoised data and optionally model/info
    """

def harmony_integrate(adata, key, *, basis='X_pca', adjusted_basis='X_pca_harmony', **kwargs):
    """
    Harmony batch integration.

    Parameters:
    - adata (AnnData): Annotated data object
    - key (str): Key in obs for batch variable
    - basis (str): Key in adata.obsm of the embedding to integrate
    - adjusted_basis (str): Key in adata.obsm under which the integrated embedding is stored
    - **kwargs: Additional Harmony parameters

    Returns:
    None: Stores the integrated embedding in adata.obsm[adjusted_basis]
    """

def hashsolo(adata, cell_hashing_columns, *, priors=(0.01, 0.8, 0.19), pre_existing_clusters=None, number_of_noise_barcodes=None, inplace=True):
    """
    HashSolo for demultiplexing cell hashing data and doublet detection.

    Parameters:
    - adata (AnnData): Annotated data object with hashtag counts in adata.obs
    - cell_hashing_columns (list of str): Columns in adata.obs containing the hashtag counts
    - priors (sequence of float): Prior probabilities in the order [negative, singlet, doublet]
    - pre_existing_clusters (str, optional): Key in adata.obs for existing clusters
    - number_of_noise_barcodes (int, optional): Number of barcodes used to estimate the noise distribution
    - inplace (bool): Write results into adata

    Returns:
    AnnData or None: None if inplace=True, with demultiplexing results added to adata.obs; otherwise an annotated copy
    """

def magic(adata, name_list=None, *, knn=5, decay=1, knn_max=None, t=3, n_pca=100, solver='exact', knn_dist='euclidean', random_state=None, n_jobs=None, verbose=False, copy=None, **kwargs):
    """
    MAGIC (Markov Affinity-based Graph Imputation of Cells) imputation.

    Parameters:
    - adata (AnnData): Annotated data object
    - name_list (list or str, optional): Genes to impute, or 'all_genes' or 'pca_only'
    - knn (int): Number of nearest neighbors
    - decay (float, optional): Alpha decay parameter
    - knn_max (int, optional): Maximum number of neighbors
    - t (int or str): Number of diffusion steps, or 'auto'
    - n_pca (int): Number of PCA components
    - solver (str): Solver ('exact' or 'approximate')
    - knn_dist (str): Distance metric for KNN
    - random_state (int, optional): Random seed
    - n_jobs (int, optional): Number of parallel jobs
    - verbose (bool): Verbose output
    - copy (bool, optional): Return copy; None resolves to True when name_list selects specific genes
    - **kwargs: Additional MAGIC parameters

    Returns:
    AnnData or None: Imputed copy if copy=True; otherwise None, with the imputed values written into adata
    """

def mnn_correct(*datas, var_index=None, var_subset=None, batch_key='batch', index_unique='-', batch_categories=None, k=20, sigma=1.0, cos_norm_in=True, cos_norm_out=True, svd_dim=None, var_adj=True, compute_angle=False, mnn_order=None, svd_mode='rsvd', do_concatenate=True, save_raw=False, n_jobs=None, **kwargs):
    """
    MNN (Mutual Nearest Neighbors) batch correction.

    Parameters:
    - *datas (AnnData or ndarray): Batches to correct, passed as separate positional arguments
    - var_index (list, optional): Variable names; required only when the inputs are plain arrays
    - var_subset (list, optional): Subset of variables for correction
    - batch_key (str): Key for batch information
    - index_unique (str): Separator for making indices unique
    - batch_categories (list, optional): Batch category order
    - k (int): Number of nearest neighbors
    - sigma (float): Bandwidth of the Gaussian smoothing kernel
    - cos_norm_in (bool): Cosine normalization before correction
    - cos_norm_out (bool): Cosine normalization after correction
    - svd_dim (int, optional): Number of SVD dimensions (None for no SVD)
    - var_adj (bool): Adjust variance
    - compute_angle (bool): Compute angle between batches
    - mnn_order (list, optional): Order for MNN correction
    - svd_mode (str): SVD computation mode
    - do_concatenate (bool): Concatenate results
    - save_raw (bool): Save uncorrected data
    - n_jobs (int, optional): Number of parallel jobs
    - **kwargs: Additional parameters

    Returns:
    tuple: Corrected data, the list of MNN pairings, and the list of angles (if compute_angle=True)
    """
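
def scanorama_integrate(adata, key, *, basis='X_pca', adjusted_basis='X_scanorama', knn=20, sigma=15, approx=True, alpha=0.10, batch_size=5000, **kwargs):
    """
    Scanorama integration for batch correction.

    Parameters:
    - adata (AnnData): Annotated data object containing all batches
    - key (str): Key in obs for batch variable
    - basis (str): Key in adata.obsm of the embedding to integrate
    - adjusted_basis (str): Key in adata.obsm under which the integrated embedding is stored
    - knn (int): Number of nearest neighbors used for matching
    - sigma (float): Bandwidth of the correction smoothing kernel
    - approx (bool): Use approximate nearest neighbors
    - alpha (float): Alignment score cutoff
    - batch_size (int): Batch size used during alignment vector computation
    - **kwargs: Additional Scanorama parameters

    Returns:
    None: Stores the integrated embedding in adata.obsm[adjusted_basis]
    """
```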
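
The `priors` argument of `hashsolo` is a bare three-element sequence, so the position of each value carries its meaning. The sketch below labels and sanity-checks such a vector before it is passed on. It assumes the order [negative, singlet, doublet] and that the priors should sum to 1; `label_priors` is a hypothetical helper, not part of scanpy.

```python
import math

# Assumed hypothesis order for the priors vector: [negative, singlet, doublet].
HYPOTHESES = ("negative", "singlet", "doublet")


def label_priors(priors):
    """Validate a three-element prior vector and label each entry by hypothesis.

    Hypothetical helper for illustration only; it is not part of scanpy.
    """
    if len(priors) != len(HYPOTHESES):
        raise ValueError(f"expected {len(HYPOTHESES)} priors, got {len(priors)}")
    if any(p < 0 for p in priors):
        raise ValueError("priors must be non-negative")
    if not math.isclose(sum(priors), 1.0, abs_tol=1e-9):
        raise ValueError("priors are assumed to sum to 1")
    return dict(zip(HYPOTHESES, priors))


labelled = label_priors([0.01, 0.8, 0.19])
for hypothesis, prior in labelled.items():
    print(f"{hypothesis}: {prior}")
```

Printing the labelled values before a run makes it easy to spot a vector whose entries were supplied in the wrong order.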

### Export Functions

Export scanpy results to other software platforms.

```python { .api }
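def cellbrowser(adata, data_dir, data_name, *, embedding_keys=None, annot_keys=('louvain', 'percent_mito', 'n_genes', 'n_counts'), cluster_field='louvain', nb_marker=50, skip_matrix=False, html_dir=None, port=None, do_debug=False):
    """
    Export to UCSC Cell Browser format.

    Parameters:
    - adata (AnnData): Annotated data object
    - data_dir (str): Output directory
    - data_name (str): Dataset name
    - embedding_keys (list or dict, optional): Embeddings in adata.obsm to export
    - annot_keys (list or dict): Annotations in adata.obs to export
    - cluster_field (str): Annotation used as the cluster label
    - nb_marker (int): Number of marker genes to export per cluster
    - skip_matrix (bool): Skip writing the expression matrix
    - html_dir (str, optional): Directory for the generated HTML files
    - port (int, optional): Port on which to serve html_dir
    - do_debug (bool): Enable debug output

    Returns:
    None: Creates Cell Browser files
    """

def spring_project(adata, project_dir, embedding_method, *, subplot_name=None, cell_groupings=None, custom_color_tracks=None, total_counts_key='n_counts', neighbors_key=None, overwrite=False):
    """
    Export to SPRING visualization tool.

    Parameters:
    - adata (AnnData): Annotated data object with a computed neighborhood graph
    - project_dir (str): Project directory
    - embedding_method (str): Embedding to export, for example 'umap' or 'tsne'
    - subplot_name (str, optional): Name of the subplot folder
    - cell_groupings (str or list, optional): Categorical obs keys to export
    - custom_color_tracks (str or list, optional): Continuous obs keys to export
    - total_counts_key (str): Key in obs with total counts per cell
    - neighbors_key (str, optional): Key of the neighbors result to use
    - overwrite (bool): Overwrite existing files

    Returns:
    None: Creates SPRING project files
    """
```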

## Usage Examples

### Dimensionality Reduction with PHATE

```python
import matplotlib.pyplot as plt
import scanpy as sc

# PHATE embedding, stored in adata.obsm['X_phate']
sc.external.tl.phate(adata, n_components=2, k=15, t=20)
sc.pl.embedding(adata, basis='X_phate', color='leiden')

# Compare with UMAP
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sc.pl.umap(adata, color='leiden', ax=axes[0], show=False, frameon=False)
sc.pl.embedding(adata, basis='X_phate', color='leiden', ax=axes[1], show=False, frameon=False)
axes[0].set_title('UMAP')
axes[1].set_title('PHATE')
plt.show()
```

### Trajectory Inference with Palantir

```python
# Prepare diffusion maps and the multiscale space
sc.external.tl.palantir(adata, n_components=5, knn=30)

# Run Palantir from a chosen start cell
pr_res = sc.external.tl.palantir_results(adata, early_cell='ATGCCAGAACGACT-1')

# Store pseudotime and differentiation potential (entropy) for plotting
adata.obs['palantir_pseudotime'] = pr_res.pseudotime
adata.obs['palantir_entropy'] = pr_res.entropy
sc.pl.umap(adata, color=['palantir_pseudotime', 'palantir_entropy'])
```

### Batch Correction with Harmony

```python
# UMAP before integration (assumes neighbors and UMAP were computed on X_pca)
sc.pl.umap(adata, color='batch', title='Before Harmony')

# Harmony integration; writes adata.obsm['X_pca_harmony']
sc.external.pp.harmony_integrate(adata, 'batch')

# Recompute neighbors and UMAP on the integrated embedding
sc.pp.neighbors(adata, use_rep='X_pca_harmony')
sc.tl.umap(adata)
sc.pl.umap(adata, color='batch', title='After Harmony')
```

### Imputation with MAGIC

```python
# MAGIC imputation for specific genes
genes_to_impute = ['CD34', 'GATA1', 'GATA2']

# With a gene list, MAGIC returns a new AnnData restricted to those genes
adata_magic = sc.external.pp.magic(adata, name_list=genes_to_impute, t=3)

# Before imputation (requires adata.raw to be set)
sc.pl.violin(adata, genes_to_impute, groupby='leiden', use_raw=True)
# After imputation (assumes the obs annotations carry over to adata_magic)
sc.pl.violin(adata_magic, genes_to_impute, groupby='leiden')
```
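
### Cell Cycle Analysis

```python
# Identify marker pairs with Sandbag (assumes adata carries the category annotation)
marker_pairs = sc.external.tl.sandbag(adata)

# Score cells with Cyclone; returns a DataFrame of per-phase scores
scores = sc.external.tl.cyclone(adata, marker_pairs)

# Attach the scores to adata.obs and plot them (assumes G1, S, and G2M columns)
for phase in ['G1', 'S', 'G2M']:
    adata.obs[f'cyclone_{phase}'] = scores[phase].to_numpy()
sc.pl.umap(adata, color=['cyclone_G1', 'cyclone_S', 'cyclone_G2M'])
```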

### Advanced Clustering with PhenoGraph

```python
# PhenoGraph clustering; labels are stored in adata.obs['pheno_leiden']
sc.external.tl.phenograph(adata, clustering_algo='leiden', k=30)

# Compare with Leiden
sc.pl.umap(adata, color=['leiden', 'pheno_leiden'], ncols=2)
```

### Batch Correction with BBKNN

```python
# BBKNN for batch-balanced neighbors
sc.external.pp.bbknn(adata, batch_key='batch', n_pcs=50)

# Recompute UMAP with corrected neighbors
sc.tl.umap(adata)
sc.pl.umap(adata, color='batch')
```

### Export to Other Tools

```python
# Export to UCSC Cell Browser
sc.external.exporting.cellbrowser(
    adata,
    data_dir='cellbrowser_output',
    data_name='my_dataset'
)

# Export to SPRING (requires a computed neighborhood graph and embedding)
sc.external.exporting.spring_project(
    adata,
    project_dir='spring_output',
    embedding_method='umap'
)
```

## Integration Notes

### Installation Requirements

Most external tools are optional dependencies and must be installed separately:

```bash
# For PHATE
pip install phate

# For Palantir
pip install palantir

# For Harmony
pip install harmonypy

# For MAGIC
pip install magic-impute

# For BBKNN
pip install bbknn

# For DCA
pip install dca
```
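
Before calling a wrapper, it can help to check which optional backends are importable. The following is a minimal sketch using only the standard library; the mapping from pip package name to import name covers a few of the packages above and is an assumption to adjust for your environment. `missing_backends` is a hypothetical helper, not part of scanpy.

```python
from importlib.util import find_spec

# pip package name -> import name (assumed; adjust to your environment).
OPTIONAL_BACKENDS = {
    "phate": "phate",
    "harmonypy": "harmonypy",
    "magic-impute": "magic",
    "bbknn": "bbknn",
    "dca": "dca",
}


def missing_backends(backends=OPTIONAL_BACKENDS):
    """Return the pip names of backends whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in backends.items()
        if find_spec(module_name) is None
    )


missing = missing_backends()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed backends are importable")
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages.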

### Memory and Performance

- External tools differ in their memory requirements, so check each tool's documentation before running it on large datasets
- Neural-network tools such as DCA run on CPU but train faster with GPU support
- Consider data size when choosing parameters such as `n_landmark`, `n_pcs`, or `batch_size`
- Many tools support parallel processing via the `n_jobs` parameter
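
Negative `n_jobs` values are easy to misread. The sketch below resolves a value to a concrete worker count, assuming the joblib convention in which -1 means all CPUs and -2 means all but one. Whether a given external tool follows that convention is an assumption to verify in its documentation; `resolve_n_jobs` is a hypothetical helper.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Translate a joblib-style n_jobs value into a concrete worker count.

    Hypothetical helper; assumes -1 means all CPUs and -2 means all but one.
    """
    cpus = cpu_count or os.cpu_count() or 1
    if n_jobs is None:
        return 1  # treat "unspecified" as single-threaded
    if n_jobs == 0:
        raise ValueError("n_jobs must be non-zero")
    if n_jobs < 0:
        return max(1, cpus + 1 + n_jobs)
    return min(n_jobs, cpus)  # never request more workers than CPUs


for requested in (None, 1, -1, -2):
    print(requested, "->", resolve_n_jobs(requested))
```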

### Reproducibility

- Set random seeds (for example via `random_state`) for reproducible results
- External tool versions may affect results
- Document the tool versions used in an analysis
- Some tools may not be fully deterministic
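
One way to document tool versions is to record them at the start of an analysis. This is a minimal sketch using the standard library's `importlib.metadata`; the package list is illustrative and `collect_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Illustrative package list; extend it with the tools your analysis uses.
versions = collect_versions(["scanpy", "phate", "harmonypy", "magic-impute", "bbknn"])
for name, installed in versions.items():
    print(f"{name}=={installed}" if installed else f"{name}: not installed")
```

Saving this output alongside the results makes it possible to rebuild the same environment later.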
