Tessl Tile for pypi/scanpy@1.11.0

# Scanpy

Scanpy is a Python toolkit for analyzing single-cell gene expression data, built jointly with anndata. It covers preprocessing, visualization, clustering, trajectory inference, and differential expression testing, and its implementation scales to datasets of more than one million cells. The library works with the wider scientific Python ecosystem and provides algorithms for dimensionality reduction, neighborhood graph construction, clustering, and pseudotime analysis, for single-cell RNA sequencing data and other single-cell omics technologies.

## Package Information

- **Package Name**: scanpy
- **Language**: Python
- **Installation**: `pip install scanpy`

## Core Imports

```python
import scanpy as sc
```

Common additional imports for working with scanpy:

```python
import scanpy as sc
import anndata as ad
import pandas as pd
import numpy as np
```

## Basic Usage

```python
import scanpy as sc

# Settings
sc.settings.verbosity = 3  # verbosity level
sc.settings.set_figure_params(dpi=80, facecolor='white')

# Load data (10x Genomics format)
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',  # the directory with the .mtx file
    var_names='gene_symbols',  # use gene symbols for gene names (variable names)
    cache=True  # write a cache file for faster subsequent reading
)

# Basic preprocessing
sc.pp.filter_cells(adata, min_genes=200)  # filter out cells expressing < 200 genes
sc.pp.filter_genes(adata, min_cells=3)   # filter out genes expressed in < 3 cells

# Calculate QC metrics
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # mitochondrial genes
# Pass qc_vars so that per-cell mitochondrial metrics are computed from the 'mt' flag
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Normalization and log transformation
sc.pp.normalize_total(adata, target_sum=1e4)  # normalize every cell to 10,000 UMI
sc.pp.log1p(adata)  # logarithmize the data

# Find highly variable genes
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pl.highly_variable_genes(adata)

# Principal component analysis
sc.pp.pca(adata, svd_solver='arpack')
sc.pl.pca_variance_ratio(adata, n_pcs=50, log=True)  # n_pcs sets how many PCs are shown

# Compute neighborhood graph
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

# UMAP embedding
sc.tl.umap(adata)
sc.pl.umap(adata)

# Leiden clustering
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color=['leiden'])
```

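To make the normalization step in the workflow above more concrete, the following is a minimal sketch of the arithmetic behind `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`. It uses plain Python lists so it runs without scanpy installed. The helper names `normalize_total_sketch` and `log1p_sketch` and the toy count matrix are hypothetical, and the code is not scanpy's implementation, which operates on AnnData objects and sparse matrices.

```python
import math


def normalize_total_sketch(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Assumption for this sketch: cells with zero total counts stay all zeros.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p_sketch(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Toy count matrix: 3 cells (rows) x 4 genes (columns).
counts = [
    [10, 0, 30, 60],
    [1, 1, 1, 1],
    [0, 5, 0, 15],
]

normalized = normalize_total_sketch(counts)
logged = log1p_sketch(normalized)

# After normalization every cell has the same total count.
row_sums = [sum(row) for row in normalized]
print(row_sums)
```

Scaling every cell to the same total removes differences in sequencing depth, and `log1p` keeps zero counts at zero while compressing large values.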
## Architecture

Scanpy is built around the AnnData (Annotated Data) format, which efficiently stores large-scale single-cell data:

- **AnnData Object**: Central data structure containing the expression matrix (`X`), cell and gene metadata (`obs`, `var`), and analysis results
- **Modular Design**: Separate modules for preprocessing (`sc.pp`), analysis tools (`sc.tl`), and plotting (`sc.pl`)
- **Integration**: Works directly with the scientific Python ecosystem (NumPy, pandas, matplotlib, seaborn)
- **Scalability**: Memory-efficient algorithms designed for datasets with millions of cells
- **Extensibility**: External tools and methods are exposed through a unified interface (see External Tool Integration)

## Capabilities

### Data Input/Output

Read and write single-cell data formats including 10x Genomics, H5AD, Loom, and CSV. Both local files and remote data access are supported.

```python { .api }
def read(filename, **kwargs):
    """Read file and return AnnData object."""

def read_10x_h5(filename, **kwargs):
    """Read 10x Genomics HDF5 file."""

def read_10x_mtx(path, **kwargs):
    """Read 10x Genomics MTX format."""

def read_visium(path, **kwargs):
    """Read 10x Visium spatial transcriptomics data."""

def write(filename, adata, **kwargs):
    """Write AnnData object to file."""
```

[Data I/O](./data-io.md)

### Preprocessing

Preprocessing functions (`sc.pp`) cover quality control, filtering, normalization, scaling, feature selection, and dimensionality reduction. These steps prepare raw single-cell data for downstream analysis.

```python { .api }
def filter_cells(adata, **kwargs):
    """Filter cells based on quality metrics."""

def filter_genes(adata, **kwargs):
    """Filter genes based on expression criteria."""

def normalize_total(adata, **kwargs):
    """Normalize counts per cell."""

def log1p(adata, **kwargs):
    """Logarithmize the data matrix."""

def highly_variable_genes(adata, **kwargs):
    """Identify highly variable genes."""

def pca(adata, **kwargs):
    """Principal component analysis."""

def neighbors(adata, **kwargs):
    """Compute neighborhood graph."""
```

[Preprocessing](./preprocessing.md)

### Analysis Tools

Analysis functions (`sc.tl`) cover embeddings for dimensionality reduction, clustering, trajectory inference, differential expression testing, and other single-cell-specific algorithms.

```python { .api }
def umap(adata, **kwargs):
    """UMAP embedding."""

def tsne(adata, **kwargs):
    """t-SNE embedding."""

def leiden(adata, **kwargs):
    """Leiden clustering."""

def louvain(adata, **kwargs):
    """Louvain clustering."""

def rank_genes_groups(adata, **kwargs):
    """Rank genes for characterizing groups."""

def dpt(adata, **kwargs):
    """Diffusion pseudotime analysis."""

def paga(adata, **kwargs):
    """Partition-based graph abstraction."""
```

[Analysis Tools](./analysis-tools.md)

### Visualization

Plotting functions (`sc.pl`) for single-cell data, including scatter plots, heatmaps, violin plots, trajectory plots, and other specialized single-cell visualizations.

```python { .api }
def umap(adata, **kwargs):
    """Plot UMAP embedding."""

def scatter(adata, **kwargs):
    """Scatter plot of observations."""

def violin(adata, **kwargs):
    """Violin plot of gene expression."""

def heatmap(adata, **kwargs):
    """Heatmap of gene expression."""

def rank_genes_groups(adata, **kwargs):
    """Plot ranking of genes."""

def paga(adata, **kwargs):
    """Plot PAGA graph."""
```

[Visualization](./visualization.md)

### Built-in Datasets

Collection of standard single-cell datasets for testing, benchmarking, and educational purposes, including processed and raw versions of popular datasets.

```python { .api }
def pbmc3k():
    """3k PBMCs from 10x Genomics."""

def pbmc68k_reduced():
    """68k PBMCs, reduced for computational efficiency."""

def paul15():
    """Hematopoietic stem and progenitor cell dataset."""

def moignard15():
    """Blood development dataset."""
```

[Datasets](./datasets.md)

### External Tool Integration

Integration with popular external single-cell analysis tools and methods through a unified interface (`sc.external`), extending scanpy's capabilities with specialized algorithms.

```python { .api }
# sc.external.tl
def phate(adata, **kwargs):
    """PHATE dimensionality reduction."""

def palantir(adata, **kwargs):
    """Palantir trajectory inference."""

# sc.external.pp
def harmony_integrate(adata, **kwargs):
    """Harmony batch correction."""

def magic(adata, **kwargs):
    """MAGIC imputation."""
```

[External Tools](./external-tools.md)

### Spatial Transcriptomics

Specialized functions for analyzing spatial transcriptomics data, including spatial statistics, visualization, and neighborhood analysis for spatially resolved single-cell data.

```python { .api }
def read_visium(path, **kwargs):
    """Read 10x Visium data."""

# sc.pl
def spatial(adata, **kwargs):
    """Plot spatial transcriptomics data."""

# sc.metrics
def morans_i(adata, **kwargs):
    """Moran's I spatial autocorrelation."""

def gearys_c(adata, **kwargs):
    """Geary's C spatial autocorrelation."""
```

[Spatial Analysis](./spatial-analysis.md)

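For readers unfamiliar with the two autocorrelation statistics listed above, the following sketch computes Moran's I and Geary's C from their standard textbook definitions on a tiny, hand-built graph. The helper names, the dense weight matrix, and the example values are hypothetical. The code is not taken from scanpy, whose `morans_i` and `gearys_c` may differ in input handling and implementation.

```python
def morans_i_sketch(weights, values):
    """Moran's I: (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / total_weight) * (cross / denom)


def gearys_c_sketch(weights, values):
    """Geary's C: ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    total_weight = sum(sum(row) for row in weights)
    sq_diff = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    denom = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * total_weight)) * (sq_diff / denom)


# Four spots on a line; each spot is connected to its direct neighbors.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# A smooth gradient along the line: neighboring spots have similar values.
values = [1.0, 2.0, 3.0, 4.0]

moran = morans_i_sketch(weights, values)
geary = gearys_c_sketch(weights, values)
print(moran, geary)
```

For a smooth gradient like this one, Moran's I is positive and Geary's C is below 1, both indicating positive spatial autocorrelation.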
### Utilities and Settings

Configuration, logging, data extraction utilities, and helper functions for working with AnnData objects and managing analysis workflows.

```python { .api }
# Settings and configuration
settings: ScanpyConfig

# Data extraction utilities (sc.get)
def obs_df(adata, **kwargs):
    """Extract observation dataframe."""

def var_df(adata, **kwargs):
    """Extract variable dataframe."""

# Logging functions (sc.logging)
def print_versions():
    """Print version information."""
```

[Utilities](./utilities.md)

### Database Queries and Annotations

Biomart queries and gene annotation tools (`sc.queries`) for enriching single-cell analysis with external database information.

```python { .api }
def biomart_annotations(org, attrs, **kwargs):
    """Query biomart for gene annotations."""

def enrich(container, *, org='hsapiens', **kwargs):
    """Gene enrichment analysis using g:Profiler."""

def gene_coordinates(org, gene_name, **kwargs):
    """Get genomic coordinates for a gene."""

def mitochondrial_genes(org, **kwargs):
    """Get mitochondrial gene list."""
```

[Database Queries](./queries.md)

## Core Types

```python { .api }
# Core data types (from anndata)
class AnnData:
    """Annotated data matrix."""
    def __init__(self, X, obs=None, var=None, **kwargs): ...

# Scanpy-specific types
class Neighbors:
    """Neighbors computation and storage."""
    def __init__(self, adata, **kwargs): ...

class Verbosity:
    """Logging verbosity levels."""

# Settings configuration
class ScanpyConfig:
    """Global scanpy settings."""
    verbosity: int
    n_jobs: int

    def set_figure_params(self, **kwargs): ...
```
