
# Scanpy

Scanpy is a scalable Python toolkit for analyzing single-cell gene expression data, designed for datasets of more than one million cells. Built jointly with anndata, it covers a complete workflow: preprocessing, visualization, clustering, trajectory inference, and differential expression testing. It interoperates with the scientific Python ecosystem and implements algorithms for dimensionality reduction, neighborhood graphs, clustering, and pseudotime analysis, for single-cell RNA sequencing and other single-cell omics data.

## Package Information

- **Package Name**: scanpy
- **Language**: Python
- **Installation**: `pip install scanpy`

## Core Imports

```python
import scanpy as sc
```

Common additional imports for working with scanpy:

```python
import scanpy as sc
import anndata as ad
import pandas as pd
import numpy as np
```

## Basic Usage

The workflow below assumes the 10x Genomics output directory `data/filtered_gene_bc_matrices/hg19/` exists locally.

```python
import scanpy as sc

# Settings
sc.settings.verbosity = 3  # 3 = print errors, warnings, info and hints
sc.settings.set_figure_params(dpi=80, facecolor='white')

# Load data (10x Genomics MTX format)
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',  # directory containing the .mtx file
    var_names='gene_symbols',  # use gene symbols as variable (gene) names
    cache=True,  # write a cache file for faster subsequent reading
)

# Basic filtering
sc.pp.filter_cells(adata, min_genes=200)  # remove cells expressing fewer than 200 genes
sc.pp.filter_genes(adata, min_cells=3)  # remove genes detected in fewer than 3 cells

# Calculate QC metrics, including the percentage of counts from mitochondrial genes
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # flag human mitochondrial genes
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Normalization and log transformation
sc.pp.normalize_total(adata, target_sum=1e4)  # scale each cell to 10,000 total counts
sc.pp.log1p(adata)  # X = log(1 + X)

# Find highly variable genes
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pl.highly_variable_genes(adata)

# Principal component analysis (uses the highly variable genes when they are flagged)
sc.pp.pca(adata, svd_solver='arpack')
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50)

# Compute neighborhood graph
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

# UMAP embedding
sc.tl.umap(adata)
sc.pl.umap(adata)

# Leiden clustering (requires the leidenalg package)
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color=['leiden'])
```

## Architecture

Scanpy is built around the AnnData (Annotated Data) format from the `anndata` package, which stores large single-cell datasets efficiently:

- **AnnData Object**: Central data structure holding the expression matrix (`X`), cell and gene metadata (`obs`, `var`), and analysis results (`obsm`, `obsp`, `uns`)
- **Modular Design**: Separate modules for preprocessing (`sc.pp`), analysis tools (`sc.tl`), and plotting (`sc.pl`), plus helpers such as `sc.get`, `sc.datasets`, and `sc.queries`
- **Integration**: Interoperates with the scientific Python ecosystem (NumPy, SciPy, pandas, matplotlib, seaborn)
- **Scalability**: Memory-efficient algorithms, with sparse-matrix support, intended for datasets with millions of cells
- **Extensibility**: Wrappers for third-party methods collected in `scanpy.external`

## Capabilities

### Data Input/Output

Read and write single-cell data formats including 10x Genomics (MTX and HDF5), H5AD, Loom, and CSV. Files are read from local disk, and `sc.read` can download a missing file through its `backup_url` argument.

```python { .api }
def read(filename, **kwargs):
    """Read a file and return an AnnData object."""

def read_10x_h5(filename, **kwargs):
    """Read a 10x Genomics HDF5 file."""

def read_10x_mtx(path, **kwargs):
    """Read a 10x Genomics MTX directory."""

def read_visium(path, **kwargs):
    """Read 10x Visium spatial transcriptomics data."""

def write(filename, adata, **kwargs):
    """Write an AnnData object to file."""
```

[Data I/O](./data-io.md)

### Preprocessing

Preprocessing functions in `sc.pp` cover quality control, filtering, normalization, log transformation, scaling, feature selection, dimensionality reduction, and neighborhood-graph construction: the steps that prepare raw counts for downstream analysis.

```python { .api }
# Accessed as sc.pp.<name>
def filter_cells(adata, **kwargs):
    """Filter cells based on quality metrics."""

def filter_genes(adata, **kwargs):
    """Filter genes based on expression criteria."""

def normalize_total(adata, **kwargs):
    """Normalize counts per cell."""

def log1p(adata, **kwargs):
    """Logarithmize the data matrix."""

def highly_variable_genes(adata, **kwargs):
    """Identify highly variable genes."""

def pca(adata, **kwargs):
    """Principal component analysis."""

def neighbors(adata, **kwargs):
    """Compute neighborhood graph."""
```

[Preprocessing](./preprocessing.md)

### Analysis Tools

Analysis functions in `sc.tl` cover embeddings (UMAP, t-SNE), graph-based clustering (Leiden, Louvain), trajectory inference (diffusion pseudotime, PAGA), and differential expression testing (`rank_genes_groups`). Leiden clustering needs the optional `leidenalg` package, and Louvain needs `louvain`.

```python { .api }
# Accessed as sc.tl.<name>
def umap(adata, **kwargs):
    """UMAP embedding."""

def tsne(adata, **kwargs):
    """t-SNE embedding."""

def leiden(adata, **kwargs):
    """Leiden clustering."""

def louvain(adata, **kwargs):
    """Louvain clustering."""

def rank_genes_groups(adata, **kwargs):
    """Rank genes for characterizing groups."""

def dpt(adata, **kwargs):
    """Diffusion pseudotime analysis."""

def paga(adata, **kwargs):
    """Partition-based graph abstraction."""
```

[Analysis Tools](./analysis-tools.md)

### Visualization

Plotting functions in `sc.pl` include embedding scatter plots (UMAP, t-SNE, PCA), violin plots, heatmaps, gene-ranking plots, and trajectory plots such as PAGA graphs. With `show=False`, most of them keep the figure open and return the matplotlib axes instead of displaying the plot.

```python { .api }
# Accessed as sc.pl.<name>
def umap(adata, **kwargs):
    """Plot UMAP embedding."""

def scatter(adata, **kwargs):
    """Scatter plot of observations."""

def violin(adata, **kwargs):
    """Violin plot of gene expression."""

def heatmap(adata, **kwargs):
    """Heatmap of gene expression."""

def rank_genes_groups(adata, **kwargs):
    """Plot ranking of genes."""

def paga(adata, **kwargs):
    """Plot PAGA graph."""
```

[Visualization](./visualization.md)

### Built-in Datasets

`sc.datasets` provides standard single-cell datasets for testing, benchmarking, and tutorials, in both raw and processed versions. Most are downloaded on first use and cached under `sc.settings.datasetdir`; `pbmc68k_reduced` ships with the package.

```python { .api }
# Accessed as sc.datasets.<name>
def pbmc3k():
    """3k PBMCs from 10x Genomics (raw counts)."""

def pbmc68k_reduced():
    """Subsample of 68k PBMCs, reduced for computational efficiency."""

def paul15():
    """Hematopoietic stem and progenitor cell dataset."""

def moignard15():
    """Blood development dataset."""
```

[Datasets](./datasets.md)

### External Tool Integration

`scanpy.external` (conventionally imported as `sce`) wraps third-party single-cell methods behind scanpy-style function calls. Each wrapper needs its own package installed separately, for example `harmonypy` for Harmony.

```python { .api }
# Accessed via `import scanpy.external as sce`
def phate(adata, **kwargs):  # sce.tl.phate
    """PHATE dimensionality reduction."""

def palantir(adata, **kwargs):  # sce.tl.palantir
    """Palantir trajectory inference."""

def harmony_integrate(adata, **kwargs):  # sce.pp.harmony_integrate
    """Harmony batch correction."""

def magic(adata, **kwargs):  # sce.pp.magic
    """MAGIC imputation."""
```

[External Tools](./external-tools.md)

### Spatial Transcriptomics

Functions for spatially resolved data: reading 10x Visium output, plotting spots over the tissue image, and spatial autocorrelation statistics. Moran's I and Geary's C are computed over a neighborhood graph, so they apply to spatial graphs and expression kNN graphs alike.

```python { .api }
def read_visium(path, **kwargs):  # sc.read_visium
    """Read 10x Visium data."""

def spatial(adata, **kwargs):  # sc.pl.spatial
    """Plot spatial transcriptomics data."""

def morans_i(adata, **kwargs):  # sc.metrics.morans_i
    """Moran's I spatial autocorrelation."""

def gearys_c(adata, **kwargs):  # sc.metrics.gearys_c
    """Geary's C spatial autocorrelation."""
```

[Spatial Analysis](./spatial-analysis.md)

### Utilities and Settings

Global configuration (`sc.settings`), logging helpers, and functions in `sc.get` that extract data from AnnData objects into pandas DataFrames.

```python { .api }
# Settings and configuration (sc.settings)
settings: ScanpyConfig

# Data extraction utilities (sc.get.<name>)
def obs_df(adata, **kwargs):
    """Extract observation-level values (obs columns or gene expression) as a DataFrame."""

def var_df(adata, **kwargs):
    """Extract variable-level values (var columns or per-cell values) as a DataFrame."""

# Logging functions (sc.logging.<name>)
def print_versions():
    """Print version information."""
```

[Utilities](./utilities.md)

### Database Queries and Annotations

`sc.queries` retrieves gene annotations from Ensembl BioMart and runs enrichment analysis through g:Profiler. These functions need network access plus the optional `pybiomart` (BioMart queries) or `gprofiler-official` (enrichment) package.

```python { .api }
# Accessed as sc.queries.<name>; org is an Ensembl organism name such as 'hsapiens'
def biomart_annotations(org, attrs, **kwargs):
    """Query BioMart for gene annotations."""

def enrich(container, *, org='hsapiens', **kwargs):
    """Gene enrichment analysis using g:Profiler."""

def gene_coordinates(org, gene_name, **kwargs):
    """Get genomic coordinates for a gene."""

def mitochondrial_genes(org, **kwargs):
    """Get mitochondrial gene list."""
```

[Database Queries](./queries.md)

## Core Types

```python { .api }
# Core data types (from anndata)
class AnnData:
    """Annotated data matrix."""
    def __init__(self, X=None, obs=None, var=None, **kwargs): ...

# Scanpy-specific types
class Neighbors:
    """Neighbors computation and storage."""
    def __init__(self, adata, **kwargs): ...

class Verbosity:
    """Logging verbosity levels (0 = errors through 4 = debug)."""

# Settings configuration
class ScanpyConfig:
    """Global scanpy settings (the sc.settings object)."""
    verbosity: int
    n_jobs: int

    def set_figure_params(self, **kwargs): ...
```