tessl/pypi-scanpy

Comprehensive toolkit for analyzing single-cell gene expression data with scalable Python implementation supporting preprocessing, visualization, clustering, trajectory inference, and differential expression testing.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/scanpy@1.11.x

To install, run

npx @tessl/cli install tessl/pypi-scanpy@1.11.0

Scanpy

Scanpy is a scalable Python toolkit for analyzing single-cell gene expression data, including datasets of more than one million cells. Built jointly with anndata, it covers a complete workflow: preprocessing, visualization, clustering, trajectory inference, and differential expression testing. It integrates with the scientific Python ecosystem and provides algorithms for dimensionality reduction, neighborhood graph construction, clustering, and pseudotime analysis, which makes it suitable for single-cell RNA sequencing data and other single-cell omics technologies.

Package Information

  • Package Name: scanpy
  • Language: Python
  • Installation: pip install scanpy

Core Imports

import scanpy as sc

Common additional imports for working with scanpy:

import scanpy as sc
import anndata as ad
import pandas as pd
import numpy as np

Basic Usage

import scanpy as sc

# Settings
sc.settings.verbosity = 3  # verbosity level
sc.settings.set_figure_params(dpi=80, facecolor='white')

# Load data (10x Genomics format)
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',  # the directory with the .mtx file
    var_names='gene_symbols',  # use gene symbols for gene names (variables names)
    cache=True  # write a cache file for faster subsequent reading
)

# Basic preprocessing
sc.pp.filter_cells(adata, min_genes=200)  # filter out cells expressing < 200 genes
sc.pp.filter_genes(adata, min_cells=3)   # filter out genes expressed in < 3 cells

# Calculate QC metrics
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # mitochondrial genes
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Normalization and log transformation
sc.pp.normalize_total(adata, target_sum=1e4)  # normalize every cell to 10,000 UMI
sc.pp.log1p(adata)  # logarithmize the data

# Find highly variable genes
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pl.highly_variable_genes(adata)

# Principal component analysis
sc.pp.pca(adata, svd_solver='arpack')
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=50)

# Compute neighborhood graph
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

# UMAP embedding
sc.tl.umap(adata)
sc.pl.umap(adata)

# Leiden clustering
sc.tl.leiden(adata, resolution=0.5)
sc.pl.umap(adata, color=['leiden'])
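
As a rough illustration of what the normalization and log steps in the workflow above compute, the sketch below reproduces the arithmetic in plain Python on a made-up 2 x 3 count matrix. It is not scanpy's implementation; the variable names and count values are assumptions chosen for illustration.

```python
import math

# Toy raw counts: 2 cells x 3 genes (values are made up for illustration).
counts = [
    [10, 0, 30],  # cell 0, total count = 40
    [1, 1, 2],    # cell 1, total count = 4
]
target_sum = 1e4  # same target as in the workflow above

# Scale every cell so that its counts sum to target_sum.
# Assumes no cell has a total of zero (empty cells are filtered out earlier).
normalized = []
for row in counts:
    scale = target_sum / sum(row)
    normalized.append([value * scale for value in row])

# log1p computes log(1 + x), so zero counts stay at zero.
logged = [[math.log1p(value) for value in row] for row in normalized]

row_totals = [sum(row) for row in normalized]
```

After scaling, both cells have the same total even though their raw sequencing depth differed by a factor of ten, which is what makes expression values comparable across cells.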

Architecture

Scanpy is built around the AnnData (Annotated Data) format, which efficiently stores large-scale single-cell data:

  • AnnData Object: Central data structure containing expression matrix, cell/gene metadata, and analysis results
  • Modular Design: Separate modules for preprocessing (pp), analysis tools (tl), and plotting (pl)
  • Integration: Seamless integration with the scientific Python ecosystem (NumPy, pandas, matplotlib, seaborn)
  • Scalability: Memory-efficient algorithms designed for datasets with millions of cells
  • Extensibility: Third-party tools and methods are exposed through the scanpy.external module
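
To make the layout described above more concrete, here is a minimal sketch of an AnnData-like container in plain Python. `ToyAnnData` is a hypothetical stand-in, not the real `anndata.AnnData` class (which uses NumPy or sparse matrices and pandas DataFrames); it only mirrors the idea that annotations stay aligned with the rows (cells) and columns (genes) of the matrix.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAnnData:
    """Hypothetical stand-in mirroring AnnData's main slots (not the real class)."""

    X: list    # n_obs x n_vars expression values, one row per cell
    obs: dict  # per-cell annotations, each a list of length n_obs
    var: dict  # per-gene annotations, each a list of length n_vars
    obsm: dict = field(default_factory=dict)  # per-cell arrays, e.g. embeddings
    uns: dict = field(default_factory=dict)   # unstructured results and parameters

    def __post_init__(self):
        # Annotations must line up with the matrix dimensions.
        n_obs, n_vars = self.shape
        for name, column in self.obs.items():
            if len(column) != n_obs:
                raise ValueError(f"obs[{name!r}] must have {n_obs} entries")
        for name, column in self.var.items():
            if len(column) != n_vars:
                raise ValueError(f"var[{name!r}] must have {n_vars} entries")

    @property
    def shape(self):
        n_obs = len(self.X)
        n_vars = len(self.X[0]) if self.X else 0
        return n_obs, n_vars


toy = ToyAnnData(
    X=[[1, 0, 2], [0, 3, 0]],
    obs={"sample": ["a", "b"]},
    var={"mt": [False, False, True]},
)

# Analysis steps write their results back into the same object.
toy.obs["leiden"] = ["0", "1"]                 # cluster labels, one per cell
toy.obsm["X_umap"] = [[0.1, 0.2], [0.3, 0.4]]  # 2D embedding, one row per cell
```

Keeping results on the object is what lets the `pp`, `tl`, and `pl` modules be chained: each step reads what earlier steps stored and adds its own output.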

Capabilities

Data Input/Output

Read and write single-cell data in formats including 10x Genomics, H5AD, Loom, and CSV. Both local files and remote data access are supported.

def read(filename, **kwargs):
    """Read file and return AnnData object."""
    
def read_10x_h5(filename, **kwargs):
    """Read 10x Genomics HDF5 file."""
    
def read_10x_mtx(path, **kwargs):
    """Read 10x Genomics MTX format."""
    
def read_visium(path, **kwargs):
    """Read 10x Visium spatial transcriptomics data."""
    
def write(filename, adata, **kwargs):
    """Write AnnData object to file."""

See data-io.md for the full Data I/O reference.
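
For readers unfamiliar with the 10x MTX layout, the sketch below parses a tiny Matrix Market coordinate file by hand. It assumes the usual 10x convention that rows are genes and columns are cell barcodes, so the result is transposed into a cells x genes matrix. `parse_mtx` is a hypothetical helper written for illustration; `read_10x_mtx` also reads the accompanying barcode and gene files, which this sketch ignores.

```python
def parse_mtx(text):
    """Parse Matrix Market coordinate text into a dense cells x genes list."""
    # Drop the '%%MatrixMarket' header, comment lines, and blank lines.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    # First data line holds the dimensions: rows (genes), columns (cells), entries.
    n_genes, n_cells, n_entries = (int(token) for token in lines[0].split())
    dense = [[0] * n_genes for _ in range(n_cells)]
    for line in lines[1:1 + n_entries]:
        gene, cell, value = line.split()
        # Indices in the file are 1-based; transpose to cells x genes.
        dense[int(cell) - 1][int(gene) - 1] = int(value)
    return dense


example = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 5
3 1 2
2 2 7
"""

matrix = parse_mtx(example)  # 2 cells x 3 genes
```

Only nonzero entries are stored in the file, which is why the format suits sparse single-cell count data.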

Preprocessing

Comprehensive preprocessing pipeline including quality control, filtering, normalization, scaling, feature selection, and dimensionality reduction. Essential steps for preparing raw single-cell data for downstream analysis.

def filter_cells(adata, **kwargs):
    """Filter cells based on quality metrics."""
    
def filter_genes(adata, **kwargs):
    """Filter genes based on expression criteria."""
    
def normalize_total(adata, **kwargs):
    """Normalize counts per cell."""
    
def log1p(adata, **kwargs):
    """Logarithmize the data matrix."""
    
def highly_variable_genes(adata, **kwargs):
    """Identify highly variable genes."""
    
def pca(adata, **kwargs):
    """Principal component analysis."""
    
def neighbors(adata, **kwargs):
    """Compute neighborhood graph."""

See preprocessing.md for the full Preprocessing reference.
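
The filtering thresholds can be illustrated with a plain-Python sketch on a made-up count matrix. It mimics the effect of calling `filter_cells` with `min_genes` and then `filter_genes` with `min_cells`, under the assumption that a gene counts as "expressed" in a cell when its count is greater than zero. This is an illustration of the criteria, not scanpy's code.

```python
# Toy counts: 3 cells x 4 genes (values are made up for illustration).
counts = [
    [3, 0, 1, 0],  # cell 0 expresses 2 genes
    [0, 0, 2, 0],  # cell 1 expresses 1 gene
    [4, 1, 5, 0],  # cell 2 expresses 3 genes
]
min_genes = 2  # keep cells expressing at least this many genes
min_cells = 2  # keep genes expressed in at least this many cells

# Step 1: drop cells that express too few genes.
kept_cells = [
    i for i, row in enumerate(counts)
    if sum(value > 0 for value in row) >= min_genes
]
filtered = [counts[i] for i in kept_cells]

# Step 2: among the remaining cells, drop genes seen in too few cells.
n_genes = len(counts[0])
kept_genes = [
    j for j in range(n_genes)
    if sum(row[j] > 0 for row in filtered) >= min_cells
]
filtered = [[row[j] for j in kept_genes] for row in filtered]
```

The order matters: gene counts in step 2 are computed after cells were removed, so running the two filters in the opposite order can keep a different set of genes.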

Analysis Tools

Advanced analysis methods including dimensionality reduction, clustering, trajectory inference, differential expression testing, and specialized single-cell analysis algorithms.

def umap(adata, **kwargs):
    """UMAP embedding."""
    
def tsne(adata, **kwargs):
    """t-SNE embedding."""
    
def leiden(adata, **kwargs):
    """Leiden clustering."""
    
def louvain(adata, **kwargs):
    """Louvain clustering."""
    
def rank_genes_groups(adata, **kwargs):
    """Rank genes for characterizing groups."""
    
def dpt(adata, **kwargs):
    """Diffusion pseudotime analysis."""
    
def paga(adata, **kwargs):
    """Partition-based graph abstraction."""

See analysis-tools.md for the full Analysis Tools reference.
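
To give a feel for what ranking genes between groups involves, the sketch below scores three made-up genes with Welch's t-statistic, comparing one group of cells against the rest, and sorts them by score. It is a simplified illustration under stated assumptions: the expression values are invented, `welch_t` is a hypothetical helper, and `rank_genes_groups` supports several methods with additional corrections that are not reproduced here.

```python
import math
import statistics


def welch_t(group, rest):
    """Welch's t-statistic for the difference in means between two samples."""
    mean_diff = statistics.mean(group) - statistics.mean(rest)
    standard_error = math.sqrt(
        statistics.variance(group) / len(group)
        + statistics.variance(rest) / len(rest)
    )
    return mean_diff / standard_error


# Log-normalized expression per gene, split into the group of interest
# and all remaining cells (values are made up for illustration).
expression = {
    "geneA": {"group": [5.0, 6.0, 7.0], "rest": [1.0, 2.0, 3.0]},
    "geneB": {"group": [2.0, 3.0, 4.0], "rest": [2.0, 3.0, 4.0]},
    "geneC": {"group": [1.0, 2.0, 3.0], "rest": [3.0, 4.0, 5.0]},
}

scores = {
    gene: welch_t(values["group"], values["rest"])
    for gene, values in expression.items()
}

# Highest score first: genes most strongly up-regulated in the group.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes with large positive scores are candidate markers for the group, scores near zero indicate no difference, and negative scores indicate lower expression in the group than in the rest.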

Visualization

Extensive plotting capabilities for single-cell data visualization including scatter plots, heatmaps, violin plots, trajectory plots, and specialized single-cell visualizations.

def umap(adata, **kwargs):
    """Plot UMAP embedding."""
    
def scatter(adata, **kwargs):
    """Scatter plot of observations."""
    
def violin(adata, **kwargs):
    """Violin plot of gene expression."""
    
def heatmap(adata, **kwargs):
    """Heatmap of gene expression."""
    
def rank_genes_groups(adata, **kwargs):
    """Plot ranking of genes."""
    
def paga(adata, **kwargs):
    """Plot PAGA graph."""

See visualization.md for the full Visualization reference.

Built-in Datasets

Collection of standard single-cell datasets for testing, benchmarking, and educational purposes, including processed and raw versions of popular datasets.

def pbmc3k():
    """3k PBMCs from 10x Genomics."""
    
def pbmc68k_reduced():
    """68k PBMCs, reduced for computational efficiency."""
    
def paul15():
    """Hematopoietic stem and progenitor cell dataset."""
    
def moignard15():
    """Blood development dataset."""

See datasets.md for the full Datasets reference.

External Tool Integration

Integration with popular external single-cell analysis tools and methods through a unified interface, extending scanpy's capabilities with specialized algorithms.

def phate(adata, **kwargs):
    """PHATE dimensionality reduction."""
    
def palantir(adata, **kwargs):
    """Palantir trajectory inference."""
    
def harmony_integrate(adata, **kwargs):
    """Harmony batch correction."""
    
def magic(adata, **kwargs):
    """MAGIC imputation."""

See external-tools.md for the full External Tools reference.

Spatial Transcriptomics

Specialized functions for analyzing spatial transcriptomics data, including spatial statistics, visualization, and neighborhood analysis for spatially resolved single-cell data.

def read_visium(path, **kwargs):
    """Read 10x Visium data."""
    
def spatial(adata, **kwargs):
    """Plot spatial transcriptomics data."""
    
def morans_i(adata, **kwargs):
    """Moran's I spatial autocorrelation."""
    
def gearys_c(adata, **kwargs):
    """Geary's C spatial autocorrelation."""

See spatial-analysis.md for the full Spatial Analysis reference.
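
The two spatial autocorrelation statistics listed above can be written out directly from their textbook definitions. The sketch below does so in plain Python for four spots arranged in a chain; the function names, the weight matrix, and the values are assumptions for illustration, and scanpy's own functions operate on a neighbor graph stored in the AnnData object rather than on nested lists.

```python
def morans_i_sketch(values, weights):
    """Moran's I: (N / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


def gearys_c_sketch(values, weights):
    """Geary's C: ((N - 1) / (2 W)) * sum_ij w_ij * (x_i - x_j)**2 / sum_i d_i**2."""
    n = len(values)
    mean = sum(values) / n
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n) for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * w_sum)) * num / den


# Four spots in a row; each spot neighbors the spots next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0]  # neighbors always differ

i_clustered = morans_i_sketch(clustered, chain)
i_alternating = morans_i_sketch(alternating, chain)
c_clustered = gearys_c_sketch(clustered, chain)
c_alternating = gearys_c_sketch(alternating, chain)
```

The two statistics move in opposite directions: spatially clustered expression gives a positive Moran's I and a Geary's C below 1, while an alternating pattern gives a negative Moran's I and a Geary's C above 1.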

Utilities and Settings

Configuration, logging, data extraction utilities, and helper functions for working with AnnData objects and managing analysis workflows.

# Settings and configuration
settings: ScanpyConfig

# Data extraction utilities  
def obs_df(adata, **kwargs):
    """Extract observation dataframe."""
    
def var_df(adata, **kwargs):
    """Extract variable dataframe."""
    
# Logging functions
def print_versions():
    """Print version information."""

See utilities.md for the full Utilities reference.

Database Queries and Annotations

Biomart queries and gene annotation tools for enriching single-cell analysis with external database information.

def biomart_annotations(org, attrs):
    """Query biomart for gene annotations."""
    
def enrich(container, *, org='hsapiens', **kwargs):
    """Gene enrichment analysis using g:Profiler."""
    
def gene_coordinates(org, gene_name, **kwargs):
    """Get genomic coordinates for a gene."""
    
def mitochondrial_genes(org, **kwargs):
    """Get mitochondrial gene list."""

See queries.md for the full Database Queries reference.

Core Types

# Core data types (from anndata)
class AnnData:
    """Annotated data matrix."""
    def __init__(self, X, obs=None, var=None, **kwargs): ...
    
# Scanpy-specific types
class Neighbors:
    """Neighbors computation and storage."""
    def __init__(self, adata, **kwargs): ...
    
class Verbosity:
    """Logging verbosity levels."""
    
# Settings configuration
class ScanpyConfig:
    """Global scanpy settings."""
    verbosity: int
    n_jobs: int
    
    def set_figure_params(self, **kwargs): ...