
single-cell-rnaseq-pipeline

Generate single-cell RNA-seq analysis code templates for Seurat and Scanpy, supporting QC, clustering, visualization, and downstream analysis. Trigger when users need scRNA-seq analysis pipelines, preprocessing workflows, or batch correction code.

Single-Cell RNA-seq Pipeline

Overview

Generate comprehensive single-cell RNA-seq analysis code templates for Seurat (R) and Scanpy (Python). This skill provides ready-to-use code frameworks for preprocessing, quality control, normalization, clustering, marker identification, visualization, and advanced analyses like batch correction and trajectory inference.

Technical Difficulty: High

When to Use

  • Building scRNA-seq analysis pipelines from raw count matrices
  • Standardizing QC and preprocessing workflows
  • Performing batch correction across multiple samples/datasets
  • Running dimensionality reduction and clustering
  • Identifying cell type-specific marker genes
  • Creating publication-ready visualizations (UMAP, violin plots, heatmaps)
  • Conducting trajectory inference (pseudotime analysis)
  • Comparing cell populations between conditions

Core Features

Seurat (R) Templates

  1. Data Loading: 10x Genomics, H5AD, Cell Ranger outputs
  2. QC Metrics: Mitochondrial content, gene counts, doublet detection
  3. Normalization: Log-normalization, SCTransform
  4. Integration: Harmony, RPCA, CCA for batch correction
  5. Clustering: Graph-based clustering with optimization
  6. Visualization: UMAP, t-SNE, feature plots, dot plots
  7. Marker Analysis: Wilcoxon tests, conserved markers
  8. Differential Expression: FindAllMarkers, FindConservedMarkers
  9. Cell Typing: Reference-based annotation with SingleR/Azimuth

Scanpy (Python) Templates

  1. Data Loading: AnnData, 10x, CSV, loom files
  2. QC Workflow: Comprehensive filtering and metrics
  3. Normalization: Log1p, scran, ComBat batch correction
  4. Integration: scVI, Scanorama, BBKNN
  5. Clustering: Leiden/Louvain with resolution sweep
  6. Visualization: UMAP, PAGA, embeddings
  7. Marker Analysis: rank_genes_groups, filter markers
  8. Trajectory: PAGA, diffusion pseudotime (DPT)
  9. CellChat/CellPhoneDB: Cell-cell communication

Usage

Generate Seurat Template

python scripts/main.py --tool seurat --output seurat_analysis.R --species human

Generate Scanpy Template

python scripts/main.py --tool scanpy --output scanpy_analysis.py --species mouse

Generate Both Templates

python scripts/main.py --tool both --output scrna_pipeline --species human --batch-correction harmony --trajectory true

Command-Line Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| --tool | string | Yes | Analysis tool: seurat, scanpy, or both |
| --output | string | Yes | Output file or directory path |
| --species | string | No | Species: human or mouse (default: human) |
| --batch-correction | string | No | Method: harmony, rpca, cca, scanorama, scvi |
| --trajectory | bool | No | Include trajectory analysis (default: false) |
| --cell-communication | bool | No | Include cell-cell communication (default: false) |
| --de-analysis | bool | No | Include differential expression (default: false) |
| --spatial | bool | No | Include spatial transcriptomics (default: false) |
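
The flags in the table could be declared with `argparse` roughly as follows. This is only a sketch: the real `scripts/main.py` is not shown here, so `build_parser`, `str_to_bool`, and the decision to parse booleans from `true`/`false` strings (mirroring `--trajectory true` in the usage above) are assumptions.

```python
import argparse

# Methods listed for --batch-correction in the parameter table.
BATCH_METHODS = ["harmony", "rpca", "cca", "scanorama", "scvi"]


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse 'true'/'false' style strings."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Generate scRNA-seq templates")
    parser.add_argument("--tool", required=True, choices=["seurat", "scanpy", "both"])
    parser.add_argument("--output", required=True)
    parser.add_argument("--species", default="human", choices=["human", "mouse"])
    parser.add_argument("--batch-correction", default=None, choices=BATCH_METHODS)
    # All optional analysis modules default to false, as in the table.
    for flag in ("--trajectory", "--cell-communication", "--de-analysis", "--spatial"):
        parser.add_argument(flag, type=str_to_bool, default=False)
    return parser


args = build_parser().parse_args(
    ["--tool", "both", "--output", "scrna_pipeline",
     "--batch-correction", "harmony", "--trajectory", "true"]
)
print(args.tool, args.species, args.batch_correction, args.trajectory)
```

Restricting `--tool` and `--batch-correction` with `choices` means an unsupported value is rejected at parse time, before any template is generated.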

Output Structure

output/
├── seurat/
│   ├── 01_load_and_qc.R
│   ├── 02_normalize_integrate.R
│   ├── 03_cluster_annotate.R
│   ├── 04_visualize.R
│   └── 05_de_analysis.R (if --de-analysis)
├── scanpy/
│   ├── 01_load_qc.py
│   ├── 02_normalize_integrate.py
│   ├── 03_cluster_annotate.py
│   ├── 04_visualize.py
│   └── 05_trajectory.py (if --trajectory)
└── README.md
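
The layout above can be expressed as a small planning function. This is an illustrative sketch, not the generator's actual code: `planned_files` is a hypothetical name, and it assumes that each tool's directory is only created when that tool is selected.

```python
from pathlib import PurePosixPath

# File names taken from the output tree above.
SEURAT_FILES = ["01_load_and_qc.R", "02_normalize_integrate.R",
                "03_cluster_annotate.R", "04_visualize.R"]
SCANPY_FILES = ["01_load_qc.py", "02_normalize_integrate.py",
                "03_cluster_annotate.py", "04_visualize.py"]


def planned_files(output_dir, tool, de_analysis=False, trajectory=False):
    """Return the relative paths expected for a given flag combination."""
    root = PurePosixPath(output_dir)
    files = []
    if tool in ("seurat", "both"):
        names = SEURAT_FILES + (["05_de_analysis.R"] if de_analysis else [])
        files += [root / "seurat" / name for name in names]
    if tool in ("scanpy", "both"):
        names = SCANPY_FILES + (["05_trajectory.py"] if trajectory else [])
        files += [root / "scanpy" / name for name in names]
    files.append(root / "README.md")
    return [str(path) for path in files]


for path in planned_files("output", "both", de_analysis=True, trajectory=True):
    print(path)
```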

Technical Details

Supported Input Formats

  • 10x Genomics Cell Ranger outputs (barcodes.tsv, features.tsv, matrix.mtx)
  • H5AD (AnnData h5 format)
  • Seurat RDS objects
  • CSV/TSV count matrices
  • HDF5 files

QC Parameters (Default)

| Metric | Human | Mouse |
| --- | --- | --- |
| min_genes | 200 | 200 |
| max_genes | 25000 | 25000 |
| min_cells | 3 | 3 |
| max_mt_percent | 20% | 20% |
| doublet_threshold | Auto | Auto |
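
To make the per-cell defaults concrete, here is a minimal pure-Python sketch that applies them to a single cell. The names `QCThresholds`, `cell_qc_metrics`, and `passes_qc` are hypothetical. The sketch assumes the usual gene-name conventions (human mitochondrial genes prefixed `MT-`, mouse `mt-`) and leaves out `min_cells` (a gene-level filter) and doublet detection.

```python
from dataclasses import dataclass

# Mitochondrial gene-name prefixes by species (standard naming conventions).
MITO_PREFIX = {"human": "MT-", "mouse": "mt-"}


@dataclass(frozen=True)
class QCThresholds:
    """Per-cell defaults from the QC table (hypothetical container)."""
    min_genes: int = 200
    max_genes: int = 25000
    max_mt_percent: float = 20.0


def cell_qc_metrics(counts, species="human"):
    """Compute (genes detected, percent mitochondrial counts) for one cell.

    `counts` maps gene name -> count for a single cell.
    """
    prefix = MITO_PREFIX[species]
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    pct_mt = 100.0 * mito / total if total else 0.0
    return n_genes, pct_mt


def passes_qc(n_genes, pct_mt, thresholds=QCThresholds()):
    """Keep cells with enough genes and a mitochondrial fraction below the cap."""
    return (thresholds.min_genes <= n_genes <= thresholds.max_genes
            and pct_mt < thresholds.max_mt_percent)


# Toy cell: 300 nuclear genes with one count each plus one mitochondrial gene.
cell = {f"GENE{i}": 1 for i in range(300)}
cell["MT-CO1"] = 50
n_genes, pct_mt = cell_qc_metrics(cell, species="human")
print(n_genes, round(pct_mt, 1), passes_qc(n_genes, pct_mt))
```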

Clustering Resolution Guidelines

  • 0.4-0.6: Broad cell types
  • 0.8-1.2: Subtypes
  • 1.5-2.0: Fine populations
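
One way to use these bands is to label each value in a resolution sweep before clustering. The sketch below is illustrative only: `describe_resolution` and `sweep` are hypothetical helpers, and values that fall between the listed bands are deliberately left unlabeled rather than guessed.

```python
# Bands copied from the guidelines above; the gaps between them are intentional.
RESOLUTION_GUIDE = [
    ((0.4, 0.6), "broad cell types"),
    ((0.8, 1.2), "subtypes"),
    ((1.5, 2.0), "fine populations"),
]


def describe_resolution(resolution):
    """Return the guideline label for a resolution, or None if it is between bands."""
    for (low, high), label in RESOLUTION_GUIDE:
        if low <= resolution <= high:
            return label
    return None


def sweep(start=0.4, stop=2.0, step=0.2):
    """Evenly spaced resolutions from start to stop inclusive, rounded to 2 decimals."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


for res in sweep():
    print(res, describe_resolution(res))
```

Each value from `sweep()` could then be passed as the `resolution` argument of the clustering call, with the results stored under a separate key per resolution for comparison.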

Batch Correction Recommendations

| Scenario | Seurat | Scanpy |
| --- | --- | --- |
| Small batches (<5) | Harmony | Harmony |
| Large batches | RPCA | Scanorama |
| Complex variation | CCA | scVI |
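
The recommendation table can be read as a simple lookup. The following sketch is an interpretation, not the skill's own logic: `recommend_method` is a hypothetical name, "large" is assumed to mean five or more batches, and "complex variation" is assumed to take precedence over batch count.

```python
# Method names use the lowercase spellings accepted by --batch-correction.
RECOMMENDATIONS = {
    "small_batches": {"seurat": "harmony", "scanpy": "harmony"},
    "large_batches": {"seurat": "rpca", "scanpy": "scanorama"},
    "complex_variation": {"seurat": "cca", "scanpy": "scvi"},
}


def recommend_method(tool, n_batches, complex_variation=False):
    """Pick a batch correction method for 'seurat' or 'scanpy'.

    Assumptions: fewer than 5 batches counts as small; complex variation
    overrides the batch count.
    """
    if tool not in ("seurat", "scanpy"):
        raise ValueError(f"unknown tool: {tool!r}")
    if complex_variation:
        scenario = "complex_variation"
    elif n_batches < 5:
        scenario = "small_batches"
    else:
        scenario = "large_batches"
    return RECOMMENDATIONS[scenario][tool]


print(recommend_method("seurat", n_batches=3))
print(recommend_method("scanpy", n_batches=12))
print(recommend_method("scanpy", n_batches=3, complex_variation=True))
```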

Code Examples

Seurat Quick Start

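library(Seurat)

# Load data (placeholder path to a 10x Genomics output directory)
raw_data <- Read10X(data.dir = "filtered_gene_bc_matrices/")
seurat_obj <- CreateSeuratObject(counts = raw_data, project = "Sample")

# QC (human mitochondrial genes start with "MT-"; use "^mt-" for mouse)
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 200 & percent.mt < 20)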

# Normalize
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# Scale and PCA
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

# Cluster
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:30)
seurat_obj <- FindClusters(seurat_obj, resolution = 1.0)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

# Visualize
DimPlot(seurat_obj, reduction = "umap", label = TRUE)
FeaturePlot(seurat_obj, features = c("CD3E", "CD14", "CD79A"))

Scanpy Quick Start

import scanpy as sc

# Load data
adata = sc.read_10x_mtx("filtered_gene_bc_matrices/")

# QC
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
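adata.var['mt'] = adata.var_names.str.startswith('MT-')  # use 'mt-' for mouse
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, inplace=True)
# Copy the filtered subset so later steps modify a real object, not a view
adata = adata[adata.obs.pct_counts_mt < 20, :].copy()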

# Normalize
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# PCA and UMAP
sc.pp.scale(adata)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=30)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=1.0)

# Visualize
sc.pl.umap(adata, color=['leiden', 'total_counts'])
sc.pl.dotplot(adata, var_names=['CD3E', 'CD14', 'CD79A'], groupby='leiden')

References

  • references/seurat_template.R - Complete Seurat analysis template
  • references/scanpy_template.py - Complete Scanpy analysis template
  • references/batch_correction_guide.md - Batch correction comparison
  • requirements.txt - Python dependencies

Dependencies

Seurat (R)

install.packages(c("Seurat", "SeuratObject", "tidyverse", "patchwork"))
# Optional
remotes::install_github("satijalab/seurat-wrappers")
remotes::install_github("immunogenomics/harmony")
BiocManager::install("SingleR")

Scanpy (Python)

pip install scanpy leidenalg scvi-tools cellchatpy

Testing

Run basic validation:

cd scripts
python test_main.py

Error Handling

Errors are returned as structured JSON with a type, a message, and a suggested fix:

{
  "status": "error",
  "error": {
    "type": "invalid_parameter",
    "message": "Unsupported batch correction method: 'xyz'",
    "suggestion": "Use one of: harmony, rpca, cca, scanorama, scvi"
  }
}
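
A payload of this shape could be produced by a small validation helper. This is a sketch under assumptions: `batch_method_error` is a hypothetical function, and returning `None` for a valid method is a choice made here because the success format is not documented.

```python
import json

# Methods accepted by --batch-correction according to the parameter table.
SUPPORTED_METHODS = ("harmony", "rpca", "cca", "scanorama", "scvi")


def batch_method_error(method):
    """Return an error payload for an unsupported method, or None if it is valid."""
    if method in SUPPORTED_METHODS:
        return None
    return {
        "status": "error",
        "error": {
            "type": "invalid_parameter",
            "message": f"Unsupported batch correction method: '{method}'",
            "suggestion": "Use one of: " + ", ".join(SUPPORTED_METHODS),
        },
    }


payload = batch_method_error("xyz")
print(json.dumps(payload, indent=2))
```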

Safety & Compliance

  • No external API calls
  • All code templates are self-contained
  • No hardcoded credentials or paths
  • Templates use relative paths for data
  • Default parameters are conservative for safety

Citation

If using generated templates in publications:

  • Seurat: Satija et al., Nature Biotechnology 2015
  • Scanpy: Wolf et al., Genome Biology 2018
  • scVI: Lopez et al., Nature Methods 2018
  • Harmony: Korsunsky et al., Nature Methods 2019

Risk Assessment

| Risk Indicator | Assessment | Level |
| --- | --- | --- |
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • No hardcoded credentials or API keys
  • No unauthorized file system access (../)
  • Output does not expose sensitive information
  • Prompt injection protections in place
  • Input file paths validated (no ../ traversal)
  • Output directory restricted to workspace
  • Script execution in sandboxed environment
  • Error messages sanitized (no stack traces exposed)
  • Dependencies audited
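
The path-related checks (no `../` traversal, output restricted to the workspace) could be enforced along these lines. This is a minimal sketch, not the skill's actual validation code; `resolve_inside` is a hypothetical helper and the workspace location is whatever the caller supplies.

```python
from pathlib import Path


def resolve_inside(workspace, user_path):
    """Resolve user_path relative to workspace and reject anything outside it.

    Resolving before comparing handles '..' segments, absolute paths, and
    symlinks that would otherwise escape the workspace.
    """
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path!r}")
    return candidate


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workspace:
        print(resolve_inside(workspace, "data/counts.h5ad"))
        try:
            resolve_inside(workspace, "../outside.txt")
        except ValueError as exc:
            print("rejected:", exc)
```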

Prerequisites

# Python dependencies
pip install -r requirements.txt

Evaluation Criteria

Success Metrics

  • Successfully executes main functionality
  • Output meets quality standards
  • Handles edge cases gracefully
  • Performance is acceptable

Test Cases

  1. Basic Functionality: Standard input → Expected output
  2. Edge Case: Invalid input → Graceful error handling
  3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Performance optimization
    • Additional feature support
Repository
aipoch/medical-research-skills