# Spatial Analysis

Scanpy provides specialized functions for analyzing spatial transcriptomics data, including spatial statistics, visualization, and neighborhood analysis for spatially resolved single-cell data. These tools are designed to work with data from platforms like 10x Visium, Slide-seq, and other spatial transcriptomics technologies.

## Capabilities

### Spatial Data Loading

Load spatial transcriptomics data with coordinate information.

```python { .api }
def read_visium(path, genome=None, count_file='filtered_feature_bc_matrix.h5', library_id=None, load_images=True, source_image_path=None):
    """
    Read 10x Visium spatial transcriptomics data.

    Parameters:
    - path (str): Path to Visium output directory
    - genome (str, optional): Genome to read from h5 file
    - count_file (str): Count matrix filename
    - library_id (str, optional): Library identifier for multiple samples
    - load_images (bool): Load tissue images
    - source_image_path (str, optional): Custom path to images

    Returns:
    AnnData: Spatial transcriptomics data with coordinates and images
    """
```
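
Before calling `read_visium`, it can help to check that the output directory has the layout the reader expects. The sketch below uses only the standard library. The helper name `missing_visium_files` is hypothetical, and the file list is an assumption based on typical Space Ranger output (the count file plus a `spatial/` folder), so adjust it to your own data.

```python
from pathlib import Path

# Files usually found under spatial/ in a Space Ranger output directory (assumed layout).
# The image files only matter when load_images=True.
EXPECTED_SPATIAL = [
    "spatial/scalefactors_json.json",
    "spatial/tissue_hires_image.png",
    "spatial/tissue_lowres_image.png",
]
# The spot-position table has had two names across Space Ranger releases; either is accepted.
POSITION_FILES = ["spatial/tissue_positions_list.csv", "spatial/tissue_positions.csv"]


def missing_visium_files(path, count_file="filtered_feature_bc_matrix.h5"):
    """Return the expected files that are absent from `path` (hypothetical helper)."""
    root = Path(path)
    missing = [f for f in [count_file] + EXPECTED_SPATIAL if not (root / f).is_file()]
    if not any((root / f).is_file() for f in POSITION_FILES):
        missing.append(" or ".join(POSITION_FILES))
    return missing


print(missing_visium_files("path/to/visium/output/"))
```

An empty list suggests the directory is complete; any entries name the files to look for before loading.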

### Spatial Statistics

Calculate spatial autocorrelation and neighborhood statistics.

```python
import scanpy as sc
```

```python { .api }
def morans_i(adata_or_graph, vals=None, *, use_graph=None, layer=None, obsm=None, obsp=None, use_raw=False):
    """
    Calculate Moran's I Global Autocorrelation Statistic.

    Moran's I measures spatial autocorrelation on a graph. Commonly used in
    spatial data analysis to assess autocorrelation on a 2D grid.

    Parameters:
    - adata_or_graph (AnnData or sparse matrix): AnnData object or graph matrix
    - vals (array, optional): Values to calculate Moran's I for
    - use_graph (str, optional): Key for graph in adata (default: neighbors connectivities)
    - layer (str, optional): Key for adata.layers to choose vals
    - obsm (str, optional): Key for adata.obsm to choose vals
    - obsp (str, optional): Key for adata.obsp to choose vals
    - use_raw (bool): Whether to use adata.raw.X for vals

    Returns:
    array or float: Moran's I statistic(s)
    """

def gearys_c(adata_or_graph, vals=None, *, use_graph=None, layer=None, obsm=None, obsp=None, use_raw=False):
    """
    Calculate Geary's C Global Autocorrelation Statistic.

    Geary's C measures spatial autocorrelation: values close to 0 indicate
    positive spatial autocorrelation, values close to 1 indicate no
    autocorrelation, and values close to 2 indicate negative spatial
    autocorrelation.

    Parameters:
    - adata_or_graph (AnnData or sparse matrix): AnnData object or graph matrix
    - vals (array, optional): Values to calculate Geary's C for
    - use_graph (str, optional): Key for graph in adata (default: neighbors connectivities)
    - layer (str, optional): Key for adata.layers to choose vals
    - obsm (str, optional): Key for adata.obsm to choose vals
    - obsp (str, optional): Key for adata.obsp to choose vals
    - use_raw (bool): Whether to use adata.raw.X for vals

    Returns:
    array or float: Geary's C statistic(s)
    """

def confusion_matrix(orig, new, data=None, *, normalize=True):
    """
    Create a labeled confusion matrix from original and new labels.

    Parameters:
    - orig (array or str): Original labels or column name in data
    - new (array or str): New labels or column name in data
    - data (DataFrame, optional): DataFrame containing label columns
    - normalize (bool): Whether to normalize the confusion matrix

    Returns:
    DataFrame: Labeled confusion matrix
    """
```
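
To make the two autocorrelation statistics concrete, here is a minimal pure-Python sketch that applies the standard textbook formulas to a dense weight matrix. The helper names (`morans_i_dense`, `gearys_c_dense`, `chain_graph`) are hypothetical, and this is not how Scanpy implements `morans_i` or `gearys_c`; it only illustrates how the values behave on a tiny graph.

```python
def morans_i_dense(graph, vals):
    """Moran's I for a dense weight matrix `graph` (list of lists) and a list of values."""
    n = len(vals)
    mean = sum(vals) / n
    dev = [v - mean for v in vals]
    total_w = sum(sum(row) for row in graph)
    # Cross-products of deviations between connected observations
    num = sum(graph[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_w) * num / den


def gearys_c_dense(graph, vals):
    """Geary's C for the same inputs, based on squared differences between neighbors."""
    n = len(vals)
    mean = sum(vals) / n
    total_w = sum(sum(row) for row in graph)
    num = sum(graph[i][j] * (vals[i] - vals[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in vals)
    return ((n - 1) / (2 * total_w)) * num / den


def chain_graph(n):
    """Unit-weight graph linking each observation to its left and right neighbor."""
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        graph[i][i + 1] = graph[i + 1][i] = 1.0
    return graph


graph = chain_graph(8)
smooth = [0, 0, 0, 0, 1, 1, 1, 1]        # neighbors mostly share a value
alternating = [0, 1, 0, 1, 0, 1, 0, 1]   # neighbors always differ

smooth_stats = (morans_i_dense(graph, smooth), gearys_c_dense(graph, smooth))
alternating_stats = (morans_i_dense(graph, alternating), gearys_c_dense(graph, alternating))
print(smooth_stats, alternating_stats)
```

For the smooth pattern, Moran's I is positive (5/7) and Geary's C is well below 1 (0.25). For the alternating pattern, Moran's I is -1 and Geary's C is 1.75.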

### Spatial Visualization

Visualize spatial transcriptomics data with coordinate information.

```python { .api }
def spatial(adata, basis='spatial', color=None, use_raw=None, sort_order=True, alpha=None, groups=None, components=None, dimensions=None, layer=None, bw=None, contour=False, title=None, save=None, ax=None, return_fig=None, img_key=None, crop_coord=None, alpha_img=1.0, bw_method='scott', bw_adjust=1, **kwargs):
    """
    Plot spatial transcriptomics data.

    Parameters:
    - adata (AnnData): Annotated data object with spatial coordinates
    - basis (str): Key in obsm for spatial coordinates
    - color (str or list, optional): Keys for coloring spots
    - use_raw (bool, optional): Use raw attribute for gene expression
    - sort_order (bool): Sort points by color values
    - alpha (float, optional): Transparency of spots
    - groups (str or list, optional): Restrict to specific groups
    - components (str or list, optional): Spatial components to plot
    - dimensions (tuple, optional): Dimensions to plot
    - layer (str, optional): Data layer to use
    - bw (str or float, optional): Bandwidth for density estimation
    - contour (bool): Add contour lines for continuous values
    - title (str, optional): Plot title
    - save (str, optional): Save figure to file
    - ax (Axes, optional): Matplotlib axes object
    - return_fig (bool, optional): Return figure object
    - img_key (str, optional): Key for tissue image in uns
    - crop_coord (tuple, optional): Coordinates for cropping image
    - alpha_img (float): Transparency of background image
    - bw_method (str): Method for bandwidth estimation
    - bw_adjust (float): Bandwidth adjustment factor
    - **kwargs: Additional plotting parameters

    Returns:
    None or Figure: Plot or figure object (if return_fig=True)
    """
```

### Spatial Neighborhood Analysis

Analyze spatial neighborhoods and local patterns.

```python { .api }
def spatial_neighbors(adata, coord_type='generic', n_rings=1, n_neighs=6, radius=None, set_diag=False, key_added='spatial', copy=False):
    """
    Compute spatial neighborhood graph.

    Parameters:
    - adata (AnnData): Annotated data object with spatial coordinates
    - coord_type (str): Type of coordinates ('visium', 'generic')
    - n_rings (int): Number of rings for Visium hexagonal grid
    - n_neighs (int): Number of neighbors for generic coordinates
    - radius (float, optional): Radius for neighborhood definition
    - set_diag (bool): Set diagonal of adjacency matrix
    - key_added (str): Key for storing spatial graph
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with spatial graph (if copy=True)
    """

def spatial_autocorr(adata, mode='moran', genes=None, n_perms=None, n_jobs=1, copy=False, **kwargs):
    """
    Calculate spatial autocorrelation for multiple genes.

    Parameters:
    - adata (AnnData): Annotated data object
    - mode (str): Autocorrelation method ('moran', 'geary')
    - genes (list, optional): Genes to analyze
    - n_perms (int, optional): Permutations for significance
    - n_jobs (int): Number of parallel jobs
    - copy (bool): Return copy
    - **kwargs: Additional parameters

    Returns:
    AnnData or None: Object with autocorrelation results (if copy=True)
    """
```
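
As an illustration of what a spatial neighborhood graph contains, the sketch below builds a symmetric adjacency matrix directly from 2D coordinates, either by radius or by nearest neighbors. The function `spatial_graph` is hypothetical and written in plain Python for clarity; its `n_neighs` and `radius` arguments only mirror the parameter names listed above and make no claim about the library's implementation.

```python
import math


def spatial_graph(coords, n_neighs=6, radius=None):
    """Symmetric 0/1 adjacency (list of lists) built from 2D coordinates.

    With `radius` set, link all pairs within that distance; otherwise link each
    observation to its `n_neighs` nearest neighbors.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i, p in enumerate(coords):
        # Distances from observation i to every other observation, nearest first
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        if radius is not None:
            chosen = [j for d, j in dists if d <= radius]
        else:
            chosen = [j for _, j in dists[:n_neighs]]
        for j in chosen:
            adj[i][j] = adj[j][i] = 1  # symmetrize: i and j are linked if either picks the other
    return adj


# 3 x 3 square grid with unit spacing; radius=1 links horizontal and vertical neighbors
grid = [(x, y) for y in range(3) for x in range(3)]
adj = spatial_graph(grid, radius=1.0)
degrees = [sum(row) for row in adj]
print(degrees)
```

On this grid the corner spots end up with 2 neighbors, edge spots with 3, and the center spot with 4. A dense list-of-lists is fine for a sketch, but real datasets need a sparse matrix.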

### Spatial Gene Expression Patterns

Identify spatially variable genes and expression patterns.

```python { .api }
def spatial_variable_genes(adata, layer=None, n_top_genes=None, min_counts=3, alpha=0.05, copy=False):
    """
    Identify spatially variable genes.

    Parameters:
    - adata (AnnData): Annotated data object
    - layer (str, optional): Data layer to use
    - n_top_genes (int, optional): Number of top genes to return
    - min_counts (int): Minimum counts per gene
    - alpha (float): Significance threshold
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with spatial variability results (if copy=True)
    """

def spatial_domains(adata, resolution=1.0, key_added='spatial_domains', copy=False, **kwargs):
    """
    Identify spatial domains using clustering on spatial neighbors.

    Parameters:
    - adata (AnnData): Annotated data object with spatial graph
    - resolution (float): Clustering resolution
    - key_added (str): Key for storing domain labels
    - copy (bool): Return copy
    - **kwargs: Additional clustering parameters

    Returns:
    AnnData or None: Object with spatial domains (if copy=True)
    """
```

### Spatial Trajectory Analysis

Analyze trajectories and gradients in spatial context.

```python { .api }
def spatial_gradient(adata, genes, coord_type='generic', n_neighbors=10, copy=False):
    """
    Calculate spatial gradients for gene expression.

    Parameters:
    - adata (AnnData): Annotated data object
    - genes (list): Genes to calculate gradients for
    - coord_type (str): Type of spatial coordinates
    - n_neighbors (int): Number of neighbors for gradient calculation
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with gradient information (if copy=True)
    """

def spatial_velocity(adata, velocity_graph='velocity_graph', spatial_graph='spatial_connectivities', copy=False):
    """
    Project RNA velocity onto spatial coordinates.

    Parameters:
    - adata (AnnData): Annotated data object with velocity and spatial data
    - velocity_graph (str): Key for velocity graph
    - spatial_graph (str): Key for spatial connectivity
    - copy (bool): Return copy

    Returns:
    AnnData or None: Object with spatial velocity (if copy=True)
    """
```

## Usage Examples

### Loading and Visualizing Visium Data

```python
import scanpy as sc
import matplotlib.pyplot as plt

# Load Visium data
adata = sc.read_visium('path/to/visium/output/')
adata.var_names_make_unique()

# Compute per-spot QC metrics on the raw counts
# (adds 'total_counts' and 'n_genes_by_counts' to adata.obs)
sc.pp.calculate_qc_metrics(adata, inplace=True)

# Basic preprocessing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Visualize total counts and gene counts
sc.pl.spatial(adata, color=['total_counts', 'n_genes_by_counts'],
              ncols=2, size=1.5)

# Visualize specific genes
genes_of_interest = ['GAPDH', 'ACTB', 'MYC']
sc.pl.spatial(adata, color=genes_of_interest, ncols=3, size=1.5)
```

### Spatial Autocorrelation Analysis

```python
# Build a neighbor graph on the spot coordinates; the statistics below need a graph
sc.pp.neighbors(adata, use_rep='spatial', key_added='spatial')

# Calculate Moran's I for every gene (the function returns one value per gene)
adata.var['morans_i'] = sc.metrics.morans_i(adata, use_graph='spatial_connectivities')

# Get top spatially variable genes
spatial_genes = adata.var.sort_values('morans_i', ascending=False).index[:20]

# Visualize spatially variable genes
sc.pl.spatial(adata, color=list(spatial_genes[:6]), ncols=3)

# Calculate Geary's C as alternative measure
adata.var['gearys_c'] = sc.metrics.gearys_c(adata, use_graph='spatial_connectivities')

# Compare spatial statistics
spatial_stats = adata.var[['morans_i', 'gearys_c']].dropna()
plt.scatter(spatial_stats['morans_i'], spatial_stats['gearys_c'])
plt.xlabel("Moran's I")
plt.ylabel("Geary's C")
plt.show()
```

### Spatial Domain Identification

```python
# Compute spatial neighborhood graph
sc.pp.spatial_neighbors(adata, coord_type='visium', n_rings=1)

# Identify spatial domains using Leiden clustering
sc.tl.leiden(adata, adjacency=adata.obsp['spatial_connectivities'],
             key_added='spatial_domains', resolution=0.5)

# Visualize spatial domains
sc.pl.spatial(adata, color='spatial_domains', legend_loc='right margin')

# Find marker genes for spatial domains
sc.tl.rank_genes_groups(adata, 'spatial_domains', method='wilcoxon')
sc.pl.rank_genes_groups(adata, n_genes=5, sharey=False)
```

### Advanced Spatial Analysis

```python
import numpy as np

# Calculate spatial gradients
gradient_genes = ['VEGFA', 'HIF1A', 'EGFR']
sc.tl.spatial_gradient(adata, genes=gradient_genes)

# Visualize gradients
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for i, gene in enumerate(gradient_genes):
    # Original expression
    sc.pl.spatial(adata, color=gene, ax=axes[0, i], show=False,
                  title=f'{gene} expression')

    # Spatial gradient magnitude
    gradient_key = f'{gene}_gradient_magnitude'
    if gradient_key in adata.obs.columns:
        sc.pl.spatial(adata, color=gradient_key, ax=axes[1, i], show=False,
                      title=f'{gene} gradient')

plt.tight_layout()
plt.show()
```

### Integration with Standard Analysis

```python
# Standard single-cell analysis on spatial data
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata.raw = adata
# Copy the subset so later in-place steps do not operate on a view
adata = adata[:, adata.var.highly_variable].copy()

# PCA and neighbors
sc.pp.scale(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)

# Standard clustering
sc.tl.leiden(adata, resolution=0.5, key_added='expression_clusters')

# Compare expression-based and spatial clustering
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# UMAP with expression clusters
sc.pl.umap(adata, color='expression_clusters', ax=axes[0], show=False,
           title='Expression-based clusters')

# Spatial plot with expression clusters
sc.pl.spatial(adata, color='expression_clusters', ax=axes[1], show=False,
              title='Expression clusters (spatial)')

# Spatial plot with spatial domains
sc.pl.spatial(adata, color='spatial_domains', ax=axes[2], show=False,
              title='Spatial domains')

plt.show()
```
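
Beyond side-by-side plots, the two labelings can be compared numerically with a row-normalized cross-tabulation, which is the idea behind `confusion_matrix` described earlier. The sketch below is a pure-Python illustration with made-up labels; `normalized_confusion` is a hypothetical helper, and row normalization is an assumption about what `normalize=True` means.

```python
from collections import Counter


def normalized_confusion(orig, new):
    """Cross-tabulate two label lists; each row (original label) sums to 1."""
    counts = Counter(zip(orig, new))
    rows = sorted(set(orig))
    cols = sorted(set(new))
    table = {}
    for r in rows:
        row_total = sum(counts[(r, c)] for c in cols)
        # Fraction of observations with original label r that fall in each new label
        table[r] = {c: counts[(r, c)] / row_total for c in cols}
    return table


# Toy labels standing in for adata.obs['expression_clusters'] and adata.obs['spatial_domains']
expression_clusters = ['0', '0', '0', '1', '1', '1', '2', '2']
spatial_domains = ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C']

table = normalized_confusion(expression_clusters, spatial_domains)
for label, row in table.items():
    print(label, row)
```

Rows dominated by a single column indicate that an expression cluster maps cleanly onto one spatial domain; here cluster `0` is split between domains `A` and `B`.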

### Working with Multiple Samples

```python
# Load multiple Visium samples
samples = ['sample1', 'sample2', 'sample3']
adatas = {}

for sample in samples:
    # Use the sample name as library_id so each sample's image can be found later
    adata_sample = sc.read_visium(f'path/to/{sample}/', library_id=sample)
    adata_sample.var_names_make_unique()
    sc.pp.calculate_qc_metrics(adata_sample, inplace=True)  # adds 'total_counts'
    adatas[sample] = adata_sample

# Concatenate samples: the dict keys fill the 'sample' column, and
# uns_merge='unique' keeps every sample's images in uns['spatial']
adata_combined = sc.concat(adatas, label='sample', uns_merge='unique', index_unique='-')

# Autocorrelation across samples: morans_i needs a neighbor graph
# (here the default expression-based one) and returns one value per gene
sc.pp.neighbors(adata_combined)
adata_combined.var['morans_i'] = sc.metrics.morans_i(adata_combined)

# Plot per sample
fig, axes = plt.subplots(1, len(samples), figsize=(5*len(samples), 5))
for i, sample in enumerate(samples):
    adata_sample = adata_combined[adata_combined.obs['sample'] == sample]
    sc.pl.spatial(adata_sample, color='total_counts', library_id=sample,
                  ax=axes[i], show=False, title=sample)
plt.show()
```

### Quality Control for Spatial Data

```python
# Spatial-specific QC metrics (run on raw counts)
# .A1 flattens the matrix returned for a sparse adata.X
adata.var['n_spots'] = np.sum(adata.X > 0, axis=0).A1
adata.obs['n_genes'] = np.sum(adata.X > 0, axis=1).A1

# Flag mitochondrial genes so that 'pct_counts_mt' exists (assumes human 'MT-' gene names)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)

# Visualize QC metrics spatially
sc.pl.spatial(adata, color=['total_counts', 'n_genes', 'pct_counts_mt'],
              ncols=3)

# Filter spots based on spatial context
# Remove spots with very low counts that might be outside tissue
min_counts = np.percentile(adata.obs['total_counts'], 5)
adata = adata[adata.obs['total_counts'] > min_counts, :].copy()

# Remove genes not expressed in enough spots
min_spots = 10
adata = adata[:, adata.var['n_spots'] >= min_spots].copy()
```

## Spatial Data Formats

### Visium Data Structure

```python
# Visium data structure
library_id = list(adata.uns['spatial'])[0]        # First (often only) library in the object
adata.obsm['spatial']                             # Spot coordinates in full-resolution image pixels
adata.obs[['array_row', 'array_col']]             # Spot positions on the capture-area grid
adata.uns['spatial']                              # Spatial metadata and images, keyed by library_id
adata.uns['spatial'][library_id]['images']        # Tissue images
adata.uns['spatial'][library_id]['scalefactors']  # Scaling factors
```

### Generic Spatial Coordinates

```python
# For non-Visium data, store coordinates in obsm
# x_coords and y_coords are placeholders: 1D arrays with one entry per observation
import pandas as pd
coordinates = pd.DataFrame({
    'x': x_coords,
    'y': y_coords
})
adata.obsm['spatial'] = coordinates.values
```

## Best Practices

### Spatial Analysis Workflow

1. **Quality Control**: Remove low-quality spots and genes
2. **Normalization**: Account for spot-to-spot variation
3. **Spatial Statistics**: Identify spatially variable genes
4. **Domain Identification**: Find spatial domains/regions
5. **Integration**: Combine with standard scRNA-seq analysis
6. **Validation**: Confirm patterns with known biology

### Performance Considerations

- Spatial autocorrelation tests can be computationally intensive
- Use permutation tests judiciously (start with small n_perms)
- Consider subsetting genes for initial exploration
- Spatial neighbor graphs can be memory intensive for large datasets
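
The cost of permutation testing is easy to see in a small sketch: every permutation recomputes the statistic once, so runtime grows linearly with `n_perms` and with the number of genes tested. The code below is a pure-Python illustration on a toy chain graph. The helpers `morans_i_edges` and `permutation_pvalue` are hypothetical and only demonstrate the general technique, not any library's implementation.

```python
import random


def morans_i_edges(edges, vals):
    """Moran's I for an undirected graph given as (i, j) edges with unit weights."""
    n = len(vals)
    mean = sum(vals) / n
    dev = [v - mean for v in vals]
    den = sum(d * d for d in dev)
    num = 2 * sum(dev[i] * dev[j] for i, j in edges)  # each edge counts in both directions
    total_w = 2 * len(edges)
    return (n / total_w) * num / den


def permutation_pvalue(edges, vals, n_perms=100, seed=0):
    """One-sided p-value: how often shuffled values reach the observed Moran's I."""
    rng = random.Random(seed)
    observed = morans_i_edges(edges, vals)
    shuffled = list(vals)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # break any spatial structure
        if morans_i_edges(edges, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0
    return observed, (hits + 1) / (n_perms + 1)


# 20 observations on a chain; the first half is low, the second half high
edges = [(i, i + 1) for i in range(19)]
vals = [0.0] * 10 + [1.0] * 10
observed, p_value = permutation_pvalue(edges, vals, n_perms=100)
print(observed, p_value)
```

Because the smallest attainable p-value is `1 / (n_perms + 1)`, a small `n_perms` is enough for a first screen, and larger values are only worth the cost for genes that need finer resolution.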