# Spatial Data Support

Experimental spatial data structures for storing and analyzing spatial single-cell data. These include geometry dataframes for complex shapes, point clouds for coordinate data, multiscale images for microscopy data, and spatial scenes for organizing spatial assets with shared coordinate systems.

**Note**: All spatial data types are marked as "Lifecycle: experimental" and may undergo significant changes.

## Capabilities

### GeometryDataFrame

A specialized DataFrame for storing complex geometries such as polygons, lines, and multipoints with spatial indexing capabilities. Designed for representing cell boundaries, tissue regions, and other complex spatial features.

```python { .api }
class GeometryDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new GeometryDataFrame.

        Parameters:
        - uri: str, URI for the geometry dataframe
        - schema: pyarrow.Schema, column schema including soma_joinid and geometry columns
        - coordinate_space: tuple of str, names of coordinate dimensions (default: ("x", "y"))
        - domain: list of tuples, domain bounds for each dimension
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        GeometryDataFrame instance
        """
```

The schema must include a geometry column (named `soma_geometry` in the example below) that holds each shape in a binary encoding suitable for spatial operations, such as WKB.

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa

# Define schema for cell boundaries
geometry_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("cell_id", pa.string()),
    ("soma_geometry", pa.binary()),  # Geometry data (e.g., WKB format)
    ("area", pa.float64()),
    ("perimeter", pa.float64()),
    ("tissue_region", pa.string())
])

# Create geometry dataframe for cell boundaries
with tiledbsoma.GeometryDataFrame.create(
    "cell_boundaries.soma",
    schema=geometry_schema,
    coordinate_space=("x", "y")
) as geom_df:

    # Example polygon data (simplified: the byte strings are placeholders, not valid WKB)
    geometry_data = pa.table({
        "soma_joinid": [0, 1, 2],
        "cell_id": ["cell_001", "cell_002", "cell_003"],
        "soma_geometry": [b"polygon_wkb_data_1", b"polygon_wkb_data_2", b"polygon_wkb_data_3"],
        "area": [25.5, 32.1, 28.7],
        "perimeter": [18.2, 20.8, 19.5],
        "tissue_region": ["cortex", "cortex", "hippocampus"]
    }, schema=geometry_schema)  # Build the table with the declared column types
    geom_df.write(geometry_data)

# Query geometries by region
with tiledbsoma.open("cell_boundaries.soma") as geom_df:
    cortex_cells = geom_df.read(
        value_filter="tissue_region == 'cortex'",
        column_names=["soma_joinid", "cell_id", "area"]
    ).concat()
    print(cortex_cells.to_pandas())
```
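The example above stores placeholder byte strings in `soma_geometry`. As a minimal sketch of what real values could look like, assuming the column is meant to hold WKB as the schema comment suggests, the following encodes a single-ring 2D polygon with only the standard library. `polygon_to_wkb` is a hypothetical helper for illustration and is not part of tiledbsoma.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag used in the WKB header
WKB_POLYGON = 3        # WKB geometry type code for a 2D polygon


def polygon_to_wkb(ring):
    """Encode one exterior ring [(x, y), ...] as a little-endian WKB polygon.

    Hypothetical helper for illustration; the ring is closed if needed.
    """
    points = [(float(x), float(y)) for x, y in ring]
    if points[0] != points[-1]:
        points.append(points[0])  # WKB rings repeat the first point at the end
    # Header: byte order, geometry type, number of rings (always 1 here)
    out = struct.pack("<BII", WKB_LITTLE_ENDIAN, WKB_POLYGON, 1)
    # Ring: point count, then each point as two float64 values
    out += struct.pack("<I", len(points))
    for x, y in points:
        out += struct.pack("<dd", x, y)
    return out


square = polygon_to_wkb([(0, 0), (5, 0), (5, 5), (0, 5)])
print(len(square), square[:9].hex())
```

In practice a geometry library would usually generate these bytes; the sketch only shows the layout that a `pa.binary()` geometry column would carry under that assumption.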

### PointCloudDataFrame

A specialized DataFrame for storing point collections in multi-dimensional space with spatial indexing. Ideal for storing subcellular locations, molecular coordinates, and other point-based spatial data.

```python { .api }
class PointCloudDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new PointCloudDataFrame.

        Parameters:
        - uri: str, URI for the point cloud dataframe
        - schema: pyarrow.Schema, column schema including soma_joinid and coordinate columns
        - coordinate_space: tuple of str, names of coordinate dimensions (default: ("x", "y"))
        - domain: list of tuples, domain bounds for each dimension
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        PointCloudDataFrame instance
        """
```

The schema should include one coordinate column for each axis named in `coordinate_space`.

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa
import numpy as np

# Define schema for molecule coordinates
point_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("x", pa.float64()),  # X coordinate
    ("y", pa.float64()),  # Y coordinate
    ("z", pa.float64()),  # Z coordinate (optional)
    ("gene", pa.string()),
    ("cell_id", pa.string()),
    ("intensity", pa.float32())
])

# Create point cloud for single-molecule FISH data
with tiledbsoma.PointCloudDataFrame.create(
    "molecule_locations.soma",
    schema=point_schema,
    coordinate_space=("x", "y", "z")
) as point_df:

    # Generate synthetic molecule locations
    n_molecules = 10000
    np.random.seed(42)

    molecule_data = pa.table({
        "soma_joinid": np.arange(n_molecules, dtype=np.int64),
        "x": np.random.uniform(0, 1000, n_molecules),
        "y": np.random.uniform(0, 1000, n_molecules),
        "z": np.random.uniform(0, 10, n_molecules),
        "gene": np.random.choice(["GAPDH", "ACTB", "CD3D", "CD79A"], n_molecules),
        "cell_id": [f"cell_{i//50}" for i in range(n_molecules)],
        # Cast to float32 so the values match the declared "intensity" type
        "intensity": np.random.exponential(100, n_molecules).astype(np.float32)
    }, schema=point_schema)
    point_df.write(molecule_data)

# Query molecules by gene and spatial region
with tiledbsoma.open("molecule_locations.soma") as point_df:
    # Find GAPDH molecules in specific region
    gapdh_molecules = point_df.read(
        value_filter="gene == 'GAPDH' and x >= 100 and x <= 200 and y >= 100 and y <= 200",
        column_names=["x", "y", "z", "intensity"]
    ).concat()
    print(f"GAPDH molecules in region: {len(gapdh_molecules)}")
```
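The `domain` parameter is described above as one bounds tuple per dimension, but the example leaves it at its default. A minimal sketch of deriving such bounds from coordinate columns follows. `axis_bounds` is a hypothetical helper, and whether `create` expects exactly this layout (for instance, whether `soma_joinid` also needs an entry) is an assumption to check against the installed release.

```python
import math


def axis_bounds(columns, axis_names, padding=0.0):
    """Return one (lower, upper) tuple per axis, in axis_names order.

    columns maps an axis name to a sequence of coordinates. Bounds are padded
    and rounded outward to whole units. Hypothetical helper for illustration.
    """
    bounds = []
    for name in axis_names:
        values = columns[name]
        if len(values) == 0:
            raise ValueError(f"axis {name!r} has no coordinates")
        lower = math.floor(min(values) - padding)
        upper = math.ceil(max(values) + padding)
        bounds.append((float(lower), float(upper)))
    return bounds


points = {
    "x": [12.5, 980.2, 431.0],
    "y": [3.1, 55.0, 999.9],
    "z": [0.4, 9.6, 5.0],
}
domain = axis_bounds(points, ("x", "y", "z"), padding=1.0)
print(domain)
```

Rounding outward with a little padding keeps later writes near the edge of the tissue inside the declared bounds.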

### MultiscaleImage

A Collection of images at multiple resolution levels with consistent channels and axis order. Designed for storing and accessing microscopy data at different scales, enabling efficient visualization and analysis of large images.

```python { .api }
class MultiscaleImage(Collection):
    @classmethod
    def create(cls, uri, *, type, reference_level_shape, axis_names=("c", "y", "x"), coordinate_space=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new MultiscaleImage.

        Parameters:
        - uri: str, URI for the multiscale image
        - type: pyarrow data type for image pixels
        - reference_level_shape: tuple of int, shape of the highest resolution level
        - axis_names: tuple of str, names for image axes (default: ("c", "y", "x"))
        - coordinate_space: coordinate space specification (optional)
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        MultiscaleImage instance
        """

    def levels(self):
        """
        Get available resolution levels.

        Returns:
        list of str: Level names (e.g., ["0", "1", "2"])
        """

    def level_shape(self, level):
        """
        Get shape of specific resolution level.

        Parameters:
        - level: str, level name

        Returns:
        tuple of int: Shape of the specified level
        """
```

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa

# Create multiscale image for microscopy data
with tiledbsoma.MultiscaleImage.create(
    "tissue_image.soma",
    type=pa.uint16(),
    reference_level_shape=(3, 2048, 2048),  # 3 channels, 2048x2048 pixels
    axis_names=("c", "y", "x")
) as ms_image:

    # Add multiple resolution levels with add_new_level (assumed signature:
    # key plus a keyword-only shape). The pixel type was fixed at creation,
    # so it is not repeated per level.
    # Level 0: Full resolution
    level_0 = ms_image.add_new_level("0", shape=(3, 2048, 2048))

    # Level 1: Half resolution
    level_1 = ms_image.add_new_level("1", shape=(3, 1024, 1024))

    # Level 2: Quarter resolution
    level_2 = ms_image.add_new_level("2", shape=(3, 512, 512))

# Access different resolution levels
with tiledbsoma.open("tissue_image.soma") as ms_image:
    print(f"Available levels: {list(ms_image.keys())}")

    # Read low-resolution version for overview
    low_res = ms_image["2"].read().to_numpy()
    print(f"Low resolution shape: {low_res.shape}")

    # Read high-resolution region of interest
    roi = ms_image["0"].read(coords=(slice(None), slice(500, 600), slice(500, 600)))
    print(f"High-res ROI shape: {roi.to_numpy().shape}")
```
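The level shapes in the example (full, half, quarter resolution) follow a simple halving rule. As a sketch of that arithmetic, assuming each level halves every spatial axis and leaves the channel axis untouched, the hypothetical helper `pyramid_shapes` below derives the shapes from a reference shape and axis names. It is plain Python and does not call tiledbsoma.

```python
def pyramid_shapes(reference_shape, axis_names=("c", "y", "x"), n_levels=3,
                   spatial_axes=("x", "y", "z")):
    """Return one shape per level, halving each spatial axis at every level.

    Non-spatial axes (such as the channel axis "c") keep their size.
    Hypothetical helper for illustration.
    """
    if len(reference_shape) != len(axis_names):
        raise ValueError("reference_shape and axis_names must have equal length")
    shapes = []
    for level in range(n_levels):
        factor = 2 ** level
        shape = tuple(
            # Ceiling division so odd sizes never collapse to zero pixels
            max(1, -(-size // factor)) if name in spatial_axes else size
            for name, size in zip(axis_names, reference_shape)
        )
        shapes.append(shape)
    return shapes


for key, shape in enumerate(pyramid_shapes((3, 2048, 2048))):
    print(str(key), shape)
```

Each resulting tuple could serve as the `shape` for the level whose key is the loop index, mirroring the three levels created by hand above.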

### Scene

A Collection that organizes spatial assets sharing a coordinate space. Scenes group related spatial data including images, observation locations, and variable locations, providing a unified coordinate system for spatial analysis.

```python { .api }
class Scene(Collection):
    img: Collection   # Image collection (MultiscaleImage objects)
    obsl: Collection  # Observation location collection (PointCloudDataFrame, GeometryDataFrame)
    varl: Collection  # Variable location collection (spatial features)

    @classmethod
    def create(cls, uri, *, coordinate_space=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new Scene.

        Parameters:
        - uri: str, URI for the scene
        - coordinate_space: coordinate space specification defining spatial reference
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        Scene instance
        """
```

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa

# Create a spatial scene for tissue analysis
with tiledbsoma.Scene.create("tissue_scene.soma") as scene:
    # Add image collection
    scene.add_new_collection("img")

    # Add observation locations (cell centers)
    scene.add_new_collection("obsl")

    # Add variable locations (gene expression locations)
    scene.add_new_collection("varl")

    # Add H&E staining image to the "img" subcollection.
    # Plain Collection objects have no add_new_multiscale_image method, so the
    # Scene-level helper is used (assumed signature: key, subcollection,
    # keyword-only transform); transform=None registers no transform.
    he_image = scene.add_new_multiscale_image(
        "HE_stain",
        "img",
        transform=None,
        type=pa.uint8(),
        reference_level_shape=(3, 4096, 4096),
        axis_names=("c", "y", "x")
    )
    # Create the full-resolution level that is read back below
    he_image.add_new_level("0", shape=(3, 4096, 4096))

    # Add cell center locations
    cell_schema = pa.schema([
        ("soma_joinid", pa.int64()),
        ("x", pa.float64()),
        ("y", pa.float64()),
        ("cell_type", pa.string())
    ])

    cell_locations = scene.add_new_point_cloud_dataframe(
        "cell_centers",
        "obsl",
        transform=None,
        schema=cell_schema,
        coordinate_space=("x", "y")
    )

# Access scene components
with tiledbsoma.open("tissue_scene.soma") as scene:
    # Access H&E image
    he_stain = scene.img["HE_stain"]
    image_data = he_stain["0"].read(coords=(slice(None), slice(0, 500), slice(0, 500)))

    # Access cell locations overlapping with image region
    cell_centers = scene.obsl["cell_centers"]
    cells_in_region = cell_centers.read(
        value_filter="x >= 0 and x <= 500 and y >= 0 and y <= 500"
    ).concat()

    print(f"Cells in image region: {len(cells_in_region)}")
```
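The example above reads a pixel window and filters cell locations with the same numbers, which implicitly assumes that one pixel equals one coordinate unit. As a sketch that makes this assumption explicit, the hypothetical helper `roi_to_value_filter` converts a pixel window into a `value_filter` string using an assumed `pixel_size` (coordinate units per pixel). It treats both slice ends as inclusive, matching the `>=` and `<=` comparisons in the filter.

```python
def roi_to_value_filter(y_slice, x_slice, pixel_size=1.0):
    """Build a value_filter string covering a pixel window.

    pixel_size is the assumed number of coordinate units per pixel.
    Both ends of each slice are treated as inclusive.
    Hypothetical helper for illustration.
    """
    for axis_slice in (y_slice, x_slice):
        if axis_slice.start is None or axis_slice.stop is None:
            raise ValueError("ROI slices need an explicit start and stop")
    x_min, x_max = x_slice.start * pixel_size, x_slice.stop * pixel_size
    y_min, y_max = y_slice.start * pixel_size, y_slice.stop * pixel_size
    return f"x >= {x_min} and x <= {x_max} and y >= {y_min} and y <= {y_max}"


# Same (c, y, x) window as the image read in the example above
roi = (slice(None), slice(0, 500), slice(0, 500))
cell_filter = roi_to_value_filter(roi[1], roi[2])
print(cell_filter)
```

When the image and the cell locations are related by more than a scale factor, a coordinate transform would be needed instead of a single `pixel_size`.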

## Coordinate Systems and Transformations

Spatial data types support coordinate system definitions and transformations for aligning data from different sources.

```python { .api }
# Coordinate system types (imported from somacore)
class CoordinateSpace:
    """Defines coordinate space for spatial data"""

class AffineTransform:
    """Affine coordinate transformation matrix"""

class IdentityTransform:
    """Identity transformation (no change)"""

class ScaleTransform:
    """Scale transformation with per-axis scaling factors"""

class UniformScaleTransform:
    """Uniform scaling transformation"""
```

### Usage Example

```python
import pyarrow as pa
import tiledbsoma
from tiledbsoma import Axis, CoordinateSpace

# Define a coordinate space from named axes measured in micrometers
# (assumes Axis(name=..., unit=...) as defined in somacore)
coord_space = CoordinateSpace([
    Axis(name="x", unit="micrometer"),  # X axis
    Axis(name="y", unit="micrometer")   # Y axis
])

# Schema for the cell geometries stored in this coordinate space
cell_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("soma_geometry", pa.binary()),
    ("cell_type", pa.string())
])

# Create geometry dataframe with coordinate space
with tiledbsoma.GeometryDataFrame.create(
    "cells_with_coords.soma",
    schema=cell_schema,
    coordinate_space=coord_space
) as geom_df:
    # Data is stored in the defined coordinate space
    pass
```
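The transform classes listed above differ mainly in how general a mapping they describe. As a plain-Python illustration of the underlying math (not somacore's implementation), identity, uniform scale, and per-axis scale can all be written as special cases of an augmented affine matrix. The helpers `scale_matrix` and `apply_affine` are hypothetical names introduced only for this sketch.

```python
def scale_matrix(factors):
    """Augmented matrix for per-axis scaling.

    A uniform scale repeats one factor; the identity uses a factor of 1.
    """
    n = len(factors)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, factor in enumerate(factors):
        matrix[i][i] = float(factor)
    matrix[n][n] = 1.0
    return matrix


def apply_affine(matrix, point):
    """Apply an (n+1)x(n+1) augmented affine matrix to an n-dimensional point."""
    n = len(point)
    homogeneous = list(point) + [1.0]
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(n + 1))
        for row in range(n)
    )


identity = scale_matrix([1.0, 1.0])            # no change
pixels_to_microns = scale_matrix([0.5, 0.5])   # uniform scale
stretch = scale_matrix([2.0, 0.25])            # per-axis scale
shift_and_scale = [                            # general affine: scale, then translate
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
]

point = (100.0, 200.0)
for name, matrix in [("identity", identity), ("uniform", pixels_to_microns),
                     ("per-axis", stretch), ("affine", shift_and_scale)]:
    print(name, apply_affine(matrix, point))
```

Writing every case as one matrix form is what lets transforms from different sources be composed by matrix multiplication when aligning datasets.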

## Integration with Spatial Analysis

The spatial data types are designed to integrate with spatial analysis workflows:

```python
import tiledbsoma

# Load spatial experiment
with tiledbsoma.open("spatial_experiment.soma") as exp:
    # Access spatial scene
    scene = exp.spatial["tissue_section_1"]

    # Get cell locations and expression data
    cell_locations = scene.obsl["cell_centers"]
    rna_data = exp.ms["RNA"]

    # Spatial analysis workflow:
    # 1. Load cell coordinates
    coords = cell_locations.read().concat().to_pandas()

    # 2. Load expression data for same cells
    with exp.axis_query("RNA") as query:
        # to_anndata needs the name of an X layer; "data" is an assumed
        # layer name, so substitute one that exists in rna_data.X
        expression = query.to_anndata("data")

    # 3. Combine for spatial analysis
    # (e.g., spatial statistics, neighborhood analysis)
```
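Step 3 of the workflow above is left open. As one possible sketch of a neighborhood analysis, the following counts, for every cell, how many other cells lie within a given radius. It is a standard-library illustration using a simple grid of bins rather than a method provided by TileDB-SOMA, and `neighbor_counts` is a hypothetical helper.

```python
import math
from collections import defaultdict


def neighbor_counts(coords, radius):
    """Count, for each point, the other points within `radius` of it.

    coords is a list of (x, y) tuples. Points are binned into a grid of
    radius-sized cells so that only adjacent bins need to be compared.
    """
    grid = defaultdict(list)
    for index, (x, y) in enumerate(coords):
        grid[(math.floor(x / radius), math.floor(y / radius))].append(index)

    counts = [0] * len(coords)
    for index, (x, y) in enumerate(coords):
        gx, gy = math.floor(x / radius), math.floor(y / radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((gx + dx, gy + dy), ()):
                    if other == index:
                        continue  # a cell is not its own neighbor
                    ox, oy = coords[other]
                    if math.hypot(x - ox, y - oy) <= radius:
                        counts[index] += 1
    return counts


cells = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (50.0, 50.0)]
print(neighbor_counts(cells, radius=5.0))
```

In the workflow above, the input list could be built from the `x` and `y` columns of `coords`, assuming the point cloud uses those axis names, and the counts could then be attached to the expression data as a per-cell covariate.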

This spatial data support enables TileDB-SOMA to handle complex spatial single-cell datasets, including spatial transcriptomics, spatial proteomics, and multiplexed imaging data.