# Spatial Data Support

TileDB-SOMA provides experimental data structures for storing and analyzing spatial single-cell data: geometry dataframes for complex shapes, point clouds for coordinate data, multiscale images for microscopy data, and spatial scenes that organize spatial assets under a shared coordinate system.

**Note**: All spatial data types are marked as "Lifecycle: experimental" and may undergo significant changes.

## Capabilities

### GeometryDataFrame

A specialized DataFrame for storing complex geometries such as polygons, lines, and multipoints with spatial indexing capabilities. Designed for representing cell boundaries, tissue regions, and other complex spatial features.

```python { .api }
class GeometryDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new GeometryDataFrame.

        Parameters:
        - uri: str, URI for the geometry dataframe
        - schema: pyarrow.Schema, column schema including soma_joinid and geometry columns
        - coordinate_space: tuple of str, names of coordinate dimensions (default: ("x", "y"))
        - domain: list of tuples, domain bounds for each dimension
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        GeometryDataFrame instance
        """
```

The schema must include a geometry column that holds each shape in an encoding spatial operations can parse. The usage example below stores it as a binary `soma_geometry` column (for example, WKB-encoded).

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa
import numpy as np

# Define schema for cell boundaries
geometry_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("cell_id", pa.string()),
    ("soma_geometry", pa.binary()),  # Geometry data (e.g., WKB format)
    ("area", pa.float64()),
    ("perimeter", pa.float64()),
    ("tissue_region", pa.string())
])

# Create geometry dataframe for cell boundaries
with tiledbsoma.GeometryDataFrame.create(
    "cell_boundaries.soma",
    schema=geometry_schema,
    coordinate_space=("x", "y")
) as geom_df:

    # Example polygon data (simplified)
    geometry_data = pa.table({
        "soma_joinid": [0, 1, 2],
        "cell_id": ["cell_001", "cell_002", "cell_003"],
        "soma_geometry": [b"polygon_wkb_data_1", b"polygon_wkb_data_2", b"polygon_wkb_data_3"],
        "area": [25.5, 32.1, 28.7],
        "perimeter": [18.2, 20.8, 19.5],
        "tissue_region": ["cortex", "cortex", "hippocampus"]
    })
    geom_df.write(geometry_data)

# Query geometries by region
with tiledbsoma.open("cell_boundaries.soma") as geom_df:
    cortex_cells = geom_df.read(
        value_filter="tissue_region == 'cortex'",
        column_names=["soma_joinid", "cell_id", "area"]
    ).concat()
    print(cortex_cells.to_pandas())
```
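
The example above uses placeholder byte strings for `soma_geometry`. As a rough illustration of what real WKB bytes look like, the sketch below encodes a single closed ring as a little-endian WKB polygon using only the standard library. The helper name `polygon_to_wkb` and the square boundary are hypothetical, and whether a given tiledbsoma version expects exactly this encoding is an assumption taken from the "e.g., WKB format" comment, not something this page confirms.

```python
import struct


def polygon_to_wkb(ring):
    """Encode one closed exterior ring as a little-endian WKB polygon.

    `ring` is a list of (x, y) tuples whose first and last points match.
    (Hypothetical helper for illustration; not part of tiledbsoma.)
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and contain at least 4 points")
    # Header: byte order (1 = little-endian), geometry type (3 = Polygon),
    # number of rings (1: exterior only, no holes).
    wkb = struct.pack("<BII", 1, 3, 1)
    # Ring: point count followed by x/y pairs as 64-bit floats.
    wkb += struct.pack("<I", len(ring))
    for x, y in ring:
        wkb += struct.pack("<dd", x, y)
    return wkb


# Hypothetical square cell boundary, 5 x 5 units.
square = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0), (0.0, 0.0)]
cell_wkb = polygon_to_wkb(square)
print(len(cell_wkb))  # 9-byte header + 4-byte count + 5 points * 16 bytes = 93
```

In practice a geometry library would usually produce these bytes; the manual encoding is shown only to make the column contents concrete.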

### PointCloudDataFrame

A specialized DataFrame for storing point collections in multi-dimensional space with spatial indexing. Ideal for storing subcellular locations, molecular coordinates, and other point-based spatial data.

```python { .api }
class PointCloudDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new PointCloudDataFrame.

        Parameters:
        - uri: str, URI for the point cloud dataframe
        - schema: pyarrow.Schema, column schema including soma_joinid and coordinate columns
        - coordinate_space: tuple of str, names of coordinate dimensions (default: ("x", "y"))
        - domain: list of tuples, domain bounds for each dimension
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        PointCloudDataFrame instance
        """
```

The schema should include coordinate columns matching the coordinate_space specification.

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa
import numpy as np

# Define schema for molecule coordinates
point_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("x", pa.float64()),  # X coordinate
    ("y", pa.float64()),  # Y coordinate
    ("z", pa.float64()),  # Z coordinate (optional)
    ("gene", pa.string()),
    ("cell_id", pa.string()),
    ("intensity", pa.float32())
])

# Create point cloud for single-molecule FISH data
with tiledbsoma.PointCloudDataFrame.create(
    "molecule_locations.soma",
    schema=point_schema,
    coordinate_space=("x", "y", "z")
) as point_df:

    # Generate synthetic molecule locations
    n_molecules = 10000
    np.random.seed(42)

    molecule_data = pa.table({
        "soma_joinid": range(n_molecules),
        "x": np.random.uniform(0, 1000, n_molecules),
        "y": np.random.uniform(0, 1000, n_molecules),
        "z": np.random.uniform(0, 10, n_molecules),
        "gene": np.random.choice(["GAPDH", "ACTB", "CD3D", "CD79A"], n_molecules),
        "cell_id": [f"cell_{i//50}" for i in range(n_molecules)],
        # Cast to float32 so the column matches the schema's intensity type
        "intensity": np.random.exponential(100, n_molecules).astype(np.float32)
    })
    point_df.write(molecule_data)

# Query molecules by gene and spatial region
with tiledbsoma.open("molecule_locations.soma") as point_df:
    # Find GAPDH molecules in specific region
    gapdh_molecules = point_df.read(
        value_filter="gene == 'GAPDH' and x >= 100 and x <= 200 and y >= 100 and y <= 200",
        column_names=["x", "y", "z", "intensity"]
    ).concat()
    print(f"GAPDH molecules in region: {len(gapdh_molecules)}")
```
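
Hand-writing the bounding-box filter string gets error-prone once regions are computed rather than hard-coded. One possible approach, sketched below, is a small helper that assembles the same `value_filter` expression from numeric ranges. The function name `bbox_value_filter` is hypothetical, it assumes columns named `x`, `y`, and `gene` as in the example above, and it does no escaping of quote characters in the gene name.

```python
def bbox_value_filter(x_range, y_range, gene=None):
    """Build a value_filter string for an axis-aligned bounding box.

    x_range and y_range are (min, max) tuples, inclusive on both ends.
    (Hypothetical helper for illustration; not part of tiledbsoma.)
    """
    clauses = []
    if gene is not None:
        clauses.append(f"gene == '{gene}'")
    for axis, (lo, hi) in (("x", x_range), ("y", y_range)):
        if lo > hi:
            raise ValueError(f"{axis} range is empty: {lo} > {hi}")
        clauses.append(f"{axis} >= {lo} and {axis} <= {hi}")
    return " and ".join(clauses)


expr = bbox_value_filter((100, 200), (100, 200), gene="GAPDH")
print(expr)
# gene == 'GAPDH' and x >= 100 and x <= 200 and y >= 100 and y <= 200
```

The resulting string can be passed as `value_filter=expr` to the `read` call shown in the usage example.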

### MultiscaleImage

A Collection of images at multiple resolution levels with consistent channels and axis order. Designed for storing and accessing microscopy data at different scales, enabling efficient visualization and analysis of large images.

```python { .api }
class MultiscaleImage(Collection):
    @classmethod
    def create(cls, uri, *, type, reference_level_shape, axis_names=("c", "y", "x"), coordinate_space=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new MultiscaleImage.

        Parameters:
        - uri: str, URI for the multiscale image
        - type: pyarrow data type for image pixels
        - reference_level_shape: tuple of int, shape of the highest resolution level
        - axis_names: tuple of str, names for image axes (default: ("c", "y", "x"))
        - coordinate_space: coordinate space specification (optional)
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        MultiscaleImage instance
        """

    def levels(self):
        """
        Get available resolution levels.

        Returns:
        list of str: Level names (e.g., ["0", "1", "2"])
        """

    def level_shape(self, level):
        """
        Get shape of specific resolution level.

        Parameters:
        - level: str, level name

        Returns:
        tuple of int: Shape of the specified level
        """
```

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa
import numpy as np

# Create multiscale image for microscopy data
with tiledbsoma.MultiscaleImage.create(
    "tissue_image.soma",
    type=pa.uint16(),
    reference_level_shape=(3, 2048, 2048),  # 3 channels, 2048x2048 pixels
    axis_names=("c", "y", "x")
) as ms_image:

    # Add multiple resolution levels
    # Level 0: Full resolution
    level_0 = ms_image.add_new_dense_ndarray(
        "0",
        type=pa.uint16(),
        shape=(3, 2048, 2048)
    )

    # Level 1: Half resolution
    level_1 = ms_image.add_new_dense_ndarray(
        "1",
        type=pa.uint16(),
        shape=(3, 1024, 1024)
    )

    # Level 2: Quarter resolution
    level_2 = ms_image.add_new_dense_ndarray(
        "2",
        type=pa.uint16(),
        shape=(3, 512, 512)
    )

# Access different resolution levels
with tiledbsoma.open("tissue_image.soma") as ms_image:
    print(f"Available levels: {list(ms_image.keys())}")

    # Read low-resolution version for overview
    low_res = ms_image["2"].read().to_numpy()
    print(f"Low resolution shape: {low_res.shape}")

    # Read high-resolution region of interest
    roi = ms_image["0"].read(coords=(slice(None), slice(500, 600), slice(500, 600)))
    print(f"High-res ROI shape: {roi.to_numpy().shape}")
```
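
The level shapes in the example above follow a halving pattern: each level divides the spatial axes by two while keeping the channel axis fixed. The sketch below shows one way to compute those shapes, and to map a full-resolution pixel range onto a coarser level, in plain Python. Both helper names are hypothetical, and the factor-of-two pyramid is an assumption drawn from the example rather than a requirement of `MultiscaleImage`.

```python
def pyramid_shapes(reference_shape, n_levels, channel_axis=0):
    """Return the shape of each level, halving the spatial axes per level.

    (Hypothetical helper for illustration; not part of tiledbsoma.)
    """
    shapes = []
    for level in range(n_levels):
        factor = 2 ** level
        shape = tuple(
            dim if axis == channel_axis else max(1, dim // factor)
            for axis, dim in enumerate(reference_shape)
        )
        shapes.append(shape)
    return shapes


def roi_at_level(start, stop, level):
    """Map a half-open pixel range at level 0 to the covering range at `level`."""
    factor = 2 ** level
    # Floor the start and ceil the stop so the coarse range covers the ROI.
    return start // factor, -(-stop // factor)


print(pyramid_shapes((3, 2048, 2048), 3))
# [(3, 2048, 2048), (3, 1024, 1024), (3, 512, 512)]
print(roi_at_level(500, 600, 2))
# (125, 150)
```

This makes it straightforward to read the same region of interest at a lower resolution: the 500 to 600 pixel range at level "0" corresponds to pixels 125 to 150 at level "2".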

### Scene

A Collection that organizes spatial assets sharing a coordinate space. Scenes group related spatial data including images, observation locations, and variable locations, providing a unified coordinate system for spatial analysis.

```python { .api }
class Scene(Collection):
    img: Collection  # Image collection (MultiscaleImage objects)
    obsl: Collection  # Observation location collection (PointCloudDataFrame, GeometryDataFrame)
    varl: Collection  # Variable location collection (spatial features)

    @classmethod
    def create(cls, uri, *, coordinate_space=None, platform_config=None, context=None, tiledb_timestamp=None):
        """
        Create a new Scene.

        Parameters:
        - uri: str, URI for the scene
        - coordinate_space: coordinate space specification defining spatial reference
        - platform_config: TileDB-specific configuration options
        - context: TileDB context for the operation
        - tiledb_timestamp: Timestamp for temporal queries

        Returns:
        Scene instance
        """
```

#### Usage Example

```python
import tiledbsoma
import pyarrow as pa

# Create a spatial scene for tissue analysis
with tiledbsoma.Scene.create("tissue_scene.soma") as scene:
    # Add image collection
    scene.add_new_collection("img")

    # Add observation locations (cell centers)
    scene.add_new_collection("obsl")

    # Add variable locations (gene expression locations)
    scene.add_new_collection("varl")

    # Add H&E staining image
    he_image = scene.img.add_new_multiscale_image(
        "HE_stain",
        type=pa.uint8(),
        reference_level_shape=(3, 4096, 4096),
        axis_names=("c", "y", "x")
    )

    # Add cell center locations
    cell_schema = pa.schema([
        ("soma_joinid", pa.int64()),
        ("x", pa.float64()),
        ("y", pa.float64()),
        ("cell_type", pa.string())
    ])

    cell_locations = scene.obsl.add_new_point_cloud_dataframe(
        "cell_centers",
        schema=cell_schema,
        coordinate_space=("x", "y")
    )

# Access scene components
with tiledbsoma.open("tissue_scene.soma") as scene:
    # Access H&E image
    he_stain = scene.img["HE_stain"]
    image_data = he_stain["0"].read(coords=(slice(None), slice(0, 500), slice(0, 500)))

    # Access cell locations overlapping with image region
    cell_centers = scene.obsl["cell_centers"]
    cells_in_region = cell_centers.read(
        value_filter="x >= 0 and x <= 500 and y >= 0 and y <= 500"
    ).concat()

    print(f"Cells in image region: {len(cells_in_region)}")
```

## Coordinate Systems and Transformations

Spatial data types support coordinate system definitions and transformations for aligning data from different sources.

```python { .api }
# Coordinate system types (imported from somacore)
class CoordinateSpace:
    """Defines coordinate space for spatial data"""

class AffineTransform:
    """Affine coordinate transformation matrix"""

class IdentityTransform:
    """Identity transformation (no change)"""

class ScaleTransform:
    """Scale transformation with per-axis scaling factors"""

class UniformScaleTransform:
    """Uniform scaling transformation"""
```

### Usage Example

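```python
import pyarrow as pa
import tiledbsoma
from tiledbsoma import Axis, CoordinateSpace

# Define a named coordinate space; a coordinate space is a sequence of Axis
# objects, each with a name and an optional unit. The data in this example
# is expected to span 0-1000 microns on both axes.
coord_space = CoordinateSpace([
    Axis(name="x", unit="micrometer"),  # X axis, in microns
    Axis(name="y", unit="micrometer")   # Y axis, in microns
])

# Schema for the cell geometries (same layout as the GeometryDataFrame example)
cell_schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("cell_id", pa.string()),
    ("soma_geometry", pa.binary())
])

# Create geometry dataframe with coordinate space
with tiledbsoma.GeometryDataFrame.create(
    "cells_with_coords.soma",
    schema=cell_schema,
    coordinate_space=coord_space
) as geom_df:
    # Data is stored in the defined coordinate space
    pass
```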
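
To make the transform classes listed above more concrete, the following sketch shows the arithmetic an affine transform performs on 2D points, using a 3x3 matrix in homogeneous coordinates. It is plain Python written for illustration and is not the somacore implementation; the function names, the 0.5 micron pixel size, and the (100, 200) micron offset are all hypothetical.

```python
def apply_affine(matrix, points):
    """Apply a 2D affine transform to a list of (x, y) points.

    `matrix` is a 3x3 nested list in homogeneous coordinates:
    [[a, b, tx], [c, d, ty], [0, 0, 1]].
    (Hypothetical helper for illustration; not part of somacore.)
    """
    (a, b, tx), (c, d, ty), _ = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def scale_matrix(sx, sy):
    """Per-axis scaling; sx == sy gives a uniform scale, 1.0 the identity."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical mapping from image pixels to microns: 0.5 micron pixels,
# with the image origin offset by (100, 200) microns in the scene.
pixel_to_micron = [[0.5, 0.0, 100.0], [0.0, 0.5, 200.0], [0.0, 0.0, 1.0]]
print(apply_affine(pixel_to_micron, [(0, 0), (2048, 2048)]))
# [(100.0, 200.0), (1124.0, 1224.0)]
print(apply_affine(scale_matrix(2.0, 2.0), [(1.5, 3.0)]))
# [(3.0, 6.0)]
```

Seen this way, the identity, scale, and uniform-scale transforms are special cases of the affine matrix with no translation or off-diagonal terms.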

## Integration with Spatial Analysis

The spatial data types are designed to integrate with spatial analysis workflows:

```python
import tiledbsoma

# Load spatial experiment
with tiledbsoma.open("spatial_experiment.soma") as exp:
    # Access spatial scene
    scene = exp.spatial["tissue_section_1"]

    # Get cell locations and expression data
    cell_locations = scene.obsl["cell_centers"]
    rna_data = exp.ms["RNA"]

    # Spatial analysis workflow:
    # 1. Load cell coordinates
    coords = cell_locations.read().concat().to_pandas()

    # 2. Load expression data for same cells
    # (to_anndata needs the name of an X layer; "data" is assumed here)
    query = exp.axis_query("RNA")
    expression = query.to_anndata(X_name="data")

    # 3. Combine for spatial analysis
    # (e.g., spatial statistics, neighborhood analysis)
```
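
Step 3 is left open in the workflow above. As one minimal example of a neighborhood analysis, the sketch below counts how many other cells fall within a fixed radius of each cell, keyed by `soma_joinid` so the result can be joined back to expression data. The function name and the three sample cells are hypothetical, and the brute-force pairwise loop is only suitable for small inputs; a spatial index would be the usual choice at scale.

```python
import math


def count_neighbors(coords, radius):
    """Count, for each cell, how many other cells lie within `radius`.

    `coords` maps a cell's soma_joinid to its (x, y) position.
    Brute force, O(n^2) in the number of cells.
    (Hypothetical helper for illustration; not part of tiledbsoma.)
    """
    counts = {joinid: 0 for joinid in coords}
    ids = list(coords)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(coords[a], coords[b]) <= radius:
                counts[a] += 1
                counts[b] += 1
    return counts


# Hypothetical cell centers keyed by soma_joinid.
cells = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (50.0, 50.0)}
print(count_neighbors(cells, radius=5.0))
# {0: 1, 1: 1, 2: 0}
```

With the `coords` dataframe from step 1, a dictionary of this form could be built from its `soma_joinid`, `x`, and `y` columns.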

This spatial data support enables TileDB-SOMA to handle complex spatial single-cell datasets including spatial transcriptomics, spatial proteomics, and multiplexed imaging data.