# TileDB-SOMA

A Python implementation of the SOMA (Stack of Matrices, Annotated) API using TileDB Embedded for efficient storage and retrieval of single-cell data. TileDB-SOMA provides scalable data structures for storing and querying larger-than-memory datasets in both cloud and local systems, with specialized support for single-cell biology workflows.

## Package Information

- **Package Name**: tiledbsoma
- **Language**: Python
- **Installation**: `pip install tiledbsoma`
- **Version**: 1.17.1
## Core Imports

```python
import tiledbsoma
```

Common patterns for data structures:

```python
from tiledbsoma import (
    Collection, DataFrame, SparseNDArray, DenseNDArray,
    Experiment, Measurement, open
)
```

For I/O operations:

```python
import tiledbsoma.io as soma_io
```
## Basic Usage

```python
import tiledbsoma
import pyarrow as pa

# Create a DataFrame with single-cell observations
schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("cell_type", pa.string()),
    ("tissue", pa.string()),
    ("donor_id", pa.string()),
])

# Create and write data
with tiledbsoma.DataFrame.create("obs.soma", schema=schema) as obs_df:
    data = pa.table({
        "soma_joinid": [0, 1, 2, 3],
        "cell_type": ["T-cell", "B-cell", "Neuron", "Astrocyte"],
        "tissue": ["blood", "blood", "brain", "brain"],
        "donor_id": ["D1", "D1", "D2", "D2"],
    })
    obs_df.write(data)

# Read data back
with tiledbsoma.open("obs.soma") as obs_df:
    data = obs_df.read().concat()
    print(data.to_pandas())

# Create a sparse matrix for gene expression data
with tiledbsoma.SparseNDArray.create(
    "X.soma",
    type=pa.float32(),
    shape=(1000, 2000),  # 1000 cells, 2000 genes
) as X_array:
    # Write sparse data in COO form: (cell index, gene index, expression value)
    data = pa.table({
        "soma_dim_0": pa.array([0, 0, 1, 1, 2], type=pa.int64()),           # cell indices
        "soma_dim_1": pa.array([5, 100, 5, 200, 300], type=pa.int64()),     # gene indices
        "soma_data": pa.array([1.5, 2.3, 0.8, 3.1, 1.2], type=pa.float32()),  # expression values
    })
    X_array.write(data)
```

## Architecture

TileDB-SOMA follows a hierarchical object model designed for single-cell data analysis:

- **Collections**: String-keyed containers that can hold any SOMA object type
- **Arrays**: Multi-dimensional arrays (sparse/dense) for numerical data with TileDB storage
- **DataFrames**: Tabular data with Arrow schemas, requiring a `soma_joinid` column
- **Experiments**: Specialized collections representing annotated measurement matrices
- **Measurements**: Collections grouping observations with measurements on annotated variables

The library uses Apache Arrow for in-memory data representation and TileDB for persistent storage, enabling efficient operations on larger-than-memory datasets with support for cloud storage backends.

## Capabilities

### Core Data Structures

Fundamental SOMA data types including Collections for hierarchical organization, DataFrames for tabular data, and sparse/dense N-dimensional arrays for numerical data storage.

```python { .api }
class Collection:
    @classmethod
    def create(cls, uri, *, platform_config=None, context=None, tiledb_timestamp=None): ...
    def add_new_collection(self, key, **kwargs): ...
    def add_new_dataframe(self, key, **kwargs): ...

class DataFrame:
    @classmethod
    def create(cls, uri, *, schema, domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), value_filter=None, column_names=None, result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, values, platform_config=None): ...

class SparseNDArray:
    @classmethod
    def create(cls, uri, *, type, shape, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, values, platform_config=None): ...

class DenseNDArray:
    @classmethod
    def create(cls, uri, *, type, shape, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, coords, values, platform_config=None): ...
```

[Core Data Structures](./core-data-structures.md)

### Single-Cell Biology Support

Specialized data structures for single-cell analysis including Experiments for annotated measurement matrices and Measurements for grouping observations with variables.

```python { .api }
class Experiment(Collection):
    obs: DataFrame        # Primary observation annotations
    ms: Collection        # Named measurements collection
    spatial: Collection   # Spatial scenes collection
    def axis_query(self, measurement_name, *, obs_query=None, var_query=None): ...

class Measurement(Collection):
    var: DataFrame                   # Variable annotations
    X: Collection[SparseNDArray]     # Feature values matrices
    obsm: Collection[DenseNDArray]   # Dense observation annotations
    obsp: Collection[SparseNDArray]  # Sparse pairwise observation annotations
```

[Single-Cell Biology](./single-cell-biology.md)

### Spatial Data Support

Experimental spatial data structures for storing and analyzing spatial single-cell data, including geometry dataframes, point clouds, multiscale images, and spatial scenes.

```python { .api }
class GeometryDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...

class PointCloudDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...

class Scene(Collection):
    img: Collection    # Image collection
    obsl: Collection   # Observation location collection
    varl: Collection   # Variable location collection
```

[Spatial Data](./spatial-data.md)

### Data I/O Operations

Comprehensive ingestion and outgestion functions for converting between SOMA format and popular single-cell data formats like AnnData and H5AD files.

```python { .api }
def from_anndata(anndata, uri, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None): ...

def to_anndata(experiment, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None): ...

def from_h5ad(h5ad_file_path, output_path, *, measurement_name="RNA", ...): ...
```

[Data I/O](./data-io.md)

### Registration System

ID mapping utilities for multi-file append-mode ingestion, supporting soma_joinid remapping and string-to-integer label mapping across multiple input files.

```python { .api }
class AxisAmbientLabelMapping:
    def __init__(self, *, field_name: str, joinid_map: pd.DataFrame, enum_values: dict):
        """
        Tracks mapping of input data ID-column names to SOMA join IDs.

        Parameters:
        - field_name: str, name of the ID column
        - joinid_map: pd.DataFrame, mapping from ID to soma_joinid
        - enum_values: dict, categorical type mappings
        """

class ExperimentAmbientLabelMapping:
    obs: AxisAmbientLabelMapping             # Observation ID mappings
    var: dict[str, AxisAmbientLabelMapping]  # Variable ID mappings per measurement

class AxisIDMapping:
    def __init__(self, id_map: dict[int, int]):
        """
        Offset-to-joinid mappings for individual input files.

        Parameters:
        - id_map: dict, mapping from input offsets to SOMA join IDs
        """

class ExperimentIDMapping:
    obs: AxisIDMapping             # Observation ID mapping
    var: dict[str, AxisIDMapping]  # Variable ID mappings per measurement

def get_dataframe_values(df: DataFrame, *, ids: npt.NDArray[np.int64], col_name: str):
    """Get values from DataFrame for specified IDs and column"""
```

### Query and Indexing

Query builders and indexing utilities for efficient data retrieval from SOMA objects, including experiment axis queries and integer indexing.

```python { .api }
class ExperimentAxisQuery:
    def obs(self, *, column_names=None, batch_size=None, partitions=None, platform_config=None): ...
    def var(self, *, column_names=None, batch_size=None, partitions=None, platform_config=None): ...
    def X(self, layer_name, *, batch_size=None, partitions=None, platform_config=None): ...
    def to_anndata(self, *, X_layer_name=None, column_names=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None): ...

class IntIndexer:
    def __init__(self, data, *, context=None): ...
    def get_indexer(self, target): ...
```

[Query and Indexing](./query-indexing.md)

234

235

### Query Filtering

236

237

Advanced query condition system for attribute filtering with support for complex Boolean expressions and membership operations.

238

239

```python { .api }

240

class QueryCondition:

241

def __init__(self, expression: str):

242

"""

243

Create a query condition for filtering SOMA objects.

244

245

Parameters:

246

- expression: str, Boolean expression using TileDB query syntax

247

248

Supports:

249

- Comparison operators: <, >, <=, >=, ==, !=

250

- Boolean operators: and, or, &, |

251

- Membership operator: in

252

- Attribute casting: attr("column_name")

253

- Value casting: val(value)

254

"""

255

256

def init_query_condition(self, schema, query_attrs):

257

"""Initialize the query condition with schema and attributes"""

258

```

259

260

### Configuration and Options

261

262

Configuration classes for TileDB context management and platform-specific options for creating and writing SOMA objects.

263

264

```python { .api }

265

class SOMATileDBContext:

266

def __init__(self, config=None): ...

267

268

class TileDBCreateOptions:

269

def __init__(self, **kwargs): ...

270

271

class TileDBWriteOptions:

272

def __init__(self, **kwargs): ...

273

```

274

275

[Configuration](./configuration.md)

## Coordinate System Types

```python { .api }
class CoordinateSpace:
    """Defines coordinate space for spatial data"""

class AffineTransform:
    """Affine coordinate transformation"""

class IdentityTransform:
    """Identity coordinate transformation"""

class ScaleTransform:
    """Scale coordinate transformation"""

class UniformScaleTransform:
    """Uniform scale coordinate transformation"""
```

## Core Constants

```python { .api }
SOMA_JOINID: str = "soma_joinid"  # Required DataFrame column name
```

## Exception Types

```python { .api }
class SOMAError(Exception):
    """Base exception class for all SOMA-specific errors"""

class DoesNotExistError(SOMAError):
    """Raised when the requested SOMA object does not exist"""

class AlreadyExistsError(SOMAError):
    """Raised when attempting to create an object that already exists"""

class NotCreateableError(SOMAError):
    """Raised when an object cannot be created"""
```

## Utility Functions

```python { .api }
def open(uri, mode="r", *, soma_type=None, context=None, tiledb_timestamp=None):
    """Opens any SOMA object at URI"""

def get_implementation() -> str:
    """Returns implementation name ('python-tiledb')"""

def get_implementation_version() -> str:
    """Returns package version"""

def show_package_versions() -> None:
    """Prints version information for all dependencies"""
```

## Statistics and Logging

```python { .api }
def tiledbsoma_stats_json() -> str:
    """Return TileDB-SOMA statistics as JSON string"""

def tiledbsoma_stats_as_py() -> list:
    """Return TileDB-SOMA statistics as Python objects"""

def tiledbsoma_stats_enable() -> None:
    """Enable TileDB statistics collection"""

def tiledbsoma_stats_disable() -> None:
    """Disable TileDB statistics collection"""

def tiledbsoma_stats_reset() -> None:
    """Reset TileDB statistics"""

def tiledbsoma_stats_dump() -> None:
    """Dump TileDB statistics to stdout"""
```

## Logging Configuration

```python { .api }
import tiledbsoma.logging

def warning() -> None:
    """Set logging level to WARNING"""

def info() -> None:
    """Set logging level to INFO with progress indicators"""

def debug() -> None:
    """Set logging level to DEBUG with detailed progress"""

def log_io_same(message: str) -> None:
    """Log message to both INFO and DEBUG levels"""

def log_io(info_message: str | None, debug_message: str) -> None:
    """Log different messages at INFO and DEBUG levels"""
```