# TileDB-SOMA

A Python implementation of the SOMA (Stack of Matrices, Annotated) API using TileDB Embedded for efficient storage and retrieval of single-cell data. TileDB-SOMA provides scalable data structures for storing and querying larger-than-memory datasets on both local and cloud storage, with specialized support for single-cell biology workflows.

## Package Information

- **Package Name**: tiledbsoma
- **Language**: Python
- **Installation**: `pip install tiledbsoma`
- **Version**: 1.17.1

## Core Imports

```python
import tiledbsoma
```

Common patterns for data structures:

```python
from tiledbsoma import (
    Collection, DataFrame, SparseNDArray, DenseNDArray,
    Experiment, Measurement, open
)
```

For I/O operations:

```python
import tiledbsoma.io as soma_io
```

## Basic Usage

```python
import tiledbsoma
import numpy as np
import pyarrow as pa

# Create a DataFrame with single-cell observations
schema = pa.schema([
    ("soma_joinid", pa.int64()),
    ("cell_type", pa.string()),
    ("tissue", pa.string()),
    ("donor_id", pa.string())
])

# Create and write data; the domain must cover the soma_joinid values to be written
with tiledbsoma.DataFrame.create("obs.soma", schema=schema, domain=[(0, 3)]) as obs_df:
    data = pa.table({
        "soma_joinid": [0, 1, 2, 3],
        "cell_type": ["T-cell", "B-cell", "Neuron", "Astrocyte"],
        "tissue": ["blood", "blood", "brain", "brain"],
        "donor_id": ["D1", "D1", "D2", "D2"]
    })
    obs_df.write(data)

# Read data back
with tiledbsoma.open("obs.soma") as obs_df:
    data = obs_df.read().concat()
    print(data.to_pandas())

# Create a sparse matrix for gene expression data
with tiledbsoma.SparseNDArray.create(
    "X.soma",
    type=pa.float32(),
    shape=(1000, 2000)  # 1000 cells, 2000 genes
) as X_array:
    # Write COO-format data as a single Arrow table with dimension
    # columns (soma_dim_0, soma_dim_1) and a data column (soma_data)
    coo = pa.table({
        "soma_dim_0": [0, 0, 1, 1, 2],        # cell indices
        "soma_dim_1": [5, 100, 5, 200, 300],  # gene indices
        "soma_data": pa.array([1.5, 2.3, 0.8, 3.1, 1.2], type=pa.float32()),  # expression values
    })
    X_array.write(coo)
```

## Architecture

TileDB-SOMA follows a hierarchical object model designed for single-cell data analysis:

- **Collections**: String-keyed containers that can hold any SOMA object type
- **Arrays**: Multi-dimensional arrays (sparse/dense) for numerical data with TileDB storage
- **DataFrames**: Tabular data with Arrow schemas, requiring a `soma_joinid` column
- **Experiments**: Specialized collections representing annotated measurement matrices
- **Measurements**: Collections grouping observations with measurements on annotated variables

The library uses Apache Arrow for in-memory data representation and TileDB for persistent storage, enabling efficient operations on larger-than-memory datasets with support for cloud storage backends.

## Capabilities

### Core Data Structures

Fundamental SOMA data types including Collections for hierarchical organization, DataFrames for tabular data, and sparse/dense N-dimensional arrays for numerical data storage.

```python { .api }
class Collection:
    @classmethod
    def create(cls, uri, *, platform_config=None, context=None, tiledb_timestamp=None): ...
    def add_new_collection(self, key, **kwargs): ...
    def add_new_dataframe(self, key, **kwargs): ...

class DataFrame:
    @classmethod
    def create(cls, uri, *, schema, domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), value_filter=None, column_names=None, result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, values, platform_config=None): ...

class SparseNDArray:
    @classmethod
    def create(cls, uri, *, type, shape, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, values, platform_config=None): ...

class DenseNDArray:
    @classmethod
    def create(cls, uri, *, type, shape, platform_config=None, context=None, tiledb_timestamp=None): ...
    def read(self, coords=(), result_order=None, batch_size=None, partitions=None, platform_config=None): ...
    def write(self, coords, values, platform_config=None): ...
```

[Core Data Structures](./core-data-structures.md)

### Single-Cell Biology Support

Specialized data structures for single-cell analysis including Experiments for annotated measurement matrices and Measurements for grouping observations with variables.

```python { .api }
class Experiment(Collection):
    obs: DataFrame       # Primary annotations on observations
    ms: Collection       # Named measurements collection
    spatial: Collection  # Spatial scenes collection
    def axis_query(self, measurement_name, *, obs_query=None, var_query=None): ...

class Measurement(Collection):
    var: DataFrame                  # Variable annotations
    X: Collection[SparseNDArray]    # Feature values matrices
    obsm: Collection[DenseNDArray]  # Dense observation annotations
    obsp: Collection[SparseNDArray] # Sparse pairwise observation annotations
```

[Single-Cell Biology](./single-cell-biology.md)

### Spatial Data Support

Experimental spatial data structures for storing and analyzing spatial single-cell data, including geometry dataframes, point clouds, multiscale images, and spatial scenes.

```python { .api }
class GeometryDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...

class PointCloudDataFrame(DataFrame):
    @classmethod
    def create(cls, uri, *, schema, coordinate_space=("x", "y"), domain=None, platform_config=None, context=None, tiledb_timestamp=None): ...

class Scene(Collection):
    img: Collection   # Image collection
    obsl: Collection  # Observation location collection
    varl: Collection  # Variable location collection
```

[Spatial Data](./spatial-data.md)

### Data I/O Operations

Comprehensive ingestion and outgestion functions for converting between SOMA format and popular single-cell data formats such as AnnData objects and H5AD files.

```python { .api }
def from_anndata(experiment_uri, anndata, measurement_name, *, obs_id_name="obs_id", var_id_name="var_id", X_layer_name="data", raw_X_layer_name="data", ingest_mode="write", registration_mapping=None, uns_keys=None, context=None, platform_config=None, additional_metadata=None): ...

def to_anndata(experiment, measurement_name, *, X_layer_name="data", extra_X_layer_names=None, obs_id_name=None, var_id_name=None, uns_keys=None): ...

def from_h5ad(experiment_uri, input_path, measurement_name, *, ...): ...
```

[Data I/O](./data-io.md)

### Registration System

ID mapping utilities for multi-file append-mode ingestion, supporting soma_joinid remapping and string-to-integer label mapping across multiple input files.

```python { .api }
class AxisAmbientLabelMapping:
    def __init__(self, *, field_name: str, joinid_map: pd.DataFrame, enum_values: dict):
        """
        Tracks mapping of input data ID-column names to SOMA join IDs.

        Parameters:
        - field_name: str, name of the ID column
        - joinid_map: pd.DataFrame, mapping from ID to soma_joinid
        - enum_values: dict, categorical type mappings
        """

class ExperimentAmbientLabelMapping:
    obs: AxisAmbientLabelMapping            # Observation ID mappings
    var: dict[str, AxisAmbientLabelMapping] # Variable ID mappings per measurement

class AxisIDMapping:
    def __init__(self, id_map: dict[int, int]):
        """
        Offset-to-joinid mappings for individual input files.

        Parameters:
        - id_map: dict, mapping from input offsets to SOMA join IDs
        """

class ExperimentIDMapping:
    obs: AxisIDMapping            # Observation ID mapping
    var: dict[str, AxisIDMapping] # Variable ID mappings per measurement

def get_dataframe_values(df: DataFrame, *, ids: npt.NDArray[np.int64], col_name: str):
    """Get values from DataFrame for specified IDs and column"""
```

### Query and Indexing

Query builders and indexing utilities for efficient data retrieval from SOMA objects, including experiment axis queries and integer indexing.

```python { .api }
class ExperimentAxisQuery:
    def obs(self, *, column_names=None, batch_size=None, partitions=None, platform_config=None): ...
    def var(self, *, column_names=None, batch_size=None, partitions=None, platform_config=None): ...
    def X(self, layer_name, *, batch_size=None, partitions=None, platform_config=None): ...
    def to_anndata(self, *, X_layer_name=None, column_names=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None): ...

class IntIndexer:
    def __init__(self, data, *, context=None): ...
    def get_indexer(self, target): ...
```

[Query and Indexing](./query-indexing.md)

### Query Filtering

Advanced query condition system for attribute filtering with support for complex Boolean expressions and membership operations.

```python { .api }
class QueryCondition:
    def __init__(self, expression: str):
        """
        Create a query condition for filtering SOMA objects.

        Parameters:
        - expression: str, Boolean expression using TileDB query syntax

        Supports:
        - Comparison operators: <, >, <=, >=, ==, !=
        - Boolean operators: and, or, &, |
        - Membership operator: in
        - Attribute casting: attr("column_name")
        - Value casting: val(value)
        """

    def init_query_condition(self, schema, query_attrs):
        """Initialize the query condition with schema and attributes"""
```

### Configuration and Options

Configuration classes for TileDB context management and platform-specific options for creating and writing SOMA objects.

```python { .api }
class SOMATileDBContext:
    def __init__(self, tiledb_config=None, timestamp=None, threadpool=None): ...

class TileDBCreateOptions:
    def __init__(self, **kwargs): ...

class TileDBWriteOptions:
    def __init__(self, **kwargs): ...
```

[Configuration](./configuration.md)

## Coordinate System Types

```python { .api }
class CoordinateSpace:
    """Defines coordinate space for spatial data"""

class AffineTransform:
    """Affine coordinate transformation"""

class IdentityTransform:
    """Identity coordinate transformation"""

class ScaleTransform:
    """Scale coordinate transformation"""

class UniformScaleTransform:
    """Uniform scale coordinate transformation"""
```

## Core Constants

```python { .api }
SOMA_JOINID: str = "soma_joinid"  # Required DataFrame column name
```

## Exception Types

```python { .api }
class SOMAError(Exception):
    """Base exception class for all SOMA-specific errors"""

class DoesNotExistError(SOMAError):
    """Raised when the requested SOMA object does not exist"""

class AlreadyExistsError(SOMAError):
    """Raised when attempting to create an object that already exists"""

class NotCreateableError(SOMAError):
    """Raised when an object cannot be created"""
```

## Utility Functions

```python { .api }
def open(uri, mode="r", *, soma_type=None, context=None, tiledb_timestamp=None):
    """Opens any SOMA object at URI"""

def get_implementation() -> str:
    """Returns implementation name ('python-tiledb')"""

def get_implementation_version() -> str:
    """Returns package version"""

def show_package_versions() -> None:
    """Prints version information for all dependencies"""
```

## Statistics and Logging

```python { .api }
def tiledbsoma_stats_json() -> str:
    """Return TileDB-SOMA statistics as JSON string"""

def tiledbsoma_stats_as_py() -> list:
    """Return TileDB-SOMA statistics as Python objects"""

def tiledbsoma_stats_enable() -> None:
    """Enable TileDB statistics collection"""

def tiledbsoma_stats_disable() -> None:
    """Disable TileDB statistics collection"""

def tiledbsoma_stats_reset() -> None:
    """Reset TileDB statistics"""

def tiledbsoma_stats_dump() -> None:
    """Dump TileDB statistics to stdout"""
```

## Logging Configuration

```python { .api }
import tiledbsoma.logging

def warning() -> None:
    """Set logging level to WARNING"""

def info() -> None:
    """Set logging level to INFO with progress indicators"""

def debug() -> None:
    """Set logging level to DEBUG with detailed progress"""

def log_io_same(message: str) -> None:
    """Log message to both INFO and DEBUG levels"""

def log_io(info_message: str | None, debug_message: str) -> None:
    """Log different messages at INFO and DEBUG levels"""
```