Tessl Tile for pypi/tiledbsoma@1.17.0

# Data I/O Operations

Ingestion and outgestion functions convert between the SOMA format and common single-cell data formats such as AnnData objects and H5AD files, so SOMA storage fits into existing single-cell analysis workflows and tools.

## Package Import

```python
import tiledbsoma.io as soma_io
```

## Capabilities

### AnnData Integration

Functions for converting between SOMA Experiments and AnnData objects, the standard in-memory format for single-cell data in Python.

#### from_anndata

Convert an AnnData object to a SOMA Experiment, including its `X`, `obsm`, `varm`, `obsp`, `varp`, and `uns` components.

```python { .api }
def from_anndata(anndata, uri, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an AnnData object.

    Parameters:
    - anndata: AnnData object to convert
    - uri: str, URI where the SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```

#### to_anndata

Convert a SOMA Experiment back to an AnnData object with flexible layer selection.

```python { .api }
def to_anndata(experiment, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Convert a SOMA Experiment to an AnnData object.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - measurement_name: str, name of measurement to convert (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation

    Returns:
    AnnData object
    """
```
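
The `obs_value_filter` and `var_value_filter` parameters take a filter expression as a string, in the same style as the `cell_type == 'T cells'` expression used in this section's usage example. As a minimal sketch, the hypothetical helper `build_value_filter` (not part of tiledbsoma) assembles such a string from `(column, operator, value)` triples. It assumes that conditions can be joined with `and` and that string values are written in single quotes; it does not escape quotes inside values.

```python
from typing import Iterable, Tuple, Union

Value = Union[str, int, float]


def build_value_filter(conditions: Iterable[Tuple[str, str, Value]]) -> str:
    """Join (column, operator, value) triples into one filter string.

    Hypothetical helper for illustration only; not a tiledbsoma function.
    """
    allowed_ops = {"==", "!=", "<", "<=", ">", ">="}
    parts = []
    for column, op, value in conditions:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op!r}")
        # Strings are single-quoted (no escaping); numbers are left bare.
        rendered = f"'{value}'" if isinstance(value, str) else repr(value)
        parts.append(f"{column} {op} {rendered}")
    # Assumption: multiple conditions are combined with "and".
    return " and ".join(parts)


expr = build_value_filter([("cell_type", "==", "T cells"), ("n_genes", ">", 500)])
print(expr)  # cell_type == 'T cells' and n_genes > 500
```

Building the string in one place keeps quoting consistent when the same filter is passed to both a query and an export call.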

#### Usage Example

```python
import scanpy as sc
import tiledbsoma
import tiledbsoma.io as soma_io

# Load example dataset and make gene names unique
adata = sc.datasets.pbmc3k()
adata.var_names_make_unique()

# Convert to SOMA format
experiment_uri = "pbmc3k_experiment.soma"
soma_io.from_anndata(
    adata,
    experiment_uri,
    measurement_name="RNA",
    obs_id_name="obs_id",
    var_id_name="var_id",
)

# Work with SOMA format - query specific data.
# The raw pbmc3k download has no obs annotations, so this assumes that
# cell_type, n_genes, and percent_mito were added to adata.obs before ingestion.
with tiledbsoma.open(experiment_uri) as exp:
    # Query T cells only
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="cell_type == 'T cells'"),
    )

    # Convert subset back to AnnData
    t_cell_adata = soma_io.to_anndata(
        query,
        measurement_name="RNA",
        X_layer_name="X",
        obs_column_names=["cell_type", "n_genes", "percent_mito"],
    )

    print(f"T cells: {t_cell_adata.n_obs} cells, {t_cell_adata.n_vars} genes")
```

### H5AD File Operations

Functions for working directly with H5AD files, the standard on-disk file format for AnnData objects.

#### from_h5ad

Create a SOMA Experiment directly from an H5AD file without loading the file into memory.

```python { .api }
def from_h5ad(h5ad_file_path, output_path, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an H5AD file.

    Parameters:
    - h5ad_file_path: str, path to input H5AD file
    - output_path: str, URI where SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```

#### to_h5ad

Write a SOMA Experiment directly to an H5AD file.

```python { .api }
def to_h5ad(experiment, h5ad_path, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Write a SOMA Experiment to an H5AD file.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - h5ad_path: str, output H5AD file path
    - measurement_name: str, name of measurement to write (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation
    """
```
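
The `batch_size` parameter is described only as the batch size for reading data. As a rough illustration of what reading in batches means, the sketch below splits a row count into half-open `(start, stop)` ranges. `batch_ranges` is a hypothetical helper in plain Python; how tiledbsoma actually batches its reads is not specified here and may differ.

```python
from typing import Iterator, Tuple


def batch_ranges(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges that cover n_rows.

    Hypothetical helper for illustration only; not a tiledbsoma function.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        # The last batch is shorter when n_rows is not a multiple of batch_size.
        yield start, min(start + batch_size, n_rows)


print(list(batch_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

A smaller batch size lowers peak memory at the cost of more read calls, which is the usual trade-off when exporting a large experiment.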

#### Usage Example

```python
import tiledbsoma
import tiledbsoma.io as soma_io

# Convert H5AD file to SOMA format
soma_io.from_h5ad(
    "input_data.h5ad",
    "experiment.soma",
    measurement_name="RNA",
)

# Process data in SOMA format
with tiledbsoma.open("experiment.soma") as exp:
    # Perform analysis, filtering, etc.
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="n_genes > 500"),
    )

    # Export filtered results back to H5AD
    soma_io.to_h5ad(
        query,
        "filtered_output.h5ad",
        measurement_name="RNA",
    )
```

### Batch Registration

Functions for registering multiple AnnData objects or H5AD files into a single SOMA Experiment.

#### register_anndatas

Register multiple AnnData objects into a single SOMA Experiment with consistent indexing.

```python { .api }
def register_anndatas(experiment_uri, adatas, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple AnnData objects into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - adatas: list of AnnData objects to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```

#### register_h5ads

Register multiple H5AD files into a single SOMA Experiment.

```python { .api }
def register_h5ads(experiment_uri, h5ad_file_paths, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple H5AD files into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - h5ad_file_paths: list of str, paths to H5AD files to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```

#### Usage Example

```python
import scanpy as sc
import tiledbsoma.io as soma_io

# Load one dataset and split it into two non-overlapping parts.
# pbmc68k_reduced has 700 cells, so slice within that range.
pbmc = sc.datasets.pbmc68k_reduced()
pbmc_a = pbmc[:350, :].copy()
pbmc_b = pbmc[350:, :].copy()

# Register into single experiment
soma_io.register_anndatas(
    "combined_experiment.soma",
    [pbmc_a, pbmc_b],
    measurement_name="RNA",
)

# Register H5AD files
h5ad_files = ["sample1.h5ad", "sample2.h5ad", "sample3.h5ad"]
soma_io.register_h5ads(
    "multi_sample_experiment.soma",
    h5ad_files,
    measurement_name="RNA",
)
```

### Data Append and Update Operations

Functions for incrementally adding or modifying data in existing SOMA objects.

#### Append Functions

```python { .api }
def append_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Append observations to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_var(soma_df, values, *, context=None, platform_config=None):
    """
    Append variables to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_X(collection, values, *, context=None, platform_config=None):
    """
    Append expression data to an X collection.

    Parameters:
    - collection: SOMA Collection containing X matrices
    - values: expression data to append
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```

#### Update Functions

```python { .api }
def update_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Update observations in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_var(soma_df, values, *, context=None, platform_config=None):
    """
    Update variables in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_matrix(soma_coll, values, *, context=None, platform_config=None):
    """
    Update matrix data in a SOMA Collection.

    Parameters:
    - soma_coll: SOMA Collection containing matrices
    - values: matrix data to update
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```

#### Matrix Management Functions

```python { .api }
def add_matrix_to_collection(collection, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add a matrix to a SOMA Collection.

    Parameters:
    - collection: SOMA Collection to add matrix to
    - matrix: matrix data to add
    - layer_name: str, name for the new matrix layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def add_X_layer(measurement, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add an X layer to a Measurement.

    Parameters:
    - measurement: SOMA Measurement object
    - matrix: matrix data to add as X layer
    - layer_name: str, name for the new X layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def create_from_matrix(matrix, uri, *, context=None, platform_config=None):
    """
    Create a SOMA array from a matrix.

    Parameters:
    - matrix: input matrix data
    - uri: str, URI where SOMA array will be created
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA array object
    """
```

### Experiment Shaping Operations

Functions for inspecting and resizing the dimensions of a SOMA Experiment.

```python { .api }
def get_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Get current shapes of experiment components.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")

    Returns:
    dict: Shapes of experiment components
    """

def show_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Display experiment component shapes.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")
    """

def resize_experiment(experiment, shape, *, measurement_name="RNA"):
    """
    Resize experiment dimensions.

    Parameters:
    - experiment: SOMA Experiment object
    - shape: new shape specification
    - measurement_name: str, name of measurement to resize (default: "RNA")
    """

def upgrade_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Upgrade experiment shapes to accommodate new data.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to upgrade (default: "RNA")
    """
```
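
When cells or genes are added, the experiment's shape has to be large enough to hold them. The sketch below shows that bookkeeping in plain Python rather than through tiledbsoma calls. `grown_shape` is a hypothetical helper that computes the `(n_obs, n_var)` shape needed after an append; it assumes, for illustration, that a resize may only grow a dimension and never shrink it.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (n_obs, n_var)


def grown_shape(current: Shape, added_obs: int = 0, added_var: int = 0) -> Shape:
    """Return the shape needed to hold the current data plus the additions.

    Hypothetical helper for illustration only; not a tiledbsoma function.
    """
    if added_obs < 0 or added_var < 0:
        # Assumption of this sketch: dimensions never shrink.
        raise ValueError("shapes can only grow in this sketch")
    n_obs, n_var = current
    return (n_obs + added_obs, n_var + added_var)


current = (1000, 2000)
print(grown_shape(current, added_obs=100))  # (1100, 2000)
```

Computing the target shape before writing makes it easy to compare it with the shapes reported for the experiment and to resize only when needed.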

### Registration Mapping

Support for mapping ambient labels during registration of multiple datasets.

```python { .api }
class ExperimentAmbientLabelMapping:
    """
    Mapping for experiment ambient labels during registration.

    Provides functionality for consistent labeling across multiple
    datasets when registering them into a single experiment.
    """
```
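
To illustrate what consistent labeling across multiple datasets can mean, the sketch below assigns one integer ID per distinct label across several inputs, so a label that appears in two datasets maps to the same ID in both. This is a conceptual, pure-Python illustration: `build_label_mapping` is a hypothetical helper, the gene names are example data, and none of it describes how `ExperimentAmbientLabelMapping` is implemented.

```python
from typing import Dict, Iterable, List


def build_label_mapping(datasets: Iterable[List[str]], start: int = 0) -> Dict[str, int]:
    """Assign a stable integer ID to each distinct label, in first-seen order.

    Hypothetical helper for illustration only; not a tiledbsoma function.
    """
    mapping: Dict[str, int] = {}
    next_id = start
    for labels in datasets:
        for label in labels:
            # A label already seen in an earlier dataset keeps its ID.
            if label not in mapping:
                mapping[label] = next_id
                next_id += 1
    return mapping


sample1_genes = ["CD3E", "CD4", "MS4A1"]
sample2_genes = ["CD4", "CD8A", "MS4A1"]
var_ids = build_label_mapping([sample1_genes, sample2_genes])
print(var_ids)  # {'CD3E': 0, 'CD4': 1, 'MS4A1': 2, 'CD8A': 3}
```

Because shared labels resolve to one ID, rows and columns from different inputs line up in the combined experiment instead of being duplicated.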

#### Usage Example

```python
import pyarrow as pa
import tiledbsoma
import tiledbsoma.io as soma_io

# Incremental data loading workflow
with tiledbsoma.open("experiment.soma", mode="w") as exp:
    # Get current shapes
    shapes = soma_io.get_experiment_shapes(exp, measurement_name="RNA")
    print(f"Current shapes: {shapes}")

    # Add new observations
    new_obs_data = pa.table({
        "soma_joinid": list(range(1000, 1100)),
        "cell_type": ["Macrophage"] * 100,
        "sample_id": ["Sample3"] * 100,
    })
    soma_io.append_obs(exp.obs, new_obs_data)

    # Add corresponding expression data
    # ... (prepare expression matrix for new cells)

    # Resize experiment to accommodate new data
    soma_io.upgrade_experiment_shapes(exp, measurement_name="RNA")
```
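
The `soma_joinid` values in the usage example (1000 to 1099) are hard-coded. A small sketch of deriving them from the current observation count instead is shown here, assuming join IDs are contiguous and zero-based. `next_obs_columns` is a hypothetical helper that returns plain column lists, which could then be passed to `pa.table`.

```python
from typing import Dict, List


def next_obs_columns(current_n_obs: int, cell_types: List[str], sample_id: str) -> Dict[str, list]:
    """Build obs columns for new cells, continuing soma_joinid after existing rows.

    Hypothetical helper for illustration only; not a tiledbsoma function.
    Assumes existing join IDs are exactly 0 .. current_n_obs - 1.
    """
    n_new = len(cell_types)
    return {
        "soma_joinid": list(range(current_n_obs, current_n_obs + n_new)),
        "cell_type": list(cell_types),
        "sample_id": [sample_id] * n_new,
    }


cols = next_obs_columns(1000, ["Macrophage"] * 100, "Sample3")
print(cols["soma_joinid"][0], cols["soma_joinid"][-1])  # 1000 1099
```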

Together, these I/O functions connect SOMA's scalable storage format with the existing single-cell analysis ecosystem.
