# Data I/O Operations

Comprehensive ingestion and outgestion functions for converting between SOMA format and popular single-cell data formats like AnnData and H5AD files. These functions enable seamless integration with existing single-cell analysis workflows and tools.

## Package Import

```python
import tiledbsoma.io as soma_io
```

## Capabilities

### AnnData Integration

Functions for converting between SOMA Experiments and AnnData objects, the standard format for single-cell data in Python.

#### from_anndata

Convert an AnnData object to a SOMA Experiment with full support for all AnnData components.

```python { .api }
def from_anndata(anndata, uri, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an AnnData object.

    Parameters:
    - anndata: AnnData object to convert
    - uri: str, URI where the SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```
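
The `obsm_layers`, `varm_layers`, `obsp_layers`, `varp_layers`, and `uns_keys` parameters above share one convention: `None` means "include everything". The sketch below illustrates that convention in plain Python. `select_layers` is a hypothetical helper written for this illustration and is not part of `tiledbsoma.io`; raising on unknown keys is an assumption about how one might want a misspelled key to surface.

```python
from typing import Iterable, List, Optional


def select_layers(available: Iterable[str], requested: Optional[Iterable[str]] = None) -> List[str]:
    """Return the layer keys to ingest; None selects every available key."""
    available = list(available)
    if requested is None:
        return available
    requested = list(requested)
    missing = [key for key in requested if key not in available]
    if missing:
        # Assumption: fail loudly rather than silently skip a misspelled key.
        raise KeyError(f"Unknown layer keys: {missing}")
    # Keep the order of the available keys so the result is reproducible.
    return [key for key in available if key in requested]


obsm_keys = ["X_pca", "X_umap", "X_tsne"]
print(select_layers(obsm_keys))              # ['X_pca', 'X_umap', 'X_tsne']
print(select_layers(obsm_keys, ["X_umap"]))  # ['X_umap']
```

Checking the requested keys against the AnnData object before ingestion keeps a typo from surfacing partway through a long write.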

#### to_anndata

Convert a SOMA Experiment back to an AnnData object with flexible layer selection.

```python { .api }
def to_anndata(experiment, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Convert a SOMA Experiment to an AnnData object.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - measurement_name: str, name of measurement to convert (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation

    Returns:
    AnnData object
    """
```

#### Usage Example

```python
import scanpy as sc
import tiledbsoma
import tiledbsoma.io as soma_io

# Load example dataset
adata = sc.datasets.pbmc3k()
adata.var_names_make_unique()

# Convert to SOMA format
experiment_uri = "pbmc3k_experiment.soma"
soma_io.from_anndata(
    adata,
    experiment_uri,
    measurement_name="RNA",
    obs_id_name="obs_id",
    var_id_name="var_id"
)

# Work with SOMA format - query specific data
# (assumes obs carries cell_type, n_genes, and percent_mito columns)
with tiledbsoma.open(experiment_uri) as exp:
    # Query T cells only
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="cell_type == 'T cells'")
    )

    # Convert subset back to AnnData
    t_cell_adata = soma_io.to_anndata(
        query,
        measurement_name="RNA",
        X_layer_name="X",
        obs_column_names=["cell_type", "n_genes", "percent_mito"]
    )

print(f"T cells: {t_cell_adata.n_obs} cells, {t_cell_adata.n_vars} genes")
```
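
Filter strings such as `"cell_type == 'T cells'"` are easy to get wrong when they are assembled by hand from variables. The following sketch builds them from Python values. The helper names (`equals`, `one_of`, `all_of`) are hypothetical and not part of `tiledbsoma`, and the sketch assumes the filter grammar accepts Python-style literals, `in` lists, `and`, and parentheses.

```python
from typing import Sequence, Union

Scalar = Union[str, int, float, bool]


def equals(column: str, value: Scalar) -> str:
    """Build `column == literal`, quoting strings via repr()."""
    return f"{column} == {value!r}"


def one_of(column: str, values: Sequence[Scalar]) -> str:
    """Build `column in [literal, ...]`."""
    items = ", ".join(repr(v) for v in values)
    return f"{column} in [{items}]"


def all_of(*clauses: str) -> str:
    """Join clauses with `and`, parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


expr = all_of(one_of("cell_type", ["T cells", "B cells"]), "n_genes > 500")
print(expr)
# (cell_type in ['T cells', 'B cells']) and (n_genes > 500)
```

The resulting string can be passed wherever a `value_filter` is expected, for example `tiledbsoma.AxisQuery(value_filter=expr)`.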

### H5AD File Operations

Functions for working directly with H5AD files, the standard file format for AnnData objects.

#### from_h5ad

Create a SOMA Experiment directly from an H5AD file without loading into memory.

```python { .api }
def from_h5ad(h5ad_file_path, output_path, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an H5AD file.

    Parameters:
    - h5ad_file_path: str, path to input H5AD file
    - output_path: str, URI where SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```

#### to_h5ad

Write a SOMA Experiment directly to an H5AD file.

```python { .api }
def to_h5ad(experiment, h5ad_path, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Write a SOMA Experiment to an H5AD file.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - h5ad_path: str, output H5AD file path
    - measurement_name: str, name of measurement to write (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation
    """
```

#### Usage Example

```python
import tiledbsoma
import tiledbsoma.io as soma_io

# Convert H5AD file to SOMA format
soma_io.from_h5ad(
    "input_data.h5ad",
    "experiment.soma",
    measurement_name="RNA"
)

# Process data in SOMA format
with tiledbsoma.open("experiment.soma") as exp:
    # Perform analysis, filtering, etc.
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="n_genes > 500")
    )

    # Export filtered results back to H5AD
    soma_io.to_h5ad(
        query,
        "filtered_output.h5ad",
        measurement_name="RNA"
    )
```
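
When several H5AD files need converting, it can help to decide the output URIs up front and catch name collisions before any data is written. The sketch below does only that planning step with the standard library. `plan_conversions` and the `.soma` naming scheme are assumptions made for this illustration, not conventions of `tiledbsoma.io`.

```python
from pathlib import Path
from typing import Dict, Iterable


def plan_conversions(h5ad_paths: Iterable[str], output_dir: str) -> Dict[str, str]:
    """Map each H5AD path to a SOMA URI under output_dir (a.h5ad -> output_dir/a.soma)."""
    plan: Dict[str, str] = {}
    for path in sorted(h5ad_paths):
        source = Path(path)
        if source.suffix != ".h5ad":
            raise ValueError(f"Not an H5AD file: {path}")
        target = (Path(output_dir) / f"{source.stem}.soma").as_posix()
        if target in plan.values():
            # Two inputs with the same file name would overwrite each other.
            raise ValueError(f"Two inputs map to the same output: {target}")
        plan[source.as_posix()] = target
    return plan


for src, dst in plan_conversions(["data/b.h5ad", "data/a.h5ad"], "out").items():
    print(f"{src} -> {dst}")
# data/a.h5ad -> out/a.soma
# data/b.h5ad -> out/b.soma
```

Each `(src, dst)` pair could then be handed to `soma_io.from_h5ad(src, dst, measurement_name="RNA")`.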

### Batch Registration

Functions for registering multiple AnnData objects or H5AD files into a single SOMA Experiment.

#### register_anndatas

Register multiple AnnData objects into a single SOMA Experiment with consistent indexing.

```python { .api }
def register_anndatas(experiment_uri, adatas, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple AnnData objects into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - adatas: list of AnnData objects to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```

#### register_h5ads

Register multiple H5AD files into a single SOMA Experiment.

```python { .api }
def register_h5ads(experiment_uri, h5ad_file_paths, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple H5AD files into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - h5ad_file_paths: list of str, paths to H5AD files to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```

#### Usage Example

```python
import tiledbsoma.io as soma_io
import scanpy as sc

# Split one dataset into two batches
# (pbmc68k_reduced has 700 cells, so the slices must stay within that range)
pbmc = sc.datasets.pbmc68k_reduced()
pbmc_a = pbmc[:350, :].copy()
pbmc_b = pbmc[350:, :].copy()

# Register into single experiment
soma_io.register_anndatas(
    "combined_experiment.soma",
    [pbmc_a, pbmc_b],
    measurement_name="RNA"
)

# Register H5AD files
h5ad_files = ["sample1.h5ad", "sample2.h5ad", "sample3.h5ad"]
soma_io.register_h5ads(
    "multi_sample_experiment.soma",
    h5ad_files,
    measurement_name="RNA"
)
```
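
"Consistent indexing" means that each distinct observation or variable ID ends up with one integer join ID, regardless of which input it first appears in. The sketch below models that idea in plain Python. It is a conceptual illustration only, not how `tiledbsoma` implements registration, and `register_ids` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def register_ids(batches: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign each distinct ID a stable integer, in first-seen order."""
    mapping: Dict[str, int] = {}
    for batch in batches:
        for item_id in batch:
            if item_id not in mapping:
                mapping[item_id] = len(mapping)
    return mapping


# Two datasets that share some genes but list them in different orders.
var_ids_a = ["CD3D", "CD19", "MS4A1"]
var_ids_b = ["CD19", "NKG7", "CD3D"]

var_map = register_ids([var_ids_a, var_ids_b])
print(var_map)
# {'CD3D': 0, 'CD19': 1, 'MS4A1': 2, 'NKG7': 3}

# Each dataset's columns can then be translated into the shared index space.
cols_b: List[int] = [var_map[g] for g in var_ids_b]
print(cols_b)
# [1, 3, 0]
```

Because the mapping is computed across all inputs before any data is written, a gene shared by two datasets lands in the same column of the combined matrix.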

### Data Append and Update Operations

Functions for incrementally adding or modifying data in existing SOMA objects.

#### Append Functions

```python { .api }
def append_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Append observations to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_var(soma_df, values, *, context=None, platform_config=None):
    """
    Append variables to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_X(collection, values, *, context=None, platform_config=None):
    """
    Append expression data to an X collection.

    Parameters:
    - collection: SOMA Collection containing X matrices
    - values: expression data to append
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```

#### Update Functions

```python { .api }
def update_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Update observations in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_var(soma_df, values, *, context=None, platform_config=None):
    """
    Update variables in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_matrix(soma_coll, values, *, context=None, platform_config=None):
    """
    Update matrix data in a SOMA Collection.

    Parameters:
    - soma_coll: SOMA Collection containing matrices
    - values: matrix data to update
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```

#### Matrix Management Functions

```python { .api }
def add_matrix_to_collection(collection, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add a matrix to a SOMA Collection.

    Parameters:
    - collection: SOMA Collection to add matrix to
    - matrix: matrix data to add
    - layer_name: str, name for the new matrix layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def add_X_layer(measurement, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add an X layer to a Measurement.

    Parameters:
    - measurement: SOMA Measurement object
    - matrix: matrix data to add as X layer
    - layer_name: str, name for the new X layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def create_from_matrix(matrix, uri, *, context=None, platform_config=None):
    """
    Create a SOMA array from a matrix.

    Parameters:
    - matrix: input matrix data
    - uri: str, URI where SOMA array will be created
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA array object
    """
```

### Experiment Shaping Operations

Functions for managing and resizing SOMA Experiment dimensions.

```python { .api }
def get_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Get current shapes of experiment components.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")

    Returns:
    dict: Shapes of experiment components
    """

def show_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Display experiment component shapes.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")
    """

def resize_experiment(experiment, shape, *, measurement_name="RNA"):
    """
    Resize experiment dimensions.

    Parameters:
    - experiment: SOMA Experiment object
    - shape: new shape specification
    - measurement_name: str, name of measurement to resize (default: "RNA")
    """

def upgrade_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Upgrade experiment shapes to accommodate new data.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to upgrade (default: "RNA")
    """
```

### Registration Mapping

Support for mapping ambient labels during registration of multiple datasets.

```python { .api }
class ExperimentAmbientLabelMapping:
    """
    Mapping for experiment ambient labels during registration.

    Provides functionality for consistent labeling across multiple
    datasets when registering them into a single experiment.
    """
```

#### Usage Example

```python
import pyarrow as pa
import tiledbsoma
import tiledbsoma.io as soma_io

# Incremental data loading workflow
with tiledbsoma.open("experiment.soma", mode="w") as exp:
    # Get current shapes
    shapes = soma_io.get_experiment_shapes(exp, measurement_name="RNA")
    print(f"Current shapes: {shapes}")

    # Add new observations
    new_obs_data = pa.table({
        "soma_joinid": list(range(1000, 1100)),
        "cell_type": ["Macrophage"] * 100,
        "sample_id": ["Sample3"] * 100
    })
    soma_io.append_obs(exp.obs, new_obs_data)

    # Add corresponding expression data
    # ... (prepare expression matrix for new cells)

    # Resize experiment to accommodate new data
    soma_io.upgrade_experiment_shapes(exp, measurement_name="RNA")
```
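
The hard-coded `range(1000, 1100)` in the example above only works if the experiment already holds exactly 1000 observations. A small sketch of deriving that range, and the row capacity the append needs, is shown below. `plan_append` is a hypothetical helper, and the sketch assumes join IDs are contiguous from 0 and that the current shape acts as an upper bound on the row count.

```python
from typing import Tuple


def plan_append(current_rows: int, capacity: int, n_new: int) -> Tuple[range, int]:
    """Return the join IDs for n_new appended rows and the capacity required.

    current_rows: rows already stored (join IDs 0 .. current_rows - 1 assumed)
    capacity: current upper bound on the row count
    """
    if n_new < 0:
        raise ValueError("n_new must be non-negative")
    new_ids = range(current_rows, current_rows + n_new)
    # The capacity never shrinks; it grows only when the new rows do not fit.
    required = max(capacity, current_rows + n_new)
    return new_ids, required


new_ids, required = plan_append(current_rows=1000, capacity=1000, n_new=100)
print(new_ids)          # range(1000, 1100)
print(required > 1000)  # True: the experiment must grow before the write fits
```

Comparing `required` with the current capacity shows whether a resize is needed at all, which avoids resizing on every small append.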

This comprehensive I/O functionality enables seamless integration between SOMA's scalable storage format and the existing single-cell analysis ecosystem.