
# Data I/O Operations

Ingestion and outgestion functions for converting between the SOMA format and common single-cell data formats such as AnnData objects and H5AD files. These functions let SOMA data fit into existing single-cell analysis workflows and tools.

## Package Import

```python
import tiledbsoma.io as soma_io
```

## Capabilities

### AnnData Integration

Functions for converting between SOMA Experiments and AnnData objects, the standard format for single-cell data in Python.

#### from_anndata

Convert an AnnData object to a SOMA Experiment with full support for all AnnData components.

```python { .api }
def from_anndata(anndata, uri, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an AnnData object.

    Parameters:
    - anndata: AnnData object to convert
    - uri: str, URI where the SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```
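
Several parameters above follow the same convention: `None` means "include everything", while a list restricts the selection. The sketch below models that convention in plain Python. The helper `select_keys` is hypothetical and not part of `tiledbsoma.io`, and raising on unknown keys is an assumption rather than documented behavior.

```python
from typing import Iterable, List, Optional


def select_keys(available: Iterable[str], requested: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: resolve an obsm_layers / uns_keys style argument.

    None selects every available key; a list selects only the named keys,
    returned in the order they appear in `available`.
    """
    available = list(available)
    if requested is None:
        return available
    missing = [key for key in requested if key not in available]
    if missing:
        # Assumption: unknown keys are treated as an error.
        raise KeyError(f"keys not present: {missing}")
    return [key for key in available if key in requested]


obsm_keys = ["X_pca", "X_umap", "X_tsne"]
print(select_keys(obsm_keys, None))        # ['X_pca', 'X_umap', 'X_tsne']
print(select_keys(obsm_keys, ["X_umap"]))  # ['X_umap']
```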

#### to_anndata

Convert a SOMA Experiment back to an AnnData object with flexible layer selection.

```python { .api }
def to_anndata(experiment, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Convert a SOMA Experiment to an AnnData object.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - measurement_name: str, name of measurement to convert (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation

    Returns:
    AnnData object
    """
```
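
The `obs_value_filter` and `var_value_filter` parameters take filter expressions as strings. As a rough sketch, such a string could be assembled from simple conditions as shown below. The helper `build_value_filter` is hypothetical, and joining clauses with `and` assumes the filter syntax accepts that operator.

```python
def build_value_filter(equals=None, greater_than=None):
    """Hypothetical helper: build a filter string from simple conditions.

    equals:       {column: value} pairs rendered as `column == value`
    greater_than: {column: threshold} pairs rendered as `column > threshold`
    """
    clauses = []
    for column, value in (equals or {}).items():
        # repr() adds quotes around string values, e.g. 'T cells'
        clauses.append(f"{column} == {value!r}")
    for column, threshold in (greater_than or {}).items():
        clauses.append(f"{column} > {threshold}")
    # Assumption: clauses can be combined with `and`
    return " and ".join(clauses)


obs_filter = build_value_filter(
    equals={"cell_type": "T cells"},
    greater_than={"n_genes": 500},
)
print(obs_filter)  # cell_type == 'T cells' and n_genes > 500
```

The resulting string would be passed as `obs_value_filter=obs_filter`, assuming the signature documented above.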

#### Usage Example

```python
import scanpy as sc
import tiledbsoma
import tiledbsoma.io as soma_io

# Load example dataset
adata = sc.datasets.pbmc3k()
adata.var_names_make_unique()

# Convert to SOMA format
experiment_uri = "pbmc3k_experiment.soma"
soma_io.from_anndata(
    adata,
    experiment_uri,
    measurement_name="RNA",
    obs_id_name="obs_id",
    var_id_name="var_id"
)

# Work with SOMA format - query specific data
with tiledbsoma.open(experiment_uri) as exp:
    # Query T cells only (assumes obs already carries a "cell_type" annotation)
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="cell_type == 'T cells'")
    )

    # Convert subset back to AnnData
    t_cell_adata = soma_io.to_anndata(
        query,
        measurement_name="RNA",
        X_layer_name="X",
        obs_column_names=["cell_type", "n_genes", "percent_mito"]
    )

print(f"T cells: {t_cell_adata.n_obs} cells, {t_cell_adata.n_vars} genes")
```

### H5AD File Operations

Functions for working directly with H5AD files, the standard file format for AnnData objects.

#### from_h5ad

Create a SOMA Experiment directly from an H5AD file without loading it into memory.

```python { .api }
def from_h5ad(h5ad_file_path, output_path, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, uns_keys=None, ingest_mode="write", registration_mapping=None, context=None, platform_config=None, additional_metadata=None):
    """
    Create a SOMA Experiment from an H5AD file.

    Parameters:
    - h5ad_file_path: str, path to input H5AD file
    - output_path: str, URI where SOMA experiment will be created
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - X_layer_name: str, name for the main X matrix layer (None uses default)
    - obsm_layers: list of str, obsm keys to include (None includes all)
    - varm_layers: list of str, varm keys to include (None includes all)
    - obsp_layers: list of str, obsp keys to include (None includes all)
    - varp_layers: list of str, varp keys to include (None includes all)
    - uns_keys: list of str, uns keys to include as metadata (None includes all)
    - ingest_mode: str, ingestion mode ("write" or "resume")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    - additional_metadata: dict, additional metadata to store

    Returns:
    SOMA Experiment object
    """
```

#### to_h5ad

Write a SOMA Experiment directly to an H5AD file.

```python { .api }
def to_h5ad(experiment, h5ad_path, *, measurement_name="RNA", X_layer_name=None, obsm_layers=None, varm_layers=None, obsp_layers=None, varp_layers=None, obs_coords=None, var_coords=None, obs_value_filter=None, var_value_filter=None, obs_column_names=None, var_column_names=None, batch_size=None, context=None):
    """
    Write a SOMA Experiment to an H5AD file.

    Parameters:
    - experiment: SOMA Experiment object or ExperimentAxisQuery
    - h5ad_path: str, output H5AD file path
    - measurement_name: str, name of measurement to write (default: "RNA")
    - X_layer_name: str, X layer to use as main matrix (None uses first available)
    - obsm_layers: list of str, obsm layers to include (None includes all)
    - varm_layers: list of str, varm layers to include (None includes all)
    - obsp_layers: list of str, obsp layers to include (None includes all)
    - varp_layers: list of str, varp layers to include (None includes all)
    - obs_coords: coordinates for observation selection
    - var_coords: coordinates for variable selection
    - obs_value_filter: str, filter expression for observations
    - var_value_filter: str, filter expression for variables
    - obs_column_names: list of str, observation columns to include
    - var_column_names: list of str, variable columns to include
    - batch_size: int, batch size for reading data
    - context: TileDB context for the operation
    """
```
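
When converting many H5AD files, it can help to plan the input and output locations before calling `from_h5ad`. The following is a minimal sketch using only the standard library. The helper `plan_conversions`, the directory names, and the `.soma` suffix convention are illustrative assumptions.

```python
from pathlib import PurePosixPath


def plan_conversions(h5ad_paths, soma_root):
    """Hypothetical helper: map each H5AD path to a SOMA URI under soma_root."""
    plan = {}
    for path in h5ad_paths:
        stem = PurePosixPath(path).stem  # "data/sample1.h5ad" -> "sample1"
        plan[path] = str(PurePosixPath(soma_root) / f"{stem}.soma")
    return plan


plan = plan_conversions(["data/sample1.h5ad", "data/sample2.h5ad"], "soma_out")
for src, dst in plan.items():
    print(f"{src} -> {dst}")
```

Each `(src, dst)` pair could then be passed as `h5ad_file_path` and `output_path`, assuming the `from_h5ad` signature documented above.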

#### Usage Example

```python
import tiledbsoma
import tiledbsoma.io as soma_io

# Convert H5AD file to SOMA format
soma_io.from_h5ad(
    "input_data.h5ad",
    "experiment.soma",
    measurement_name="RNA"
)

# Process data in SOMA format
with tiledbsoma.open("experiment.soma") as exp:
    # Perform analysis, filtering, etc.
    query = exp.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter="n_genes > 500")
    )

    # Export filtered results back to H5AD
    soma_io.to_h5ad(
        query,
        "filtered_output.h5ad",
        measurement_name="RNA"
    )
```

### Batch Registration

Functions for registering multiple AnnData objects or H5AD files into a single SOMA Experiment.

#### register_anndatas

Register multiple AnnData objects into a single SOMA Experiment with consistent indexing.

```python { .api }
def register_anndatas(experiment_uri, adatas, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple AnnData objects into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - adatas: list of AnnData objects to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```

#### register_h5ads

Register multiple H5AD files into a single SOMA Experiment.

```python { .api }
def register_h5ads(experiment_uri, h5ad_file_paths, *, measurement_name="RNA", obs_id_name="obs_id", var_id_name="var_id", registration_mapping=None, context=None, platform_config=None):
    """
    Register multiple H5AD files into a SOMA Experiment.

    Parameters:
    - experiment_uri: str, URI of the SOMA experiment
    - h5ad_file_paths: list of str, paths to H5AD files to register
    - measurement_name: str, name for the measurement (default: "RNA")
    - obs_id_name: str, column name for observation IDs (default: "obs_id")
    - var_id_name: str, column name for variable IDs (default: "var_id")
    - registration_mapping: dict, mapping for registration information
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA Experiment object
    """
```
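
To illustrate what "consistent indexing" across datasets can mean, here is a toy model in plain Python. It is not the library's implementation: the function `register_ids` and the first-seen ordering are assumptions made for illustration. The idea is that an ID shared by several inputs maps to a single integer index.

```python
def register_ids(batches):
    """Toy model: assign one integer index per distinct ID, in first-seen order."""
    mapping = {}
    for batch in batches:
        for item_id in batch:
            if item_id not in mapping:
                # New ID: give it the next unused index
                mapping[item_id] = len(mapping)
    return mapping


var_ids_a = ["CD3E", "CD4", "MS4A1"]
var_ids_b = ["CD4", "NKG7"]  # "CD4" is shared with the first dataset

mapping = register_ids([var_ids_a, var_ids_b])
print(mapping)  # {'CD3E': 0, 'CD4': 1, 'MS4A1': 2, 'NKG7': 3}
```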

#### Usage Example

```python
import tiledbsoma.io as soma_io
import scanpy as sc

# Load one dataset and split it into two batches
# (pbmc68k_reduced contains 700 cells, so split it at 350)
pbmc = sc.datasets.pbmc68k_reduced()
pbmc_a = pbmc[:350, :].copy()
pbmc_b = pbmc[350:, :].copy()

# Register into single experiment
soma_io.register_anndatas(
    "combined_experiment.soma",
    [pbmc_a, pbmc_b],
    measurement_name="RNA"
)

# Register H5AD files
h5ad_files = ["sample1.h5ad", "sample2.h5ad", "sample3.h5ad"]
soma_io.register_h5ads(
    "multi_sample_experiment.soma",
    h5ad_files,
    measurement_name="RNA"
)
```

### Data Append and Update Operations

Functions for incrementally adding or modifying data in existing SOMA objects.

#### Append Functions

```python { .api }
def append_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Append observations to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_var(soma_df, values, *, context=None, platform_config=None):
    """
    Append variables to a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to append to
    - values: pyarrow.Table with new variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def append_X(collection, values, *, context=None, platform_config=None):
    """
    Append expression data to an X collection.

    Parameters:
    - collection: SOMA Collection containing X matrices
    - values: expression data to append
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```
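
Appended rows need `soma_joinid` values that do not collide with existing ones. The sketch below shows one way to prepare the columns for a new batch, assuming existing join IDs are contiguous and start at 0. The helper `next_joinids` is hypothetical, and the column names simply mirror the examples in this document.

```python
def next_joinids(existing_count, new_count):
    """Hypothetical helper: join IDs for new rows, continuing after existing ones.

    Assumes existing rows use IDs 0 .. existing_count - 1 with no gaps.
    """
    return list(range(existing_count, existing_count + new_count))


# Columns for 100 new cells appended to an experiment that already holds 1000
new_ids = next_joinids(existing_count=1000, new_count=100)
new_obs_columns = {
    "soma_joinid": new_ids,
    "cell_type": ["Macrophage"] * len(new_ids),
    "sample_id": ["Sample3"] * len(new_ids),
}

print(new_ids[0], new_ids[-1], len(new_ids))  # 1000 1099 100
```

A dictionary like `new_obs_columns` could be converted with `pyarrow.table(...)` to produce the `values` argument described above.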

#### Update Functions

```python { .api }
def update_obs(soma_df, values, *, context=None, platform_config=None):
    """
    Update observations in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated observation data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_var(soma_df, values, *, context=None, platform_config=None):
    """
    Update variables in a SOMA DataFrame.

    Parameters:
    - soma_df: SOMA DataFrame to update
    - values: pyarrow.Table with updated variable data
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def update_matrix(soma_coll, values, *, context=None, platform_config=None):
    """
    Update matrix data in a SOMA Collection.

    Parameters:
    - soma_coll: SOMA Collection containing matrices
    - values: matrix data to update
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """
```
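
Before calling an update function, it may be worth checking that the updated table refers to the same rows as the stored data. The sketch below does this in plain Python. It assumes that an update is meant to change values for existing rows rather than add or drop rows, and `check_update` is a hypothetical helper, not part of `tiledbsoma.io`.

```python
def check_update(existing_ids, updated_ids):
    """Hypothetical pre-flight check comparing stored IDs with IDs in an update.

    Returns the IDs missing from the update and the IDs it introduces.
    """
    existing, updated = set(existing_ids), set(updated_ids)
    return {
        "missing": sorted(existing - updated),
        "unexpected": sorted(updated - existing),
    }


report = check_update(["c1", "c2", "c3"], ["c1", "c3", "c4"])
print(report)  # {'missing': ['c2'], 'unexpected': ['c4']}
```

An empty `missing` and `unexpected` list would indicate that the update lines up row for row with the stored data.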

#### Matrix Management Functions

```python { .api }
def add_matrix_to_collection(collection, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add a matrix to a SOMA Collection.

    Parameters:
    - collection: SOMA Collection to add matrix to
    - matrix: matrix data to add
    - layer_name: str, name for the new matrix layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def add_X_layer(measurement, matrix, layer_name, *, context=None, platform_config=None):
    """
    Add an X layer to a Measurement.

    Parameters:
    - measurement: SOMA Measurement object
    - matrix: matrix data to add as X layer
    - layer_name: str, name for the new X layer
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options
    """

def create_from_matrix(matrix, uri, *, context=None, platform_config=None):
    """
    Create a SOMA array from a matrix.

    Parameters:
    - matrix: input matrix data
    - uri: str, URI where SOMA array will be created
    - context: TileDB context for the operation
    - platform_config: TileDB-specific configuration options

    Returns:
    SOMA array object
    """
```

### Experiment Shaping Operations

Functions for managing and resizing SOMA Experiment dimensions.

```python { .api }
def get_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Get current shapes of experiment components.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")

    Returns:
    dict: Shapes of experiment components
    """

def show_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Display experiment component shapes.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to analyze (default: "RNA")
    """

def resize_experiment(experiment, shape, *, measurement_name="RNA"):
    """
    Resize experiment dimensions.

    Parameters:
    - experiment: SOMA Experiment object
    - shape: new shape specification
    - measurement_name: str, name of measurement to resize (default: "RNA")
    """

def upgrade_experiment_shapes(experiment, *, measurement_name="RNA"):
    """
    Upgrade experiment shapes to accommodate new data.

    Parameters:
    - experiment: SOMA Experiment object
    - measurement_name: str, name of measurement to upgrade (default: "RNA")
    """
```
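
As a sketch of the arithmetic behind a resize, the snippet below computes a target `(n_obs, n_var)` shape from the current shape and the amount of incoming data. The helper `grown_shape` is hypothetical. Representing the shape as a two-element tuple and allowing only growth are assumptions, since the format of the `shape` argument is not specified here.

```python
def grown_shape(current, extra_obs=0, extra_var=0):
    """Hypothetical helper: target (n_obs, n_var) after adding rows or columns."""
    if extra_obs < 0 or extra_var < 0:
        # Assumption: a resize only ever grows the experiment
        raise ValueError("shapes can only grow")
    n_obs, n_var = current
    return (n_obs + extra_obs, n_var + extra_var)


current_shape = (1000, 2000)  # 1000 cells x 2000 genes (made-up numbers)
print(grown_shape(current_shape, extra_obs=100))  # (1100, 2000)
```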

### Registration Mapping

Support for mapping ambient labels during registration of multiple datasets.

```python { .api }
class ExperimentAmbientLabelMapping:
    """
    Mapping for experiment ambient labels during registration.

    Provides functionality for consistent labeling across multiple
    datasets when registering them into a single experiment.
    """
```

#### Usage Example

```python
import pyarrow as pa
import tiledbsoma
import tiledbsoma.io as soma_io

# Incremental data loading workflow
with tiledbsoma.open("experiment.soma", mode="w") as exp:
    # Get current shapes
    shapes = soma_io.get_experiment_shapes(exp, measurement_name="RNA")
    print(f"Current shapes: {shapes}")

    # Add new observations
    new_obs_data = pa.table({
        "soma_joinid": list(range(1000, 1100)),
        "cell_type": ["Macrophage"] * 100,
        "sample_id": ["Sample3"] * 100
    })
    soma_io.append_obs(exp.obs, new_obs_data)

    # Add corresponding expression data
    # ... (prepare expression matrix for new cells)

    # Resize experiment to accommodate new data
    soma_io.upgrade_experiment_shapes(exp, measurement_name="RNA")
```

Together, these I/O functions connect SOMA's scalable storage format with the existing single-cell analysis ecosystem.