# Data Input/Output

Scanpy reads and writes the common single-cell data formats (10x Genomics, H5AD, Loom, Matrix Market, CSV/text, and Excel), so data from different platforms can be loaded into AnnData objects and exchanged with other analysis tools.

## Capabilities

### General Data Reading

Read various file formats and automatically detect the appropriate format based on file extension.

```python { .api }
def read(filename, delimiter=None, first_column_names=None, backup_url=None, sheet=None, ext=None, **kwargs):
    """
    Read file and return AnnData object.

    Parameters:
    - filename (str): Path to file or URL
    - delimiter (str, optional): Delimiter for text files
    - first_column_names (bool, optional): Whether first column contains row names
    - backup_url (str, optional): Backup URL if file not found locally
    - sheet (str, optional): Sheet name for Excel files
    - ext (str, optional): Force file extension interpretation

    Returns:
    AnnData: Annotated data object
    """
```
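The extension-based detection described above can be pictured as a lookup from file suffix to reader. The sketch below is a simplified, standard-library illustration and not Scanpy's actual implementation: `pick_reader` and the `READERS` table are hypothetical names, and the mapping only covers readers listed on this page.

```python
from pathlib import Path

# Hypothetical suffix -> reader-name table, limited to readers documented here.
READERS = {
    ".h5ad": "read_h5ad",
    ".loom": "read_loom",
    ".mtx": "read_mtx",
    ".csv": "read_csv",
    ".txt": "read_text",
    ".xlsx": "read_excel",
}


def pick_reader(filename, ext=None):
    """Return the name of the reader that would handle `filename`.

    `ext` overrides the file suffix, mirroring the `ext` parameter of `read`.
    """
    suffix = "." + ext.lstrip(".") if ext else Path(filename).suffix
    try:
        return READERS[suffix.lower()]
    except KeyError:
        raise ValueError(f"Unknown file extension: {suffix!r}") from None


print(pick_reader("results/processed_data.h5ad"))  # read_h5ad
print(pick_reader("counts.dat", ext="csv"))        # read_csv
```

Passing `ext` is useful when a file's suffix does not reflect its contents, as in the second call.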

### 10x Genomics Formats

Read data from 10x Genomics Cell Ranger outputs, a widely used single-cell data format.

```python { .api }
def read_10x_h5(filename, genome=None, gex_only=True, **kwargs):
    """
    Read 10x Genomics HDF5 file.

    Parameters:
    - filename (str): Path to .h5 file
    - genome (str, optional): Genome to read (for multi-genome files)
    - gex_only (bool): Only read gene expression data

    Returns:
    AnnData: Annotated data object
    """

def read_10x_mtx(path, var_names='gene_symbols', make_unique=True, cache=False, **kwargs):
    """
    Read 10x Genomics MTX format (matrix.mtx, features.tsv, barcodes.tsv).

    Parameters:
    - path (str): Path to directory containing MTX files
    - var_names (str): Use 'gene_symbols' or 'gene_ids' for gene names
    - make_unique (bool): Make gene names unique
    - cache (bool): Write cache file for faster subsequent reading

    Returns:
    AnnData: Annotated data object
    """
```
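To make the MTX directory layout concrete, the sketch below writes a toy directory with the three files named above and parses it using only the standard library. It illustrates the file structure and is not how `read_10x_mtx` is implemented: the helper names, gene IDs, barcodes, and counts are invented, and real Cell Ranger outputs may gzip-compress these files.

```python
import tempfile
from pathlib import Path


def write_toy_10x_dir(root):
    """Write a tiny, uncompressed 10x-style MTX directory (3 genes x 2 cells)."""
    root = Path(root)
    # features.tsv: gene ID, gene symbol, feature type (tab-separated).
    (root / "features.tsv").write_text(
        "ENSG0001\tGeneA\tGene Expression\n"
        "ENSG0002\tGeneB\tGene Expression\n"
        "ENSG0003\tGeneC\tGene Expression\n"
    )
    # barcodes.tsv: one cell barcode per line.
    (root / "barcodes.tsv").write_text("AAAC-1\nAAAG-1\n")
    # Matrix Market coordinate format: rows are genes, columns are cells, 1-based.
    (root / "matrix.mtx").write_text(
        "%%MatrixMarket matrix coordinate integer general\n"
        "3 2 3\n"
        "1 1 5\n"
        "2 2 7\n"
        "3 1 1\n"
    )


def load_counts(root, var_names="gene_symbols"):
    """Parse the toy directory into {barcode: {gene: count}} (cells x genes)."""
    root = Path(root)
    column = 1 if var_names == "gene_symbols" else 0
    genes = [
        line.split("\t")[column]
        for line in (root / "features.tsv").read_text().splitlines()
    ]
    barcodes = (root / "barcodes.tsv").read_text().splitlines()
    counts = {barcode: {} for barcode in barcodes}
    rows = [
        line
        for line in (root / "matrix.mtx").read_text().splitlines()
        if not line.startswith("%")
    ]
    for entry in rows[1:]:  # rows[0] is the "n_rows n_cols n_entries" size line
        gene_idx, cell_idx, value = entry.split()
        counts[barcodes[int(cell_idx) - 1]][genes[int(gene_idx) - 1]] = int(value)
    return counts


with tempfile.TemporaryDirectory() as tmp:
    write_toy_10x_dir(tmp)
    counts = load_counts(tmp)
    counts_by_id = load_counts(tmp, var_names="gene_ids")

print(counts)
```

The matrix file stores genes as rows and cells as columns, while the parsed result is keyed by cell first, matching the observations-by-variables orientation of an AnnData object.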

### Spatial Transcriptomics

Read spatial transcriptomics data from the 10x Visium platform.

```python { .api }
def read_visium(path, genome=None, count_file='filtered_feature_bc_matrix.h5', library_id=None, load_images=True, **kwargs):
    """
    Read 10x Visium spatial transcriptomics data.

    Parameters:
    - path (str): Path to directory containing Visium output
    - genome (str, optional): Genome to read
    - count_file (str): Name of count matrix file
    - library_id (str, optional): Library identifier
    - load_images (bool): Load histological images

    Returns:
    AnnData: Annotated data object with spatial coordinates
    """
```

### Standard Formats

Read common data formats used in bioinformatics and data science.

```python { .api }
# From anndata - automatically available in scanpy
def read_csv(filename, delimiter=',', first_column_names=None, **kwargs):
    """
    Read CSV file.

    Parameters:
    - filename (str): Path to CSV file
    - delimiter (str): Field delimiter
    - first_column_names (bool, optional): First column contains row names

    Returns:
    AnnData: Annotated data object
    """

def read_excel(filename, sheet=None, **kwargs):
    """
    Read Excel file.

    Parameters:
    - filename (str): Path to Excel file
    - sheet (str, optional): Sheet name to read

    Returns:
    AnnData: Annotated data object
    """

def read_h5ad(filename, backed=None, **kwargs):
    """
    Read H5AD format (native AnnData format).

    Parameters:
    - filename (str): Path to .h5ad file
    - backed (str, optional): Backing mode ('r' for read-only)

    Returns:
    AnnData: Annotated data object
    """

def read_hdf(filename, key, **kwargs):
    """
    Read HDF5 file.

    Parameters:
    - filename (str): Path to HDF5 file
    - key (str): Key/group name in HDF5 file

    Returns:
    AnnData: Annotated data object
    """

def read_loom(filename, sparse=True, cleanup=False, **kwargs):
    """
    Read Loom file format.

    Parameters:
    - filename (str): Path to .loom file
    - sparse (bool): Store matrix in sparse format
    - cleanup (bool): Collapse obs/var fields that hold only one unique value into .uns

    Returns:
    AnnData: Annotated data object
    """

def read_mtx(filename, **kwargs):
    """
    Read Matrix Market format.

    Parameters:
    - filename (str): Path to .mtx file

    Returns:
    AnnData: Annotated data object
    """

def read_text(filename, delimiter=None, first_column_names=None, **kwargs):
    """
    Read text file.

    Parameters:
    - filename (str): Path to text file
    - delimiter (str, optional): Field delimiter
    - first_column_names (bool, optional): First column contains row names

    Returns:
    AnnData: Annotated data object
    """

def read_umi_tools(filename, **kwargs):
    """
    Read UMI-tools format.

    Parameters:
    - filename (str): Path to UMI-tools output file

    Returns:
    AnnData: Annotated data object
    """
```

### Data Writing

Write AnnData objects to various formats for sharing, archiving, or use with other tools.

```python { .api }
def write(filename, adata, ext=None, compression=None, compression_opts=None):
    """
    Write AnnData object to file.

    Parameters:
    - filename (str): Output file path
    - adata (AnnData): AnnData object to write
    - ext (str, optional): Force file format based on extension
    - compression (str, optional): Compression method
    - compression_opts (dict, optional): Compression options
    """
```

### Data Concatenation

Combine multiple AnnData objects into a single object.

```python { .api }
def concat(adatas, axis=0, join='inner', merge=None, uns_merge=None, **kwargs):
    """
    Concatenate AnnData objects along an axis.

    Parameters:
    - adatas (list): List of AnnData objects to concatenate
    - axis (int): Axis along which to concatenate (0 for observations, 1 for variables)
    - join (str): How to combine the other axis: 'inner' (intersection) or 'outer' (union)
    - merge (str, optional): Strategy for merging conflicting annotations
    - uns_merge (str, optional): Strategy for merging unstructured annotations

    Returns:
    AnnData: Concatenated AnnData object
    """
```
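The difference between the two `join` options is easiest to see on the variable names alone. The following sketch uses plain Python lists and a hypothetical helper, `joined_variables`, to show which variables survive when datasets are concatenated along observations. It is an illustration of the union/intersection idea only, and the ordering of names in a real concatenated object may differ.

```python
def joined_variables(var_name_lists, join="outer"):
    """Return the variable names kept when concatenating along observations."""
    if join not in ("outer", "inner"):
        raise ValueError("join must be 'outer' or 'inner'")
    sets = [set(names) for names in var_name_lists]
    keep = set.union(*sets) if join == "outer" else set.intersection(*sets)
    # Preserve first-seen order so the result is deterministic.
    ordered = []
    for names in var_name_lists:
        for name in names:
            if name in keep and name not in ordered:
                ordered.append(name)
    return ordered


# Invented gene names for two samples that only partly overlap.
sample1_genes = ["GeneA", "GeneB", "GeneC"]
sample2_genes = ["GeneB", "GeneC", "GeneD"]

print(joined_variables([sample1_genes, sample2_genes], join="outer"))
# ['GeneA', 'GeneB', 'GeneC', 'GeneD']
print(joined_variables([sample1_genes, sample2_genes], join="inner"))
# ['GeneB', 'GeneC']
```

With `'outer'`, variables missing from one dataset still appear in the result and their values have to be filled in; with `'inner'`, only variables shared by every dataset are kept.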

## Usage Examples

### Loading 10x Genomics Data

```python
import scanpy as sc

# Load 10x MTX format
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',
    var_names='gene_symbols',
    cache=True
)
# Gene symbols can repeat; make the variable names unique
adata.var_names_make_unique()

# Load 10x H5 format
adata = sc.read_10x_h5('data/filtered_gene_bc_matrix.h5')
```
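Gene symbols are not guaranteed to be unique, which is why the example above makes variable names unique after loading. The sketch below shows the kind of renaming involved, using a hypothetical standard-library helper rather than AnnData's own method. It assumes that repeated names receive a numeric suffix while the first occurrence stays unchanged, and it does not handle the case where a suffixed name already exists.

```python
from collections import Counter


def make_unique(names, join="-"):
    """Append a running number to repeated names; first occurrences stay unchanged."""
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else f"{name}{join}{count}")
        seen[name] += 1
    return unique


print(make_unique(["GeneA", "GeneB", "GeneA", "GeneA"]))
# ['GeneA', 'GeneB', 'GeneA-1', 'GeneA-2']
```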

### Loading Spatial Data

```python
import scanpy as sc

# Load Visium spatial transcriptomics data
adata = sc.read_visium('data/spatial/')
adata.var_names_make_unique()

# Spatial coordinates are stored in adata.obsm['spatial']
print(adata.obsm['spatial'].shape)
```

### Saving and Loading Analysis Results

```python
# Save processed data
sc.write('results/processed_data.h5ad', adata)

# Load for further analysis
adata = sc.read_h5ad('results/processed_data.h5ad')
```

### Working with Multiple Datasets

```python
import scanpy as sc

# Load multiple datasets
adata1 = sc.read_10x_mtx('data/sample1/')
adata2 = sc.read_10x_mtx('data/sample2/')

# Add batch information
adata1.obs['batch'] = 'sample1'
adata2.obs['batch'] = 'sample2'

# Concatenate datasets, keeping the union of genes across samples
adata_combined = sc.concat([adata1, adata2], join='outer')
```