# Data Input/Output

Scanpy reads and writes a range of single-cell data formats, which makes it straightforward to load data from different platforms and to exchange results with other analysis tools.

## Capabilities

### General Data Reading

Read a file in any supported format. The format is inferred from the file extension unless `ext` overrides it.

```python { .api }
def read(filename, delimiter=None, first_column_names=None, backup_url=None, sheet=None, ext=None, **kwargs):
    """
    Read file and return AnnData object.

    Parameters:
    - filename (str): Path to file or URL
    - delimiter (str, optional): Delimiter for text files
    - first_column_names (bool, optional): Whether first column contains row names
    - backup_url (str, optional): Backup URL if file not found locally
    - sheet (str, optional): Sheet name for Excel files
    - ext (str, optional): Force file extension interpretation

    Returns:
    AnnData: Annotated data object
    """
```

### 10x Genomics Formats

Read output from 10x Genomics Cell Ranger, one of the most widely used sources of single-cell data.

```python { .api }
def read_10x_h5(filename, genome=None, gex_only=True, **kwargs):
    """
    Read 10x Genomics HDF5 file.

    Parameters:
    - filename (str): Path to .h5 file
    - genome (str, optional): Genome to read (for multi-genome files)
    - gex_only (bool): Only read gene expression data

    Returns:
    AnnData: Annotated data object
    """

def read_10x_mtx(path, var_names='gene_symbols', make_unique=True, cache=False, **kwargs):
    """
    Read 10x Genomics MTX format (matrix.mtx, features.tsv, barcodes.tsv).

    Parameters:
    - path (str): Path to directory containing MTX files
    - var_names (str): Use 'gene_symbols' or 'gene_ids' for gene names
    - make_unique (bool): Make gene names unique
    - cache (bool): Write cache file for faster subsequent reading

    Returns:
    AnnData: Annotated data object
    """
```

### Spatial Transcriptomics

Read spatial transcriptomics data from the 10x Visium platform.

```python { .api }
def read_visium(path, genome=None, count_file='filtered_feature_bc_matrix.h5', library_id=None, load_images=True, **kwargs):
    """
    Read 10x Visium spatial transcriptomics data.

    Parameters:
    - path (str): Path to directory containing Visium output
    - genome (str, optional): Genome to read
    - count_file (str): Name of count matrix file
    - library_id (str, optional): Library identifier
    - load_images (bool): Load histological images

    Returns:
    AnnData: Annotated data object with spatial coordinates
    """
```

### Standard Formats

Read common data formats used in bioinformatics and data science.

```python { .api }
# From anndata - automatically available in scanpy
def read_csv(filename, delimiter=',', first_column_names=None, **kwargs):
    """
    Read CSV file.

    Parameters:
    - filename (str): Path to CSV file
    - delimiter (str): Field delimiter
    - first_column_names (bool, optional): First column contains row names

    Returns:
    AnnData: Annotated data object
    """

def read_excel(filename, sheet=None, **kwargs):
    """
    Read Excel file.

    Parameters:
    - filename (str): Path to Excel file
    - sheet (str, optional): Sheet name to read

    Returns:
    AnnData: Annotated data object
    """

def read_h5ad(filename, backed=None, **kwargs):
    """
    Read H5AD format (native AnnData format).

    Parameters:
    - filename (str): Path to .h5ad file
    - backed (str, optional): Backing mode ('r' for read-only)

    Returns:
    AnnData: Annotated data object
    """

def read_hdf(filename, key, **kwargs):
    """
    Read HDF5 file.

    Parameters:
    - filename (str): Path to HDF5 file
    - key (str): Key/group name in HDF5 file

    Returns:
    AnnData: Annotated data object
    """

def read_loom(filename, sparse=True, cleanup=False, **kwargs):
    """
    Read Loom file format.

    Parameters:
    - filename (str): Path to .loom file
    - sparse (bool): Store matrix in sparse format
    - cleanup (bool): Collapse obs/var fields that hold a single unique value into .uns

    Returns:
    AnnData: Annotated data object
    """

def read_mtx(filename, **kwargs):
    """
    Read Matrix Market format.

    Parameters:
    - filename (str): Path to .mtx file

    Returns:
    AnnData: Annotated data object
    """

def read_text(filename, delimiter=None, first_column_names=None, **kwargs):
    """
    Read text file.

    Parameters:
    - filename (str): Path to text file
    - delimiter (str, optional): Field delimiter
    - first_column_names (bool, optional): First column contains row names

    Returns:
    AnnData: Annotated data object
    """

def read_umi_tools(filename, **kwargs):
    """
    Read UMI-tools format.

    Parameters:
    - filename (str): Path to UMI-tools output file

    Returns:
    AnnData: Annotated data object
    """
```

### Data Writing

Write AnnData objects to various formats for sharing, archiving, or use with other tools.

```python { .api }
def write(filename, adata, ext=None, compression='gzip', compression_opts=None):
    """
    Write AnnData object to file.

    Parameters:
    - filename (str): Output file path
    - adata (AnnData): AnnData object to write
    - ext (str, optional): Force file format based on extension
    - compression (str, optional): Compression method ('gzip' or 'lzf')
    - compression_opts (int, optional): Compression options, such as the gzip level
    """
```

### Data Concatenation

Combine multiple AnnData objects into a single object.

```python { .api }
def concat(adatas, axis=0, join='inner', merge=None, uns_merge=None, **kwargs):
    """
    Concatenate AnnData objects along an axis.

    Parameters:
    - adatas (list): List of AnnData objects to concatenate
    - axis (int): Axis along which to concatenate (0 for observations, 1 for variables)
    - join (str): How to align the other axis ('inner' keeps shared labels, 'outer' keeps all)
    - merge (str, optional): Strategy for merging conflicting annotations
    - uns_merge (str, optional): Strategy for merging unstructured annotations

    Returns:
    AnnData: Concatenated AnnData object
    """
```

## Usage Examples

### Loading 10x Genomics Data

```python
import scanpy as sc

# Load 10x MTX format
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',
    var_names='gene_symbols',
    cache=True
)
# Deduplicate gene symbols by appending a numeric suffix to repeats
adata.var_names_make_unique()

# Load 10x H5 format
adata = sc.read_10x_h5('data/filtered_gene_bc_matrix.h5')
```

### Loading Spatial Data

```python
import scanpy as sc

# Load Visium spatial transcriptomics data
adata = sc.read_visium('data/spatial/')
# Deduplicate gene symbols before downstream analysis
adata.var_names_make_unique()

# Spatial coordinates are stored in adata.obsm['spatial']
print(adata.obsm['spatial'].shape)
```

### Saving and Loading Analysis Results

```python
import scanpy as sc

# Save processed data (adata is an AnnData object from an earlier step)
sc.write('results/processed_data.h5ad', adata)

# Load for further analysis
adata = sc.read_h5ad('results/processed_data.h5ad')
```

### Working with Multiple Datasets

```python
import scanpy as sc

# Load multiple datasets
adata1 = sc.read_10x_mtx('data/sample1/')
adata2 = sc.read_10x_mtx('data/sample2/')

# Add batch information
adata1.obs['batch'] = 'sample1'
adata2.obs['batch'] = 'sample2'

# Concatenate datasets, keeping the union of genes across samples
adata_combined = sc.concat([adata1, adata2], join='outer')
```