# Core File Operations

Essential file and directory operations that provide the primary interface for interacting with files across all supported storage backends. These functions handle URL parsing, protocol resolution, and file opening with support for compression, encoding, and various access patterns.

## Capabilities

### File Opening

Opens single files with automatic protocol detection, compression handling, and encoding support. Returns an `OpenFile` object that yields a file-like object when used as a context manager.

```python { .api }
def open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, expand=None, **kwargs):
    """
    Open a file for reading or writing.

    Parameters:
    - urlpath: str, URL or path to file (supports all registered protocols)
    - mode: str, file opening mode ('r', 'w', 'a', 'rb', 'wb', etc.)
    - compression: str or None, compression format ('gzip', 'bz2', 'lzma', etc.); use 'infer' to guess from the file extension
    - encoding: str, text encoding for text mode (default 'utf8')
    - errors: str or None, error handling mode for text encoding
    - protocol: str or None, force a specific protocol
    - newline: str or None, newline handling for text mode
    - expand: bool or None, expand glob patterns in paths
    - **kwargs: additional options passed to the filesystem

    Returns:
    OpenFile object (context manager)
    """
```

Usage example:
```python
import json

import fsspec

# Open remote file with compression
with fsspec.open('s3://bucket/data.txt.gz', 'rt', compression='gzip') as f:
    content = f.read()

# Open local file
with fsspec.open('/path/to/file.json', 'r') as f:
    data = json.load(f)
```
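
The returned `OpenFile` is lazy: no file handle exists until the context manager is entered, which also makes it cheap to create and picklable, e.g. for shipping to worker processes. A minimal sketch against a local temporary file (the path and contents here are illustrative):

```python
import os
import pickle
import tempfile

import fsspec

# Create a small local file to open (illustrative)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.txt")
with open(path, "w") as f:
    f.write("hello")

# fsspec.open returns an OpenFile; nothing is opened yet
of = fsspec.open(path, "rt")

# Because it is lazy, the OpenFile survives a pickle round trip
restored = pickle.loads(pickle.dumps(of))

# The real file handle only exists inside the context manager
with restored as f:
    assert f.read() == "hello"
```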

### Multiple File Opening

Opens multiple files simultaneously, supporting glob patterns and parallel access. Useful for batch processing of file collections.

```python { .api }
def open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs):
    """
    Open multiple files for reading or writing.

    Parameters:
    - urlpath: str or list, URL pattern or list of URLs
    - mode: str, file opening mode
    - compression: str or None, compression format
    - encoding: str, text encoding for text mode
    - errors: str or None, error handling mode for text encoding
    - name_function: callable, function to generate filenames for num > 1
    - num: int, number of files to create for write operations
    - protocol: str or None, force a specific protocol
    - newline: str or None, newline handling for text mode
    - auto_mkdir: bool, automatically create parent directories
    - expand: bool, expand glob patterns in paths
    - **kwargs: additional options passed to the filesystem

    Returns:
    List of OpenFile objects
    """
```

Usage example:
```python
import fsspec
import pandas as pd

# Open multiple files matching pattern
files = fsspec.open_files('s3://bucket/data/*.csv', 'rt')
for f in files:
    with f as file:
        df = pd.read_csv(file)

# Create multiple output files
outputs = fsspec.open_files('output-*.json', 'w', num=4)
```
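
When writing `num > 1` files, the `*` in the pattern is replaced by `name_function(i)` for each `i` in `range(num)`. A self-contained sketch against a local temporary directory (the pattern and name function here are illustrative):

```python
import os
import tempfile

import fsspec

tmpdir = tempfile.mkdtemp()

# '*' in the pattern is replaced by name_function(i) for i in range(num)
files = fsspec.open_files(
    os.path.join(tmpdir, "part-*.json"),
    "w",
    num=3,
    name_function=lambda i: f"{i:03d}",  # part-000.json, part-001.json, ...
)
for i, of in enumerate(files):
    with of as f:
        f.write('{"part": %d}' % i)

assert sorted(os.listdir(tmpdir)) == [
    "part-000.json", "part-001.json", "part-002.json"
]
```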

### Local File Access

Ensures files are available locally, materializing remote files into a temporary cache when needed (via a caching protocol such as `simplecache::`). Returns the local path for direct access.

```python { .api }
def open_local(url, mode='rb', **kwargs):
    """
    Open a file, ensuring it is available locally.

    Parameters:
    - url: str, URL or path to file; remote URLs should be chained through a caching protocol (e.g. 'simplecache::s3://...')
    - mode: str, file opening mode
    - **kwargs: additional options passed to the filesystem

    Returns:
    str, local file path (or list of str if the URL expands to multiple files)
    """
```

Usage example:
```python
import pickle

import fsspec

# Materialize a remote file in a local cache via simplecache::
local_path = fsspec.open_local('simplecache::s3://bucket/model.pkl')
with open(local_path, 'rb') as f:
    model = pickle.load(f)
```
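
`open_local` requires the resolved filesystem to yield real local files, which is why remote URLs are chained through a caching protocol. A runnable sketch that substitutes the built-in in-memory backend for a real remote store (the path and contents are illustrative):

```python
import fsspec

# Populate an in-memory "remote" file
mem = fsspec.filesystem("memory")
mem.pipe_file("/data/blob.bin", b"\x00\x01\x02")

# simplecache:: downloads the file to a local temporary cache;
# open_local returns the resulting local path
local_path = fsspec.open_local("simplecache::memory://data/blob.bin")
with open(local_path, "rb") as f:
    assert f.read() == b"\x00\x01\x02"
```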

### URL to Filesystem Resolution

Parses URLs to extract the appropriate filesystem instance and normalized path. Core function for protocol resolution and filesystem instantiation.

```python { .api }
def url_to_fs(url, **kwargs):
    """
    Parse URL and return filesystem instance and path.

    Parameters:
    - url: str, URL to parse
    - **kwargs: storage options passed to the filesystem constructor

    Returns:
    tuple: (AbstractFileSystem instance, str path)
    """
```

Usage example:
```python
import fsspec

# Parse an S3 URL
fs, path = fsspec.url_to_fs('s3://bucket/path/file.txt', key='...', secret='...')
files = fs.ls(path.rsplit('/', 1)[0])  # List the parent directory

# Parse an HTTP URL
fs, path = fsspec.url_to_fs('https://example.com/data.csv')
content = fs.cat_file(path)
```
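
Since remote credentials are not available here, a self-contained round trip with the built-in `memory` protocol; it also shows that filesystem instances are cached, so repeated calls with the same protocol and storage options return the same object:

```python
import fsspec

# url_to_fs resolves the protocol and strips it from the path
fs, path = fsspec.url_to_fs("memory://demo/data.txt")
fs.pipe_file(path, b"hello")
assert fs.cat_file(path) == b"hello"

# Instance caching: identical storage options yield the same object
fs2, _ = fsspec.url_to_fs("memory://demo/other.txt")
assert fs is fs2
```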

### Multiple URL Processing

Processes multiple URLs and paths, returning a single filesystem instance and list of paths. Optimizes for cases where multiple files share the same storage backend.

```python { .api }
def get_fs_token_paths(urls, mode='rb', num=1, name_function=None, **kwargs):
    """
    Parse multiple URLs and return filesystem with paths.

    Parameters:
    - urls: str or list, URLs or paths to process
    - mode: str, file opening mode
    - num: int, number of files to create for write operations
    - name_function: callable, function to generate filenames
    - **kwargs: storage options passed to the filesystem constructor

    Returns:
    tuple: (AbstractFileSystem instance, str token, list of paths)
    """
```

Usage example:
```python
import fsspec

# Process multiple S3 files
fs, token, paths = fsspec.get_fs_token_paths([
    's3://bucket/file1.txt',
    's3://bucket/file2.txt',
], key='...', secret='...')

# Read all files
contents = [fs.cat_file(path) for path in paths]
```
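
A runnable sketch with local files (paths are illustrative); the returned token is a string hash of the filesystem's parameters, usable as a cache key, and glob patterns are expanded in read mode:

```python
import os
import tempfile

import fsspec

tmpdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(tmpdir, name), "w") as f:
        f.write(name)

# A glob pattern expands to the matching paths in read mode
fs, token, paths = fsspec.get_fs_token_paths(os.path.join(tmpdir, "*.txt"))

assert isinstance(token, str)
assert [os.path.basename(p) for p in sorted(paths)] == ["a.txt", "b.txt"]
assert fs.cat_file(sorted(paths)[0]) == b"a.txt"
```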

## Usage Patterns

### Context Manager Pattern

The preferred way to work with fsspec files:

```python
import fsspec

with fsspec.open('protocol://path/file.ext', 'r') as f:
    data = f.read()
```

### Batch Processing

Processing multiple files efficiently:

```python
import fsspec
import pandas as pd

files = fsspec.open_files('s3://bucket/data/*.parquet')
datasets = []
for f in files:
    with f as file:
        datasets.append(pd.read_parquet(file))
```

### Protocol Auto-Detection

fsspec automatically detects protocols from URLs:

```python
import fsspec

# These all work with the same interface
fsspec.open('file:///local/path.txt')   # Local filesystem
fsspec.open('/local/path.txt')          # Local filesystem (implicit)
fsspec.open('s3://bucket/file.txt')     # Amazon S3
fsspec.open('gcs://bucket/file.txt')    # Google Cloud Storage
fsspec.open('https://example.com/api')  # HTTP
```

### Compression Handling

Compression can be inferred from the file extension with `compression='infer'`, or specified explicitly:

```python
# Infer compression from the extension
with fsspec.open('data.csv.gz', 'rt', compression='infer') as f:
    content = f.read()

# Explicit compression
with fsspec.open('data.csv', 'rt', compression='gzip') as f:
    content = f.read()
```