# fsspec

A unified, pythonic interface specification for working with diverse file systems and storage backends. fsspec enables developers to write storage-agnostic code that works seamlessly across different backends (local, S3, GCS, Azure, HTTP, SSH, etc.) through a consistent API that abstracts away implementation-specific details.

## Package Information

- **Package Name**: fsspec
- **Language**: Python
- **Installation**: `pip install fsspec`

## Core Imports

```python
import fsspec
```

For direct filesystem instantiation:

```python
from fsspec import filesystem
```

For file opening and URL parsing:

```python
from fsspec import open, open_files, url_to_fs
```

For mapping interface:

```python
from fsspec import get_mapper, FSMap
```

## Basic Usage

```python
import fsspec

# Open a file from any supported backend
# (s3:// URLs need the optional s3fs package installed)
with fsspec.open('s3://bucket/file.txt', 'r') as f:
    content = f.read()

# Open local file with same interface
with fsspec.open('/path/to/local/file.txt', 'r') as f:
    content = f.read()

# Get filesystem instance directly
fs = fsspec.filesystem('s3', key='...', secret='...')
files = fs.ls('bucket/path/')

# Dictionary-like interface to filesystem
mapper = fsspec.get_mapper('s3://bucket/path/')
mapper['key'] = b'data'
data = mapper['key']

# Parse URL to get filesystem and path
fs, path = fsspec.url_to_fs('s3://bucket/file.txt')
```

## Architecture

fsspec uses a plugin-based architecture with consistent interfaces:

- **AbstractFileSystem**: Base class defining the common interface for all storage backends
- **Registry System**: Plugin system for discovering and instantiating filesystem implementations
- **Protocol Handlers**: Specific implementations for different storage types (local, S3, HTTP, etc.)
- **Caching Layer**: Multiple caching strategies for performance optimization
- **Transaction Support**: Writes made inside a transaction on a filesystem are staged and committed together when the transaction completes

This design enables fsspec to serve as the foundation for major data processing libraries including Dask, pandas, xarray, and Hugging Face Datasets, providing unified storage access across the Python data ecosystem.
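
As a rough illustration of what storage-agnostic code looks like, the sketch below defines a hypothetical helper, `count_lines`, that only ever sees a URL. It uses the built-in `memory://` backend so it runs without credentials; the helper name and the demo path are assumptions made for this example.

```python
import fsspec


def count_lines(url: str) -> int:
    """Count lines in a text file, whatever backend the URL points at."""
    # fsspec selects the filesystem implementation from the URL's protocol.
    with fsspec.open(url, "rt") as f:
        return sum(1 for _ in f)


# Seed the in-memory filesystem so the example needs no network access.
with fsspec.open("memory://archdemo/notes.txt", "wt") as f:
    f.write("alpha\nbeta\ngamma\n")

line_count = count_lines("memory://archdemo/notes.txt")
print(line_count)
```

The same function should accept a local path or a remote URL unchanged, provided the matching backend package is installed.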

## Capabilities

### Core File Operations

Essential file and directory operations including opening files, reading/writing data, listing directories, and managing file metadata. These functions provide the primary interface for interacting with files across all supported storage backends.

```python { .api }
def open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, expand=None, **kwargs): ...
def open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs): ...
def open_local(url, mode='rb', **kwargs): ...
def url_to_fs(url, **kwargs): ...
def get_fs_token_paths(urls, mode='rb', num=1, name_function=None, **kwargs): ...
```

[Core Operations](./core-operations.md)
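
A minimal sketch of `open`, `open_files`, and `url_to_fs` working together, assuming the built-in `memory://` backend and made-up file names:

```python
import fsspec

# Write two small files to the in-memory backend.
for name, text in [("a.txt", "first"), ("b.txt", "second")]:
    with fsspec.open(f"memory://coredemo/{name}", "wt") as f:
        f.write(text)

# open_files expands the glob and returns OpenFile objects that are
# only opened when used as context managers.
contents = []
for of in fsspec.open_files("memory://coredemo/*.txt", "rt"):
    with of as f:
        contents.append(f.read())

# url_to_fs splits a URL into a filesystem instance and a backend path.
fs, path = fsspec.url_to_fs("memory://coredemo/a.txt")
first_bytes = fs.cat_file(path)
```

`url_to_fs` is useful when later calls need the filesystem object itself, for example to list or delete paths rather than just read one file.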

### Filesystem Registry

Plugin system for registering, discovering, and instantiating filesystem implementations. Enables dynamic loading of storage backend drivers and provides centralized access to available protocols.

```python { .api }
def filesystem(protocol, **storage_options): ...
def get_filesystem_class(protocol): ...
def register_implementation(name, cls, clobber=False, errtxt=None): ...
def available_protocols(): ...
```

[Registry System](./registry.md)

### AbstractFileSystem Interface

Complete filesystem abstraction providing consistent methods for file operations, directory management, and metadata access across all storage backends. Serves as the base class for all filesystem implementations.

```python { .api }
class AbstractFileSystem:
    def open(self, path, mode='rb', **kwargs): ...
    def ls(self, path, detail=True, **kwargs): ...
    def info(self, path, **kwargs): ...
    def exists(self, path, **kwargs): ...
    def cat_file(self, path, start=None, end=None, **kwargs): ...
    def pipe_file(self, path, value, **kwargs): ...
    def copy(self, path1, path2, recursive=False, **kwargs): ...
    def rm(self, path, recursive=False, maxdepth=None): ...
```

[AbstractFileSystem](./filesystem-interface.md)
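
To make the method list concrete, here is a small sketch that exercises several of these calls against the in-memory backend. The `/afsdemo` directory and file names are assumptions for illustration.

```python
import fsspec

fs = fsspec.filesystem("memory")
root = "/afsdemo"

# pipe_file writes bytes in one call; cat_file reads them back,
# optionally restricted to a byte range.
fs.pipe_file(f"{root}/data.bin", b"0123456789")
whole = fs.cat_file(f"{root}/data.bin")
middle = fs.cat_file(f"{root}/data.bin", start=2, end=5)

# info returns a metadata dict; ls with detail=False returns paths only.
size = fs.info(f"{root}/data.bin")["size"]
fs.copy(f"{root}/data.bin", f"{root}/copy.bin")
names = sorted(p.rsplit("/", 1)[-1] for p in fs.ls(root, detail=False))

# rm deletes a path; exists confirms that it is gone.
fs.rm(f"{root}/copy.bin")
copy_still_there = fs.exists(f"{root}/copy.bin")
```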

### Dictionary Mapping Interface

Key-value store interface that presents filesystem paths as dictionary keys, enabling intuitive data access patterns and integration with mapping-based workflows.

```python { .api }
class FSMap:
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
    def __delitem__(self, key): ...
    def getitems(self, keys, on_error='raise'): ...
    def setitems(self, d): ...

def get_mapper(url='', check=False, create=False, **kwargs): ...
```

[Mapping Interface](./mapping.md)
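
A short sketch of the mapping interface, assuming an in-memory root at `memory://mapdemo` (a made-up location). Keys are paths relative to the root and values are bytes.

```python
import fsspec

mapper = fsspec.get_mapper("memory://mapdemo")

# Single-key and bulk writes.
mapper["a"] = b"1"
mapper.setitems({"b": b"2", "nested/c": b"3"})

# Iterating yields keys relative to the mapper's root.
keys = sorted(mapper)

# on_error="omit" skips missing keys instead of raising KeyError.
found = mapper.getitems(["a", "missing"], on_error="omit")

del mapper["a"]
has_a = "a" in mapper
```

Because an `FSMap` behaves like a mutable mapping, it can be handed to code that expects a dict-like store of byte values.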

### Caching System

Multiple caching strategies for optimizing filesystem access patterns, including memory mapping, block caching, read-ahead caching, and background prefetching for improved performance with remote storage.

```python { .api }
class BaseCache: ...
class MMapCache(BaseCache): ...
class ReadAheadCache(BaseCache): ...
class BlockCache(BaseCache): ...
class BytesCache(BaseCache): ...
class BackgroundBlockCache(BaseCache): ...
```

[Caching](./caching.md)
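
The effect of a read-ahead cache can be sketched by driving `ReadAheadCache` directly with a stand-in fetcher. This is an illustration under assumptions: the constructor arguments are taken to be `(blocksize, fetcher, size)` in that order, and `_fetch` is an underscore-prefixed method that buffered file objects normally call for you, so treat its direct use here as relying on an internal detail that may differ between versions.

```python
from fsspec.caching import ReadAheadCache

data = bytes(range(32))
fetch_calls = []


def fetcher(start, end):
    """Stand-in for a remote range request; records every call."""
    fetch_calls.append((start, end))
    return data[start:end]


# Positional arguments: blocksize, fetcher, total size in bytes.
cache = ReadAheadCache(8, fetcher, len(data))

first = cache._fetch(0, 4)   # nothing buffered yet, so the fetcher runs
second = cache._fetch(4, 8)  # expected to be served from the buffer
```

In normal use the cache type is usually chosen when opening a file on a remote filesystem rather than constructed by hand.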

### Progress Callbacks

Extensible callback system for monitoring file transfer progress, supporting both built-in progress indicators and custom callback implementations for long-running operations.

```python { .api }
class Callback:
    def __call__(self, size_or_none=None, value_or_none=None): ...
    def set_size(self, size): ...
    def relative_update(self, inc=1): ...
    def branched(self, path_1, path_2, **kwargs): ...

class TqdmCallback(Callback): ...
class DotPrinterCallback(Callback): ...
```

[Callbacks](./callbacks.md)
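
One way to build a custom callback is to subclass `Callback` and override its `call` hook, which the base class invokes after each size or value change. `RecordingCallback` below is a hypothetical class for illustration, and it is driven by hand rather than by a real transfer.

```python
from fsspec.callbacks import Callback


class RecordingCallback(Callback):
    """Hypothetical subclass that records (value, size) on every update."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def call(self, *args, **kwargs):
        # Invoked by set_size() and relative_update() in the base class.
        self.events.append((self.value, self.size))


cb = RecordingCallback()
cb.set_size(10)        # total amount of work
cb.relative_update(4)  # progress by an increment
cb.relative_update(6)
```

An instance like this could then be passed as the `callback` argument of transfer methods that accept one.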

### Compression Support

Automatic compression/decompression support for multiple formats, enabling transparent handling of compressed files across all filesystem backends.

```python { .api }
def register_compression(name, callback, extensions, force=False): ...
def available_compressions(): ...
```

[Compression](./compression.md)
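
A minimal sketch of transparent compression, assuming the in-memory backend and a made-up file name. The file is written with an explicit codec and read back with `compression="infer"`, which picks the codec from the extension.

```python
import fsspec
from fsspec.compression import available_compressions

url = "memory://compdemo/greeting.txt.gz"

# Explicit codec on write: text is gzip-compressed before it is stored.
with fsspec.open(url, "wt", compression="gzip") as f:
    f.write("hello, compressed world")

# "infer" selects the codec from the ".gz" extension on read.
with fsspec.open(url, "rt", compression="infer") as f:
    text = f.read()

# The stored bytes are a gzip stream (magic number 0x1f 0x8b).
fs, path = fsspec.url_to_fs(url)
raw = fs.cat_file(path)

codecs = available_compressions()
```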

### Utilities and Configuration

Helper functions for URL parsing, path manipulation, tokenization, and configuration management that support the core filesystem operations.

```python { .api }
def infer_storage_options(urlpath, inherit_storage_options=None): ...
def infer_compression(filename): ...
def stringify_path(filepath): ...
def tokenize(*args, **kwargs): ...
def get_protocol(url): ...
```

[Utilities](./utilities.md)
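
The helpers can be tried out on plain strings, with no storage access involved. The URLs and credentials below are placeholders chosen for this sketch.

```python
from fsspec.utils import (
    get_protocol,
    infer_compression,
    infer_storage_options,
    tokenize,
)

# Protocol is the scheme before "://"; a bare path is treated as local.
remote_protocol = get_protocol("s3://bucket/key.csv")
local_protocol = get_protocol("/tmp/data.csv")

# Compression is guessed from the file extension.
codec = infer_compression("logs/2024-01-01.json.gz")

# URL components are split into a storage-options dict.
options = infer_storage_options("hdfs://user:pwd@node:8020/data/file.csv")

# tokenize gives a deterministic hash for the same arguments.
token_a = tokenize("s3://bucket/key.csv", {"anon": True})
token_b = tokenize("s3://bucket/key.csv", {"anon": True})
token_c = tokenize("s3://bucket/other.csv", {"anon": True})
```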

## Types

```python { .api }
class FSTimeoutError(Exception):
    """Timeout in filesystem operations"""

class BlocksizeMismatchError(ValueError):
    """Raised when cached file is reopened with different blocksize than original"""

class OpenFile:
    """File-like object for deferred opening"""
    def __init__(self, fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None): ...
    def __enter__(self): ...
    def __exit__(self, *args): ...
    def open(self): ...

class Transaction:
    """Filesystem transaction context"""
    def __init__(self, fs, **kwargs): ...
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
    def start(self): ...
    def complete(self, commit=True): ...
```
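
The sketch below shows how `OpenFile` and `Transaction` might be used together on the local filesystem, assuming a temporary directory and a made-up file name. The `OpenFile` is created before the file exists, and the write only becomes visible once the transaction exits cleanly.

```python
import os
import tempfile

import fsspec
from fsspec.core import OpenFile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.txt")

    # fsspec.open only builds an OpenFile; nothing is opened yet,
    # so the target does not need to exist at this point.
    of = fsspec.open(target, "rb")
    is_open_file = isinstance(of, OpenFile)

    fs = fsspec.filesystem("file")
    # Writes inside the transaction are staged, then committed on exit.
    with fs.transaction:
        with fs.open(target, "wb") as f:
            f.write(b"staged then committed")
        visible_during = os.path.exists(target)
    visible_after = os.path.exists(target)

    # Entering the OpenFile context performs the deferred open.
    with of as f:
        final_bytes = f.read()
```

If an exception escapes the `with fs.transaction:` block, the staged writes are discarded instead of committed.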