# CloudPathLib

A Python library that extends pathlib to cloud storage services. CloudPathLib provides a pathlib-like interface for cloud URIs, so developers can read, write, and manipulate files in cloud storage with the same patterns they use for local filesystem operations.

## Package Information

- **Package Name**: cloudpathlib
- **Language**: Python
- **Installation**: `pip install cloudpathlib[all]` (or install only the providers you need, e.g. `pip install cloudpathlib[s3,gs,azure]`)

## Core Imports

```python
from cloudpathlib import CloudPath, AnyPath, implementation_registry
```

For specific cloud providers:

```python
from cloudpathlib import S3Path, GSPath, AzureBlobPath
from cloudpathlib import S3Client, GSClient, AzureBlobClient
```

## Basic Usage

```python
from cloudpathlib import CloudPath

# Works with any supported cloud service:
# CloudPath dispatches to the appropriate implementation based on the URI prefix
s3_path = CloudPath("s3://my-bucket/file.txt")
gs_path = CloudPath("gs://my-bucket/file.txt")
azure_path = CloudPath("az://my-container/file.txt")

# Familiar pathlib-style operations
with s3_path.open("w") as f:
    f.write("Hello cloud storage!")

# Read content
content = s3_path.read_text()

# Path operations
parent = s3_path.parent
filename = s3_path.name
new_path = s3_path.parent / "subdirectory" / "another_file.txt"

# Directory operations
# (object stores such as S3 have no real directories, so mkdir may do nothing there)
s3_path.parent.mkdir(parents=True, exist_ok=True)
for item in s3_path.parent.iterdir():
    print(item)

# Pattern matching
for txt_file in CloudPath("s3://my-bucket/").glob("**/*.txt"):
    print(txt_file)
```
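
The prefix-based dispatch mentioned in the example above can be pictured with the stdlib-only sketch below. It is not cloudpathlib's implementation: `CloudPathSketch`, its `registry` dict, and the three provider subclasses are hypothetical names, and the sketch assumes only what the comment states, namely that constructing the base class with a URI yields an instance of the subclass registered for that URI's prefix.

```python
from typing import Dict, Type


class CloudPathSketch:
    """Hypothetical base class; instantiating it dispatches on the URI prefix."""

    # Hypothetical registry mapping a URI prefix to the class that handles it
    registry: Dict[str, Type["CloudPathSketch"]] = {}
    prefix = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each provider subclass registers itself under its own prefix
        CloudPathSketch.registry[cls.prefix] = cls

    def __new__(cls, uri: str):
        if cls is CloudPathSketch:
            for prefix, impl in CloudPathSketch.registry.items():
                if uri.startswith(prefix):
                    # Dispatch: build the provider-specific subclass instead
                    return super().__new__(impl)
            raise ValueError(f"No implementation registered for {uri!r}")
        return super().__new__(cls)

    def __init__(self, uri: str):
        self.uri = uri


class S3PathSketch(CloudPathSketch):
    prefix = "s3://"


class GSPathSketch(CloudPathSketch):
    prefix = "gs://"


class AzurePathSketch(CloudPathSketch):
    prefix = "az://"


p = CloudPathSketch("gs://my-bucket/file.txt")
print(type(p).__name__, p.uri)  # GSPathSketch gs://my-bucket/file.txt
```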

## Architecture

CloudPathLib follows a modular architecture with three key layers:

- **Path Classes**: Cloud-specific implementations (S3Path, GSPath, AzureBlobPath) that inherit from the abstract CloudPath base class and provide pathlib-compatible interfaces
- **Client Classes**: S3Client, GSClient, and AzureBlobClient handle authentication and communication with each cloud service, with configurable caching and connection options
- **Universal Handlers**: AnyPath, which dispatches automatically to a cloud path or a local path, and the CloudPath base class, which implements the functionality shared by all cloud providers

This design makes it possible to switch between cloud providers, substitute local filesystem implementations for testing, and integrate with existing pathlib-based code through familiar interfaces.

## Capabilities

### Core Path Operations

Essential pathlib-compatible operations for working with cloud storage paths, including path construction, manipulation, and filesystem operations like reading, writing, and directory management.

```python { .api }
class CloudPath:
    def __init__(self, cloud_path: str, *parts: str, client=None): ...
    def __truediv__(self, other: str) -> "CloudPath": ...
    def joinpath(self, *pathsegments: str) -> "CloudPath": ...
    def with_name(self, name: str) -> "CloudPath": ...
    def with_suffix(self, suffix: str) -> "CloudPath": ...
    def with_stem(self, stem: str) -> "CloudPath": ...

    @property
    def name(self) -> str: ...
    @property
    def stem(self) -> str: ...
    @property
    def suffix(self) -> str: ...
    @property
    def parent(self) -> "CloudPath": ...
    @property
    def parts(self) -> tuple: ...
```
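
Operations such as `name`, `parent`, and `with_suffix` can be derived from the URI string alone, without contacting the service. As a rough stdlib-only illustration (not the library's code), the hypothetical helpers `split_uri` and `join_uri` below separate a URI into its `scheme://bucket/` anchor and a key, and let `pathlib.PurePosixPath` do the manipulation on the key.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_uri(uri: str) -> Tuple[str, PurePosixPath]:
    """Hypothetical helper: split 'scheme://bucket/key' into anchor and key path."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a cloud URI: {uri!r}")
    bucket, _, key = rest.partition("/")
    return f"{scheme}://{bucket}/", PurePosixPath(key)


def join_uri(anchor: str, key: PurePosixPath) -> str:
    """Hypothetical helper: rebuild a URI; an empty key ('.') means the bucket root."""
    key_str = str(key)
    return anchor if key_str == "." else anchor + key_str


anchor, key = split_uri("s3://my-bucket/data/report.csv")
print(key.name, key.stem, key.suffix)              # report.csv report .csv
print(join_uri(anchor, key.parent))                # s3://my-bucket/data
print(join_uri(anchor, key.with_suffix(".json")))  # s3://my-bucket/data/report.json
print(join_uri(anchor, key.parent / "2024" / "q1.csv"))  # s3://my-bucket/data/2024/q1.csv
```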

[Core Path Operations](./core-operations.md)

### File I/O Operations

Comprehensive file input/output capabilities with support for text and binary modes, streaming, and cloud-specific optimizations for efficient data transfer.

```python { .api }
def open(
    self,
    mode: str = "r",
    buffering: int = -1,
    encoding: typing.Optional[str] = None,
    errors: typing.Optional[str] = None,
    newline: typing.Optional[str] = None,
    force_overwrite_from_cloud: typing.Optional[bool] = None,
    force_overwrite_to_cloud: typing.Optional[bool] = None,
) -> typing.IO: ...
def read_text(self, encoding: typing.Optional[str] = None, errors: typing.Optional[str] = None) -> str: ...
def read_bytes(self) -> bytes: ...
def write_text(self, data: str, encoding: typing.Optional[str] = None, errors: typing.Optional[str] = None) -> int: ...
def write_bytes(self, data: bytes) -> int: ...
```
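
Object storage generally cannot be edited in place, so a pathlib-style `open()` over it is typically backed by a local copy: read the object into a cache file, hand out an ordinary file handle, and upload the bytes when a writable handle closes. The sketch below shows that pattern with a dict standing in for a bucket. `FakeBucket` and `open_cached` are hypothetical, and the sketch deliberately skips the cache-freshness decisions that `force_overwrite_from_cloud` and `force_overwrite_to_cloud` appear to control.

```python
import contextlib
import os
import tempfile
from typing import IO, Dict, Iterator


class FakeBucket:
    """Hypothetical stand-in for an object store: key -> bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}


@contextlib.contextmanager
def open_cached(bucket: FakeBucket, key: str, cache_dir: str, mode: str = "r") -> Iterator[IO]:
    """Open a local cached copy of `key`; upload it again on close if written."""
    local_path = os.path.join(cache_dir, key.replace("/", "_"))
    # 1. Download: materialise the remote object in the local cache (not needed for "w")
    if key in bucket.objects and "w" not in mode:
        with open(local_path, "wb") as f:
            f.write(bucket.objects[key])
    writing = any(flag in mode for flag in "wax+")
    with open(local_path, mode) as f:
        yield f
    # 2. Upload: once the local handle is closed, push its bytes back to the bucket
    if writing:
        with open(local_path, "rb") as f:
            bucket.objects[key] = f.read()


bucket = FakeBucket()
with tempfile.TemporaryDirectory() as cache:
    with open_cached(bucket, "notes/hello.txt", cache, "w") as f:
        f.write("Hello cloud storage!")
    with open_cached(bucket, "notes/hello.txt", cache) as f:
        print(f.read())  # Hello cloud storage!
```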

[File I/O Operations](./file-io.md)

### Directory Operations

Directory management including creation, deletion, listing, traversal, and pattern matching with glob support for recursive searches.

```python { .api }
def exists(self) -> bool: ...
def is_file(self) -> bool: ...
def is_dir(self) -> bool: ...
def iterdir(self) -> typing.Iterator["CloudPath"]: ...
def mkdir(self, parents: bool = False, exist_ok: bool = False) -> None: ...
def rmdir(self) -> None: ...
def rmtree(self) -> None: ...
def glob(self, pattern: str) -> typing.Iterator["CloudPath"]: ...
def rglob(self, pattern: str) -> typing.Iterator["CloudPath"]: ...
def walk(self, top_down: bool = True) -> typing.Iterator[tuple]: ...
```
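
Object stores such as S3 and GCS expose a flat namespace of keys; "directories" are inferred from `/`-separated key prefixes. The stdlib-only sketch below derives an `iterdir`-style listing and simple glob matches from a plain list of keys. `list_dir`, `glob_keys`, and `KEYS` are hypothetical and not part of cloudpathlib.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Set

# Hypothetical flat key listing, as an object store would return it
KEYS = [
    "data/2024/jan.csv",
    "data/2024/feb.csv",
    "data/readme.txt",
    "logs/app.log",
    "top.txt",
]


def list_dir(keys: Iterable[str], prefix: str = "") -> List[str]:
    """iterdir-style listing: immediate children of `prefix`.

    Sub-"directories" are reported with a trailing '/'.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    children: Set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # skip a "directory marker" object whose key equals the prefix
        head, sep, _ = rest.partition("/")
        # Anything after another '/' belongs to a deeper "directory"
        children.add(prefix + head + ("/" if sep else ""))
    return sorted(children)


def glob_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Match whole keys against a shell-style pattern.

    Note: fnmatch's '*' also matches '/', so this is looser than pathlib's
    glob, where '*' stays within one path segment.
    """
    return sorted(k for k in keys if fnmatchcase(k, pattern))


print(list_dir(KEYS))                 # ['data/', 'logs/', 'top.txt']
print(list_dir(KEYS, "data"))         # ['data/2024/', 'data/readme.txt']
print(glob_keys(KEYS, "data/*.csv"))  # ['data/2024/feb.csv', 'data/2024/jan.csv']
```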

[Directory Operations](./directory-operations.md)

### Cloud-Specific Operations

Advanced cloud storage features including URL generation, presigned URLs, file copying, upload/download, caching, and cloud service metadata access.

```python { .api }
def download_to(self, destination: typing.Union[str, "os.PathLike"]) -> "pathlib.Path": ...
def upload_from(self, source: typing.Union[str, "os.PathLike"]) -> "CloudPath": ...
def copy(self, destination: typing.Union[str, "CloudPath"]) -> "CloudPath": ...
def copytree(self, destination: typing.Union[str, "CloudPath"]) -> "CloudPath": ...
def as_url(self, presign: bool = False, expire_seconds: int = 3600) -> str: ...
def clear_cache(self) -> None: ...
def stat(self) -> "os.stat_result": ...
```

[Cloud-Specific Operations](./cloud-operations.md)

### AWS S3 Integration

AWS S3 support with advanced features including multipart uploads, transfer acceleration, custom endpoints, and S3-specific metadata access.

```python { .api }
class S3Path(CloudPath):
    @property
    def bucket(self) -> str: ...
    @property
    def key(self) -> str: ...
    @property
    def etag(self) -> str: ...

class S3Client:
    def __init__(
        self,
        aws_access_key_id: typing.Optional[str] = None,
        aws_secret_access_key: typing.Optional[str] = None,
        aws_session_token: typing.Optional[str] = None,
        profile_name: typing.Optional[str] = None,
        boto3_session=None,
        **kwargs,
    ): ...
```

[AWS S3 Integration](./s3-integration.md)

### Google Cloud Storage Integration

Google Cloud Storage support with service account authentication, custom retry policies, concurrent downloads, and GCS-specific features.

```python { .api }
class GSPath(CloudPath):
    @property
    def bucket(self) -> str: ...
    @property
    def blob(self) -> str: ...
    @property
    def etag(self) -> str: ...

class GSClient:
    def __init__(
        self,
        credentials=None,
        project: typing.Optional[str] = None,
        storage_client=None,
        **kwargs,
    ): ...
```

[Google Cloud Storage Integration](./gcs-integration.md)

### Azure Blob Storage Integration

Azure Blob Storage support with Azure Active Directory authentication, hierarchical namespace support for ADLS Gen2, and Azure-specific blob operations.

```python { .api }
class AzureBlobPath(CloudPath):
    @property
    def container(self) -> str: ...
    @property
    def blob(self) -> str: ...
    @property
    def etag(self) -> str: ...

class AzureBlobClient:
    def __init__(
        self,
        account_url: typing.Optional[str] = None,
        credential=None,
        connection_string: typing.Optional[str] = None,
        **kwargs,
    ): ...
```

[Azure Blob Storage Integration](./azure-integration.md)

### HTTP/HTTPS Support

HTTP and HTTPS resource access with custom authentication, directory listing parsers, and RESTful operations for web-based storage systems.

```python { .api }
class HttpPath(CloudPath):
    def get(self, **kwargs) -> typing.Tuple["http.client.HTTPResponse", bytes]: ...
    def put(self, **kwargs) -> typing.Tuple["http.client.HTTPResponse", bytes]: ...
    def post(self, **kwargs) -> typing.Tuple["http.client.HTTPResponse", bytes]: ...
    def delete(self, **kwargs) -> typing.Tuple["http.client.HTTPResponse", bytes]: ...
    def head(self, **kwargs) -> typing.Tuple["http.client.HTTPResponse", bytes]: ...

class HttpClient:
    def __init__(
        self,
        auth=None,
        custom_list_page_parser=None,
        custom_dir_matcher=None,
        **kwargs,
    ): ...
```
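
HTTP servers have no standard directory-listing API, so listing depends on parsing whatever page the server returns; `HttpClient` exposes `custom_list_page_parser` for that job. As a sketch under stated assumptions, the stdlib `html.parser` module can pull entry links out of a simple auto-generated index page. The `parse_index_page` name, its arguments, and its return value (a list of absolute URLs) are hypothetical; the callable signature that `custom_list_page_parser` actually expects is documented on the HTTP support page and may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_index_page(base_url: str, page_html: str) -> List[str]:
    """Hypothetical parser: absolute URLs of the entries on an index page.

    Entries ending in '/' are assumed to be sub-directories.
    """
    collector = _LinkCollector()
    collector.feed(page_html)
    collector.close()
    urls: List[str] = []
    for href in collector.hrefs:
        # Skip the parent-directory link, sort-order query links and anchors
        if href.startswith(("?", "#")) or href in ("../", ".."):
            continue
        urls.append(urljoin(base_url, href))
    return urls


PAGE = """<html><body><h1>Index of /files/</h1>
<a href="../">Parent Directory</a>
<a href="?C=N;O=D">Name</a>
<a href="report.csv">report.csv</a>
<a href="images/">images/</a>
</body></html>"""

print(parse_index_page("https://example.com/files/", PAGE))
# ['https://example.com/files/report.csv', 'https://example.com/files/images/']
```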

[HTTP/HTTPS Support](./http-support.md)

### Universal Path Handling

AnyPath dispatches automatically between cloud paths and local filesystem paths, so the same code can work with both local and cloud storage.

```python { .api }
class AnyPath:
    def __new__(cls, *args, **kwargs) -> typing.Union[CloudPath, "pathlib.Path"]: ...
    @classmethod
    def validate(cls, v): ...

def to_anypath(s: typing.Union[str, "os.PathLike"]) -> typing.Union[CloudPath, "pathlib.Path"]: ...
```
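
The dispatch rule can be sketched as: if the input carries a known cloud prefix, build a cloud path; otherwise fall back to `pathlib.Path`. The function below is a hypothetical, stdlib-only illustration of that branching, not AnyPath's implementation; `CloudLocation` is a placeholder for a real cloud path object, and the prefix tuple is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple, Union

# Hypothetical set of prefixes this sketch treats as cloud URIs
CLOUD_PREFIXES: Tuple[str, ...] = ("s3://", "gs://", "az://", "http://", "https://")


@dataclass(frozen=True)
class CloudLocation:
    """Placeholder standing in for a real CloudPath object."""

    uri: str


def any_path_sketch(value: Union[str, "os.PathLike[str]"]) -> Union[CloudLocation, Path]:
    """Return a CloudLocation for cloud URIs and a pathlib.Path for everything else."""
    text = os.fspath(value)
    if text.startswith(CLOUD_PREFIXES):
        # With cloudpathlib this branch would build the matching CloudPath subclass
        return CloudLocation(text)
    # No cloud prefix: fall back to the local filesystem path type
    return Path(text)


print(any_path_sketch("s3://my-bucket/data.csv"))  # CloudLocation(uri='s3://my-bucket/data.csv')
print(any_path_sketch("local/data.csv").name)      # data.csv
```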

[Universal Path Handling](./anypath.md)

### Standard Library Integration

Monkey patching capabilities to make Python's built-in functions work transparently with cloud paths, including `open()`, `os` functions, and `glob` operations.

```python { .api }
def patch_open(original_open=None) -> None: ...
def patch_os_functions() -> None: ...
def patch_glob() -> None: ...
def patch_all_builtins() -> None: ...
```

[Standard Library Integration](./patching.md)

### Client Management

Base client functionality for authentication, caching configuration, and cloud service connection management across all supported providers.

```python { .api }
class Client:
    def __init__(
        self,
        file_cache_mode: typing.Optional[FileCacheMode] = None,
        local_cache_dir: typing.Optional[str] = None,
        content_type_method=None,
    ): ...

    @classmethod
    def get_default_client(cls): ...
    def set_as_default_client(self) -> None: ...
    def clear_cache(self) -> None: ...
```
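
The `get_default_client` and `set_as_default_client` pair suggests a class-level default instance that is created lazily on first use and can be swapped out. A minimal sketch of that pattern follows, assuming each client subclass keeps its own default; `ClientSketch`, `S3ClientSketch`, `GSClientSketch`, and the `cache_dir` attribute are hypothetical.

```python
from typing import ClassVar, Optional


class ClientSketch:
    """Hypothetical client base with a lazily created, per-class default."""

    _default_client: ClassVar[Optional["ClientSketch"]] = None

    def __init__(self, cache_dir: Optional[str] = None) -> None:
        self.cache_dir = cache_dir

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Give every subclass its own slot so S3 and GS defaults stay separate
        cls._default_client = None

    @classmethod
    def get_default_client(cls) -> "ClientSketch":
        # Create the default on first use, then keep returning the same instance
        if cls._default_client is None:
            cls._default_client = cls()
        return cls._default_client

    def set_as_default_client(self) -> None:
        type(self)._default_client = self


class S3ClientSketch(ClientSketch):
    pass


class GSClientSketch(ClientSketch):
    pass


first = S3ClientSketch.get_default_client()
print(first is S3ClientSketch.get_default_client())  # True
custom = S3ClientSketch(cache_dir="/tmp/cloud-cache")
custom.set_as_default_client()
print(S3ClientSketch.get_default_client().cache_dir)  # /tmp/cloud-cache
print(GSClientSketch.get_default_client() is custom)  # False
```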

[Client Management](./client-management.md)

### Configuration and Enums

Configuration options for cache management, file handling modes, and other library settings that control behavior across all cloud providers.

```python { .api }
class FileCacheMode(str, Enum):
    persistent = "persistent"
    tmp_dir = "tmp_dir"
    cloudpath_object = "cloudpath_object"
    close_file = "close_file"

    @classmethod
    def from_environment(cls): ...

# Implementation registry for cloud provider management
implementation_registry: typing.Dict[str, "CloudImplementation"]
```
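
`from_environment` indicates that the cache mode can be chosen through an environment variable. The self-contained sketch below mirrors the enum members listed above and shows that lookup. The variable name `CLOUDPATHLIB_FILE_CACHE_MODE` and the `tmp_dir` fallback are assumptions for this example; confirm both on the configuration page before relying on them.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class FileCacheModeSketch(str, Enum):
    """Local mirror of the FileCacheMode members listed above."""

    persistent = "persistent"
    tmp_dir = "tmp_dir"
    cloudpath_object = "cloudpath_object"
    close_file = "close_file"

    @classmethod
    def from_environment(
        cls,
        env: Optional[Mapping[str, str]] = None,
        var: str = "CLOUDPATHLIB_FILE_CACHE_MODE",  # assumed variable name
    ) -> "FileCacheModeSketch":
        env = os.environ if env is None else env
        raw = env.get(var, "").strip().lower()
        if not raw:
            return cls.tmp_dir  # assumed fallback when nothing is configured
        try:
            return cls(raw)  # look the member up by its string value
        except ValueError:
            valid = ", ".join(m.value for m in cls)
            raise ValueError(f"{var}={raw!r} is not one of: {valid}") from None


print(FileCacheModeSketch.from_environment({}).value)  # tmp_dir
print(FileCacheModeSketch.from_environment({"CLOUDPATHLIB_FILE_CACHE_MODE": "close_file"}).value)  # close_file
```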

[Configuration and Enums](./configuration.md)

### Exception Handling

Comprehensive exception hierarchy for precise error handling across different cloud providers and operation types, with specific exceptions for common cloud storage scenarios.

```python { .api }
class CloudPathException(Exception): ...
class CloudPathFileNotFoundError(CloudPathException, FileNotFoundError): ...
class MissingCredentialsError(CloudPathException): ...
class InvalidPrefixError(CloudPathException): ...
# ... and 15+ more specific exception types
```
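
Because `CloudPathFileNotFoundError` also inherits from the built-in `FileNotFoundError`, a handler written for local files catches missing cloud objects as well. The sketch below re-declares two classes from the listing above so it runs without the library; `fetch` and `read_or_default` are hypothetical helpers.

```python
class CloudPathException(Exception):
    """Local re-declaration of the base class listed above (for illustration)."""


class CloudPathFileNotFoundError(CloudPathException, FileNotFoundError):
    """Mirrors the dual inheritance shown in the API summary."""


def fetch(uri: str) -> str:
    # Hypothetical lookup that always fails, standing in for a missing object
    raise CloudPathFileNotFoundError(f"{uri} does not exist")


def read_or_default(uri: str, default: str = "") -> str:
    try:
        return fetch(uri)
    except FileNotFoundError:  # a generic handler also catches the cloud error
        return default


print(read_or_default("s3://my-bucket/missing.txt", "n/a"))  # n/a

try:
    fetch("gs://my-bucket/missing.txt")
except CloudPathException as err:  # a library-wide handler catches it too
    print(type(err).__name__, "->", err)  # CloudPathFileNotFoundError -> gs://my-bucket/missing.txt does not exist
```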
[Exception Handling](./exceptions.md)