# Process Pool Downloads

High-performance multiprocessing-based download functionality that bypasses Python's Global Interpreter Lock (GIL) limitations for improved throughput on multi-core systems. This module provides an alternative to the thread-based TransferManager for download-only scenarios requiring maximum performance.

## Capabilities

### ProcessPoolDownloader

The main downloader class that uses multiple processes for concurrent S3 downloads, providing true parallelism and better CPU utilization compared to thread-based approaches.

```python { .api }
class ProcessPoolDownloader:
    """
    Multiprocessing-based S3 downloader for high-performance downloads.

    Args:
        client_kwargs (dict, optional): Arguments for creating S3 clients in each process
        config (ProcessTransferConfig, optional): Configuration for download behavior
    """
    def __init__(self, client_kwargs=None, config=None): ...

    def download_file(self, bucket, key, filename, extra_args=None, expected_size=None):
        """
        Download an S3 object to a local file using multiple processes.

        Args:
            bucket (str): S3 bucket name
            key (str): S3 object key/name
            filename (str): Local file path to download to
            extra_args (dict, optional): Additional S3 operation arguments
            expected_size (int, optional): Expected size of the object (avoids HEAD request)

        Returns:
            ProcessPoolTransferFuture: Future object for tracking download progress
        """

    def shutdown(self):
        """
        Shut down the downloader and wait for all downloads to complete.
        """

    def __enter__(self):
        """Context manager entry."""

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit with automatic shutdown."""
```

### ProcessTransferConfig

Configuration class for controlling ProcessPoolDownloader behavior, including multipart thresholds and process concurrency.

```python { .api }
class ProcessTransferConfig:
    """
    Configuration for ProcessPoolDownloader with multiprocessing-specific options.

    Args:
        multipart_threshold (int): Size threshold for ranged downloads (default: 8MB)
        multipart_chunksize (int): Size of each download chunk (default: 8MB)
        max_request_processes (int): Maximum number of download processes (default: 10)
    """
    def __init__(
        self,
        multipart_threshold=8 * 1024 * 1024,
        multipart_chunksize=8 * 1024 * 1024,
        max_request_processes=10,
    ): ...

    multipart_threshold: int
    multipart_chunksize: int
    max_request_processes: int
```
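To see how these settings interact, the arithmetic below estimates how many GET requests a single download fans out into. It assumes, per the descriptions above, that an object at or above `multipart_threshold` is fetched as `multipart_chunksize`-sized ranged requests:

```python
import math

# Defaults from ProcessTransferConfig
multipart_threshold = 8 * 1024 * 1024  # 8MB
multipart_chunksize = 8 * 1024 * 1024  # 8MB

def num_ranged_requests(object_size):
    """Estimate how many GET requests a download of object_size bytes
    fans out into under the settings above."""
    if object_size < multipart_threshold:
        return 1  # small objects are fetched with a single GET
    return math.ceil(object_size / multipart_chunksize)

print(num_ranged_requests(5 * 1024 * 1024))    # 5MB object -> 1
print(num_ranged_requests(100 * 1024 * 1024))  # 100MB object -> 13
```

Each ranged request can be scheduled on a separate process, so these counts bound how much parallelism one object's download can use.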

### ProcessPoolTransferFuture

Future object representing a ProcessPool download operation with methods for monitoring progress and retrieving results.

```python { .api }
class ProcessPoolTransferFuture:
    """
    Future representing a ProcessPool download request.
    """
    def done(self) -> bool:
        """
        Check if the download is complete.

        Returns:
            bool: True if download is complete (success or failure), False otherwise
        """

    def result(self):
        """
        Get the download result, blocking until complete.

        Returns:
            None: Returns None on successful completion

        Raises:
            Exception: Any exception that occurred during download
        """

    def cancel(self):
        """
        Cancel the download if possible.

        Returns:
            bool: True if cancellation was successful, False otherwise
        """

    @property
    def meta(self) -> 'ProcessPoolTransferMeta':
        """
        Transfer metadata object containing call arguments and status information.

        Returns:
            ProcessPoolTransferMeta: Metadata object for this download
        """
```

### ProcessPoolTransferMeta

Metadata container providing information about a ProcessPool download including call arguments and transfer ID.

```python { .api }
class ProcessPoolTransferMeta:
    """
    Metadata about a ProcessPoolTransferFuture containing call arguments and transfer information.
    """
    @property
    def call_args(self):
        """
        The original call arguments used for the download.

        Returns:
            CallArgs: Object containing method arguments (bucket, key, filename, etc.)
        """

    @property
    def transfer_id(self):
        """
        Unique identifier for this transfer.

        Returns:
            str: Transfer ID string
        """
```

## Usage Examples

### Basic ProcessPool Download

```python
from s3transfer.processpool import ProcessPoolDownloader, ProcessTransferConfig

# Create downloader with custom configuration
config = ProcessTransferConfig(
    multipart_threshold=16 * 1024 * 1024,  # 16MB
    multipart_chunksize=8 * 1024 * 1024,   # 8MB chunks
    max_request_processes=15,              # 15 concurrent processes
)

downloader = ProcessPoolDownloader(
    client_kwargs={'region_name': 'us-west-2'},
    config=config,
)

try:
    # Download a file
    future = downloader.download_file(
        'my-bucket', 'large-file.zip', '/tmp/downloaded-file.zip'
    )

    # Wait for completion
    future.result()  # Blocks until complete
    print("Download completed successfully")
finally:
    downloader.shutdown()
```

### Context Manager Usage

```python
from s3transfer.processpool import ProcessPoolDownloader

# Using context manager for automatic cleanup
with ProcessPoolDownloader() as downloader:
    future = downloader.download_file('my-bucket', 'data.csv', '/tmp/data.csv')
    future.result()
    print("Download completed")
# Downloader automatically shut down
```

### Multiple Concurrent Downloads

```python
from s3transfer.processpool import ProcessPoolDownloader

files_to_download = [
    ('my-bucket', 'file1.txt', '/tmp/file1.txt'),
    ('my-bucket', 'file2.txt', '/tmp/file2.txt'),
    ('my-bucket', 'file3.txt', '/tmp/file3.txt'),
]

with ProcessPoolDownloader() as downloader:
    futures = []

    # Start all downloads
    for bucket, key, filename in files_to_download:
        future = downloader.download_file(bucket, key, filename)
        futures.append(future)

    # Wait for all to complete
    for future in futures:
        future.result()

print("All downloads completed")
```
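Calling `result()` in a bare loop stops at the first failure, leaving the remaining futures unchecked. A sketch of a more forgiving pattern, collecting failures instead (`collect_results` is a hypothetical helper, demonstrated here with stub futures rather than real downloads):

```python
def collect_results(futures):
    """Wait on every future; return (index, exception) pairs for failures
    instead of stopping at the first one."""
    errors = []
    for i, future in enumerate(futures):
        try:
            future.result()
        except Exception as exc:
            errors.append((i, exc))
    return errors

# Demonstration with stub futures (one of which fails):
class _OkFuture:
    def result(self):
        return None

class _FailedFuture:
    def result(self):
        raise OSError("disk full")

errors = collect_results([_OkFuture(), _FailedFuture(), _OkFuture()])
print([(i, type(exc).__name__) for i, exc in errors])  # [(1, 'OSError')]
```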

## Performance Considerations

### When to Use ProcessPool vs TransferManager

**Use ProcessPoolDownloader when:**

- Downloading many files concurrently
- Downloading very large files requiring maximum throughput
- CPU resources are available (multi-core systems)
- Only downloads are needed (no uploads or copies)
- Python's GIL is a bottleneck in your application

**Use TransferManager when:**

- Mixed operations (uploads, downloads, copies) are needed
- Lower memory overhead is important
- A simpler threading model is preferred
- Working with smaller files or fewer concurrent operations

### Memory and Resource Usage

- ProcessPool uses more memory due to per-process overhead
- Each process maintains its own S3 client and connection pool
- Better CPU utilization on multi-core systems
- Higher throughput for I/O-intensive workloads
- Process isolation provides better fault tolerance
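A rough upper bound on in-flight download buffers follows from the configuration, under the simplifying assumption that each worker process holds at most one chunk in memory at a time; per-process interpreter and client overhead comes on top of this:

```python
# Defaults from ProcessTransferConfig
max_request_processes = 10
multipart_chunksize = 8 * 1024 * 1024  # 8MB

# If each worker buffers at most one chunk at a time, in-flight
# download buffers are bounded by roughly:
peak_in_flight = max_request_processes * multipart_chunksize
print(peak_in_flight // (1024 * 1024), "MB")  # 80 MB
```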

### Configuration Tuning

- **multipart_threshold**: Lower values trigger ranged, multi-process downloads for more objects, raising throughput at the cost of extra request overhead
- **multipart_chunksize**: Smaller chunks provide better parallelism; larger chunks reduce per-request overhead
- **max_request_processes**: Should typically match or slightly exceed the CPU core count
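One way to apply the last guideline is to derive the pool size from the machine's core count. A sketch (`suggested_max_request_processes` is a hypothetical helper, with a little headroom since downloads are I/O-bound):

```python
import os

def suggested_max_request_processes(headroom=2, cap=20):
    """Hypothetical sizing helper: core count plus a little headroom,
    capped to avoid oversubscribing the network."""
    cores = os.cpu_count() or 1
    return min(cores + headroom, cap)

print(suggested_max_request_processes())  # machine-dependent
```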