# PyBase64

Fast Base64 encoding/decoding library that provides a high-performance wrapper around the optimized libbase64 C library. PyBase64 offers the same API as Python's built-in `base64` module for easy integration while delivering significantly faster performance through SIMD optimizations (AVX2, AVX512-VBMI, Neon) and native C implementations.

## Package Information

- **Package Name**: pybase64
- **Language**: Python
- **Installation**: `pip install pybase64`
- **Documentation**: https://pybase64.readthedocs.io/en/stable
- **License**: BSD-2-Clause
- **CLI Tool**: Available as `pybase64` command or `python -m pybase64`

## Core Imports

```python
import pybase64
```

For specific functions:

```python
from pybase64 import b64encode, b64decode, standard_b64encode, urlsafe_b64decode
```

## Basic Usage

```python
import pybase64

# Basic encoding/decoding
data = b'Hello, World!'
encoded = pybase64.b64encode(data)
decoded = pybase64.b64decode(encoded)

print(encoded)  # b'SGVsbG8sIFdvcmxkIQ=='
print(decoded)  # b'Hello, World!'

# URL-safe encoding
url_encoded = pybase64.urlsafe_b64encode(data)
url_decoded = pybase64.urlsafe_b64decode(url_encoded)

# Custom alphabet
custom_encoded = pybase64.b64encode(data, altchars=b'_:')
custom_decoded = pybase64.b64decode(custom_encoded, altchars=b'_:')

# Validation for security-critical applications
secure_decoded = pybase64.b64decode(encoded, validate=True)

# Version and performance info
print(pybase64.get_version())  # Shows SIMD optimizations in use
```

## Architecture

PyBase64 provides a dual-implementation architecture for optimal performance:

- **C Extension** (`_pybase64`): High-performance implementation using libbase64 with SIMD optimizations
- **Python Fallback** (`_fallback`): Pure Python implementation that uses the built-in `base64` module when the C extension is unavailable
- **Automatic Selection**: Runtime detection automatically chooses the best available implementation
- **SIMD Detection**: Runtime CPU feature detection enables the optimal instruction set (AVX2, AVX512-VBMI, Neon)

This design delivers the accelerated path wherever the C extension can be loaded while maintaining compatibility across all Python environments, including PyPy and free-threaded builds.

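The same two-tier idea can be applied one level up, in application code that treats pybase64 as an optional dependency. The following is a minimal sketch of that pattern, not part of the library: it binds one name to `pybase64` when it is installed and to the standard `base64` module otherwise, relying on the shared function names. `describe_backend` is a hypothetical helper introduced only for illustration.

```python
import base64 as _stdlib_base64

try:
    import pybase64 as b64  # accelerated drop-in, if installed
except ImportError:
    b64 = _stdlib_base64  # same names for the calls used below


def describe_backend() -> str:
    """Report which module is serving Base64 calls in this process."""
    get_version = getattr(b64, "get_version", None)  # only pybase64 has it
    if get_version is None:
        return "stdlib base64"
    return f"pybase64 {get_version()}"


payload = b"Hello, World!"
encoded = b64.b64encode(payload)
roundtrip = b64.b64decode(encoded, validate=True)

print(describe_backend())
print(encoded)  # b'SGVsbG8sIFdvcmxkIQ=='
```

Because both modules accept the same arguments for these calls, the rest of the application does not need to know which one was imported.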
## Capabilities

### Core Encoding Functions

Primary Base64 encoding functions with full alphabet customization and optimal performance through C extensions.

```python { .api }
def b64encode(s: Buffer, altchars: str | Buffer | None = None) -> bytes:
    """
    Encode bytes using Base64 alphabet.

    Parameters:
    - s: bytes-like object to encode
    - altchars: optional 2-character string/bytes for custom alphabet (replaces '+' and '/')

    Returns:
    bytes: Base64 encoded data

    Raises:
    BufferError: if buffer is not C-contiguous
    TypeError: for invalid input types
    ValueError: for non-ASCII strings in altchars
    """

def b64encode_as_string(s: Buffer, altchars: str | Buffer | None = None) -> str:
    """
    Encode bytes using Base64 alphabet, return as string.

    Parameters:
    - s: bytes-like object to encode
    - altchars: optional 2-character string/bytes for custom alphabet

    Returns:
    str: Base64 encoded data as ASCII string
    """

def encodebytes(s: Buffer) -> bytes:
    """
    Encode bytes with MIME-style line breaks every 76 characters.

    Parameters:
    - s: bytes-like object to encode

    Returns:
    bytes: Base64 encoded data with newlines per RFC 2045 (MIME)
    """
```

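As an illustration of how `altchars` and `encodebytes` change the output, here is a small sketch. It falls back to the standard `base64` module when pybase64 is not installed (an assumption made only so the snippet runs anywhere); the two-byte input is chosen because its standard encoding contains both `+` and `/`.

```python
try:
    import pybase64 as b64  # accelerated implementation, if installed
except ImportError:
    import base64 as b64  # stdlib exposes the same two functions used here

raw = b"\xfb\xff"  # encodes to characters 62 and 63 of the alphabet

standard = b64.b64encode(raw)
custom = b64.b64encode(raw, altchars=b"_:")  # '_' replaces '+', ':' replaces '/'

# 200 input bytes become 268 Base64 characters, wrapped at 76 per line
mime = b64.encodebytes(b"x" * 200)
line_lengths = [len(line) for line in mime.splitlines()]

print(standard)      # b'+/8='
print(custom)        # b'_:8='
print(line_lengths)  # [76, 76, 76, 40]
```

With pybase64 installed, `b64encode_as_string(raw)` is documented above to return the same characters as a `str` rather than `bytes`.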
### Core Decoding Functions

Base64 decoding functions with validation options and alternative alphabet support for maximum security and flexibility.

```python { .api }
def b64decode(s: str | Buffer, altchars: str | Buffer | None = None, validate: bool = False) -> bytes:
    """
    Decode Base64 encoded data.

    Parameters:
    - s: string or bytes-like object to decode
    - altchars: optional 2-character alternative alphabet
    - validate: if True, strictly validate input (recommended for security)

    Returns:
    bytes: decoded data

    Raises:
    binascii.Error: for invalid padding or characters (when validate=True)
    """

def b64decode_as_bytearray(s: str | Buffer, altchars: str | Buffer | None = None, validate: bool = False) -> bytearray:
    """
    Decode Base64 encoded data, return as bytearray.

    Parameters:
    - s: string or bytes-like object to decode
    - altchars: optional 2-character alternative alphabet
    - validate: if True, strictly validate input

    Returns:
    bytearray: decoded data as mutable bytearray

    Raises:
    binascii.Error: for invalid padding or characters (when validate=True)
    """
```

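The effect of `validate` is easiest to see on input that contains a character outside the alphabet. The sketch below assumes the behaviour of the standard `base64` module, which pybase64 mirrors: without validation such characters are discarded before decoding, and with `validate=True` they raise `binascii.Error`. `strict_decode` is a hypothetical helper, and the stdlib fallback import is there only so the snippet runs without pybase64.

```python
import binascii
from typing import Optional

try:
    import pybase64 as b64
except ImportError:
    import base64 as b64  # same b64decode signature

clean = b"SGVsbG8="   # Base64 for b"Hello"
dirty = b"SGVs bG8="  # same payload with a stray space inserted


def strict_decode(blob: bytes) -> Optional[bytes]:
    """Return the decoded bytes, or None if the input fails validation."""
    try:
        return b64.b64decode(blob, validate=True)
    except binascii.Error:
        return None


print(strict_decode(clean))  # b'Hello'
print(strict_decode(dirty))  # None
print(b64.b64decode(dirty))  # b'Hello' (the space is silently dropped)
```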
### Standard Base64 Functions

Convenience functions for standard Base64 alphabet encoding/decoding, compatible with Python's `base64` module.

```python { .api }
def standard_b64encode(s: Buffer) -> bytes:
    """
    Encode using standard Base64 alphabet (+/).

    Parameters:
    - s: bytes-like object to encode

    Returns:
    bytes: standard Base64 encoded data
    """

def standard_b64decode(s: str | Buffer) -> bytes:
    """
    Decode standard Base64 encoded data.

    Parameters:
    - s: string or bytes-like object to decode

    Returns:
    bytes: decoded data

    Raises:
    binascii.Error: for invalid input
    """
```

### URL-Safe Base64 Functions

URL- and filesystem-safe Base64 encoding/decoding using a modified alphabet (`-_` instead of `+/`) for web applications and file names.

```python { .api }
def urlsafe_b64encode(s: Buffer) -> bytes:
    """
    Encode using URL-safe Base64 alphabet (-_).

    Parameters:
    - s: bytes-like object to encode

    Returns:
    bytes: URL-safe Base64 encoded data
    """

def urlsafe_b64decode(s: str | Buffer) -> bytes:
    """
    Decode URL-safe Base64 encoded data.

    Parameters:
    - s: string or bytes-like object to decode

    Returns:
    bytes: decoded data

    Raises:
    binascii.Error: for invalid input
    """
```

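To make the difference between the two alphabets concrete, the following sketch encodes the same two bytes with the standard and the URL-safe functions. The stdlib fallback import is an assumption added so the example runs even without pybase64 installed.

```python
try:
    import pybase64 as b64
except ImportError:
    import base64 as b64  # provides the same four function names

raw = b"\xfb\xff"  # standard encoding of these bytes uses both '+' and '/'

std = b64.standard_b64encode(raw)
url = b64.urlsafe_b64encode(raw)

print(std)  # b'+/8='
print(url)  # b'-_8='

# Each decoder reverses its own alphabet
assert b64.standard_b64decode(std) == raw
assert b64.urlsafe_b64decode(url) == raw

# The URL-safe form is the standard form with two characters swapped
assert url == std.replace(b"+", b"-").replace(b"/", b"_")
```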
### Utility Functions

Version and license information functions for runtime introspection and compliance reporting.

```python { .api }
def get_version() -> str:
    """
    Get pybase64 version with optimization status.

    Returns:
    str: version string with C extension and SIMD status
         e.g., "1.4.2 (C extension active - AVX2)"
    """

def get_license_text() -> str:
    """
    Get complete license information.

    Returns:
    str: license text including libbase64 license information
    """
```

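If the optimization status needs to be consumed by code rather than read by a person, the version string can be split into its parts. The sketch below is an assumption-laden illustration: it relies only on the two string shapes quoted in this document, which may change between releases, and `BuildInfo` and `parse_version_string` are hypothetical names rather than pybase64 APIs.

```python
import re
from typing import NamedTuple, Optional


class BuildInfo(NamedTuple):  # hypothetical container, not a pybase64 type
    version: str
    c_extension: bool
    simd: Optional[str]


_PATTERN = re.compile(
    r"^(?P<version>\S+) \(C extension (?P<state>active|inactive)"
    r"(?: - (?P<simd>[^)]+))?\)$"
)


def parse_version_string(text: str) -> Optional[BuildInfo]:
    """Parse strings shaped like the get_version() examples in this document."""
    match = _PATTERN.match(text)
    if match is None:
        return None  # shape differs from the documented examples
    return BuildInfo(
        version=match["version"],
        c_extension=match["state"] == "active",
        simd=match["simd"],  # None when no SIMD name is present
    )


active = parse_version_string("1.4.2 (C extension active - AVX2)")
inactive = parse_version_string("1.4.2 (C extension inactive)")
print(active)
print(inactive)
```

With pybase64 installed, the live value can be passed in as `parse_version_string(pybase64.get_version())`; a `None` result means the string no longer matches the assumed shape.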
### SIMD Detection Functions

Internal functions for SIMD optimization control and introspection (available when C extension is active).

```python { .api }
def _get_simd_flags_compile() -> int:
    """
    Get compile-time SIMD flags used when building the C extension.

    Returns:
    int: bitmask of SIMD instruction sets available at compile time
    """

def _get_simd_flags_runtime() -> int:
    """
    Get runtime SIMD flags detected on current CPU.

    Returns:
    int: bitmask of SIMD instruction sets available at runtime
    """

def _get_simd_name(flags: int) -> str:
    """
    Get human-readable name for SIMD instruction set.

    Parameters:
    - flags: SIMD flags bitmask

    Returns:
    str: SIMD instruction set name (e.g., "AVX2", "fallback")
    """

def _get_simd_path() -> int:
    """
    Get currently active SIMD path flags.

    Returns:
    int: active SIMD flags for current execution path
    """

def _set_simd_path(flags: int) -> None:
    """
    Set SIMD path for optimization (advanced users only).

    Parameters:
    - flags: SIMD flags to activate

    Note: Only available when C extension is active
    """
```

### Command-Line Interface

PyBase64 provides a command-line tool for encoding, decoding, and benchmarking Base64 operations.

```bash { .api }
# Main command with version and help
pybase64 --version
pybase64 --license
pybase64 -h

# Encoding subcommand
pybase64 encode <input_file> [-o <output_file>] [-u|--url] [-a <altchars>]

# Decoding subcommand
pybase64 decode <input_file> [-o <output_file>] [-u|--url] [-a <altchars>] [--no-validation]

# Benchmarking subcommand
pybase64 benchmark <input_file> [-d <duration>]
```

The CLI can also be invoked using Python module syntax:

```bash { .api }
python -m pybase64 <subcommand> [arguments...]
```

### Module Attributes

Package version and exported symbols for version checking and introspection.

```python { .api }
__version__: str  # Package version string
__all__: tuple[str, ...]  # Exported public API symbols
```

## Type Definitions

```python { .api }
import sys
from typing import Protocol

# Type alias for bytes-like objects (version-dependent import)
if sys.version_info < (3, 12):
    from typing_extensions import Buffer
else:
    from collections.abc import Buffer

# Protocol for decode functions
class Decode(Protocol):
    __name__: str
    __module__: str

    def __call__(
        self, s: str | Buffer, altchars: str | Buffer | None = None, validate: bool = False
    ) -> bytes: ...

# Protocol for encode functions
class Encode(Protocol):
    __name__: str
    __module__: str

    def __call__(self, s: Buffer, altchars: Buffer | None = None) -> bytes: ...

# Protocol for encodebytes-style functions
class EncodeBytes(Protocol):
    __name__: str
    __module__: str

    def __call__(self, s: Buffer) -> bytes: ...
```

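Protocols like these make it possible to write helpers that accept any decoder with a compatible call signature. The sketch below declares a local, simplified stand-in (`DecodeLike`, a hypothetical name) instead of importing the protocol from pybase64, since this document does not state where those classes can be imported from. `decode_all` is likewise illustrative.

```python
from typing import Iterable, List, Optional, Protocol, Union


class DecodeLike(Protocol):
    """Local stand-in mirroring the call signature of the Decode protocol."""

    def __call__(
        self,
        s: Union[str, bytes],
        altchars: Optional[bytes] = None,
        validate: bool = False,
    ) -> bytes: ...


def decode_all(decode: DecodeLike, items: Iterable[bytes]) -> List[bytes]:
    """Decode every item with the supplied decoder, validating strictly."""
    return [decode(item, validate=True) for item in items]


try:
    from pybase64 import b64decode
except ImportError:
    from base64 import b64decode  # structurally compatible signature

decoded_items = decode_all(b64decode, [b"SGVsbG8=", b"V29ybGQ="])
print(decoded_items)  # [b'Hello', b'World']
```

Because the helper depends only on the call signature, either implementation can be passed in without changes.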
## Usage Examples

### Performance-Optimized Decoding

```python
import pybase64

# For maximum security and performance, use validate=True
# This enables optimized validation in the C extension
data = b'SGVsbG8sIFdvcmxkIQ=='
decoded = pybase64.b64decode(data, validate=True)
```

### Custom Alphabet Usage

```python
import pybase64

# Create data with custom alphabet for specific protocols
data = b'binary data here'
encoded = pybase64.b64encode(data, altchars=b'@&')
# Result uses @ and & instead of + and /

# Decode with same custom alphabet
decoded = pybase64.b64decode(encoded, altchars=b'@&')
```

### MIME-Compatible Encoding

```python
import pybase64

# Encode with line breaks for email/MIME compatibility
large_data = b'x' * 200  # Large binary data
mime_encoded = pybase64.encodebytes(large_data)
# Result has newlines every 76 characters per RFC 2045
```

### Runtime Performance Information

```python
import pybase64

# Check if C extension and SIMD optimizations are active
version_info = pybase64.get_version()
print(version_info)
# Output examples:
# "1.4.2 (C extension active - AVX2)"
# "1.4.2 (C extension inactive)"  # Fallback mode
```

### Command-Line Usage Examples

```bash
# Encode a file using standard Base64
pybase64 encode input.txt -o encoded.txt

# Decode with validation (applies unless --no-validation is passed; recommended for security)
pybase64 decode encoded.txt -o decoded.txt

# URL-safe encoding for web applications
pybase64 encode data.bin -u -o urlsafe.txt

# Custom alphabet encoding
pybase64 encode data.bin -a '@&' -o custom.txt

# Benchmark performance on your system
pybase64 benchmark test_data.bin

# Pipe operations (using stdin/stdout)
echo "Hello World" | pybase64 encode -
cat encoded.txt | pybase64 decode - > decoded.txt

# Check version and license
pybase64 --version
pybase64 --license

# Using Python module syntax
python -m pybase64 encode input.txt
```

## Error Handling

All decoding functions may raise `binascii.Error` for:
- Incorrect Base64 padding
- Invalid characters in input (when `validate=True`)
- Malformed Base64 strings

Encoding functions may raise:
- `BufferError` for non-contiguous memory buffers
- `TypeError` for invalid input types
- `ValueError` for non-ASCII characters in custom alphabets

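The two most common failure modes listed above can be reproduced with a few lines. This is a sketch under the assumption that pybase64 raises the same exception types as the standard `base64` module for these inputs; the stdlib fallback import keeps it runnable without pybase64, and `classify` is a hypothetical helper.

```python
import binascii

try:
    import pybase64 as b64
except ImportError:
    import base64 as b64


def classify(action) -> str:
    """Run a zero-argument callable and name the exception it raises, if any."""
    try:
        action()
    except binascii.Error:
        return "binascii.Error"
    except TypeError:
        return "TypeError"
    return "ok"


well_formed = classify(lambda: b64.b64decode(b"SGVsbG8="))
bad_padding = classify(lambda: b64.b64decode(b"SGVsbG8"))  # trailing '=' removed
wrong_type = classify(lambda: b64.b64encode("not bytes"))  # str is not bytes-like

print(well_formed)  # ok
print(bad_padding)  # binascii.Error
print(wrong_type)   # TypeError
```

Note that `binascii.Error` is a subclass of `ValueError`, so an `except ValueError` clause also catches decoding failures.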
## Performance Notes

- Use `validate=True` for security-critical applications; validation is optimized in the C extension
- The C extension provides a 5-20x performance improvement over Python's built-in `base64`
- SIMD optimizations (AVX2, AVX512-VBMI, Neon) are automatically detected and used when available
- For maximum performance, call `b64decode` and `b64encode` directly rather than the wrapper functions
- PyPy and free-threaded Python builds are fully supported with automatic fallback
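
Speed-up figures depend on the CPU, the build, and the payload size, so it can be worth measuring on the target machine. The following is a rough timing sketch rather than a rigorous benchmark: it uses `timeit` from the standard library, a 1 MiB synthetic payload chosen arbitrarily, and skips the pybase64 measurements when the package is not installed. `best_time` is a hypothetical helper.

```python
import base64
import timeit

try:
    import pybase64
except ImportError:
    pybase64 = None  # comparison is skipped when the package is missing

payload = bytes(range(256)) * 4096  # 1 MiB of sample data
encoded = base64.b64encode(payload)


def best_time(func, arg, repeat: int = 5) -> float:
    """Best wall-clock time in seconds for a single call to func(arg)."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeat))


results = {
    "base64.b64encode": best_time(base64.b64encode, payload),
    "base64.b64decode": best_time(base64.b64decode, encoded),
}
if pybase64 is not None:
    results["pybase64.b64encode"] = best_time(pybase64.b64encode, payload)
    results["pybase64.b64decode (validate=True)"] = best_time(
        lambda data: pybase64.b64decode(data, validate=True), encoded
    )

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

The `pybase64 benchmark` subcommand described earlier serves the same purpose from the command line.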