# Bencoding

A complete bencode implementation for encoding and decoding BitTorrent data structures. Bencode is the serialization format the BitTorrent protocol uses to store and transmit structured data, such as the metadata in `.torrent` files. It supports four types: byte strings, integers, lists, and dictionaries.

## Capabilities

### Encoding to Bencode

Convert Python objects to bencoded bytes.

```python { .api }
@classmethod
def encode(cls, value: TypeEncodable) -> bytes:
    """
    Encode a Python object to bencoded bytes.

    Supports strings, integers, lists, tuples, sets, dictionaries, bytes,
    and bytearrays. Lists, tuples, and sets are all encoded as bencode lists.
    Dictionary keys are sorted automatically, as the bencode specification
    requires.

    Parameters:
    - value (TypeEncodable): Python object to encode, one of
      Union[str, int, list, set, tuple, dict, bytes, bytearray]

    Returns:
    bytes: Bencoded data ready for storage or transmission

    Raises:
    BencodeEncodingError: If the value's type cannot be encoded
    """
```
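
To make these rules concrete, here is a minimal standalone sketch of an encoder with the behavior the docstring describes. It is not torrentool's implementation: the name `bencode_encode` is hypothetical, and it raises the built-in `TypeError` where the library raises `BencodeEncodingError`. Keys are sorted by their raw UTF-8 bytes, the ordering the BitTorrent specification (BEP 3) asks for.

```python
from typing import Union

# Hypothetical alias mirroring the documented TypeEncodable union.
Encodable = Union[str, int, list, set, tuple, dict, bytes, bytearray]


def bencode_encode(value: Encodable) -> bytes:
    """Encode value as bencode (illustrative sketch, not torrentool's code)."""
    if isinstance(value, int):
        # Integers: i<decimal>e, sign included (bool is an int subclass).
        return b"i%de" % value
    if isinstance(value, str):
        # Text is stored as its UTF-8 bytes; the length prefix counts bytes.
        value = value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        # Byte strings: <length>:<content>
        return str(len(value)).encode("ascii") + b":" + bytes(value)
    if isinstance(value, (list, tuple, set)):
        # All three become bencode lists; a set yields its iteration order.
        return b"l" + b"".join(bencode_encode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be strings, sorted by their raw bytes.
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            elif not isinstance(key, (bytes, bytearray)):
                raise TypeError(f"dictionary keys must be str or bytes, not {type(key).__name__}")
            pairs.append((bytes(key), item))
        pairs.sort(key=lambda pair: pair[0])
        body = b"".join(bencode_encode(k) + bencode_encode(v) for k, v in pairs)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode a value of type {type(value).__name__}")


print(bencode_encode({"b": 1, "a": "x"}))  # b'd1:a1:x1:bi1ee'
```

The output is the same whatever the dictionary's insertion order. Note that a `str` and its UTF-8 `bytes` produce identical output, so the decoder cannot tell text from binary data without a hint.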

### Decoding from Bencode

Parse bencoded data back into Python objects.

```python { .api }
@classmethod
def decode(cls, encoded: bytes, *, byte_keys: Set[str] = None) -> TypeEncodable:
    """
    Decode bencoded bytes to Python objects.

    Reconstructs the original data structure from its bencoded form. Because
    bencode does not distinguish text from binary data, each byte string is
    decoded as UTF-8 where possible and returned as bytes otherwise. Values
    stored under keys listed in byte_keys are always returned as bytes, even
    when they happen to be valid UTF-8.

    Parameters:
    - encoded (bytes): Bencoded data to decode
    - byte_keys (Set[str], optional): Keys whose values should remain bytes
      instead of being decoded as UTF-8 strings. Commonly used for the
      'pieces' field in torrents.

    Returns:
    TypeEncodable: Decoded Python object (dict, list, str, int, or bytes)

    Raises:
    BencodeDecodingError: If the data is malformed or cannot be parsed
    """
```
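
The `byte_keys` option exists because bencode has no separate text type: a hash that happens to be valid UTF-8 looks exactly like text. The sketch below isolates that decision in a hypothetical helper, `decode_string_value`, which is not part of torrentool; it assumes the decoder knows which dictionary key a string value belongs to.

```python
from typing import Optional, Set, Union


def decode_string_value(
    raw: bytes,
    key: Optional[str] = None,
    byte_keys: Optional[Set[str]] = None,
) -> Union[str, bytes]:
    """Choose str or bytes for one decoded byte string (hypothetical helper)."""
    if byte_keys and key in byte_keys:
        # The caller asked for this field to stay binary (e.g. 'pieces').
        return raw
    try:
        # Otherwise prefer text when the bytes are valid UTF-8...
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ...and fall back to the raw bytes when they are not.
        return raw


hashes = b"\x01\x02\x03\x04\x05"  # Valid UTF-8 by accident
print(type(decode_string_value(hashes, "pieces")))              # <class 'str'>
print(type(decode_string_value(hashes, "pieces", {"pieces"})))  # <class 'bytes'>
print(type(decode_string_value(b"\x89PNG", "binary")))          # <class 'bytes'>
```

Without `byte_keys`, the five "hash" bytes silently turn into a five-character string; listing the key keeps them as bytes.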

### String Decoding

Decode bencoded strings directly.

```python { .api }
@classmethod
def read_string(cls, string: Union[str, bytes], *, byte_keys: Set[str] = None) -> TypeEncodable:
    """
    Decode bencoded string or byte string.

    Convenience method for decoding bencoded data provided as string.
    Automatically converts string to bytes before decoding.

    Parameters:
    - string (Union[str, bytes]): Bencoded data as string or bytes
    - byte_keys (Set[str], optional): Keys to keep as bytes rather than decode as UTF-8

    Returns:
    TypeEncodable: Decoded Python object

    Raises:
    BencodeDecodingError: If string is malformed
    """
```
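
When bencoded data is supplied as a `str`, every length prefix must count bytes, not characters, because the text is converted to bytes before parsing. The sketch below assumes that conversion uses UTF-8 (the default for `str.encode()`); `bencode_text` is a hypothetical helper for building such input by hand, not a torrentool function.

```python
def bencode_text(text: str) -> str:
    """Bencode one text value, returning str (hypothetical helper).

    The length prefix counts UTF-8 bytes, so the result stays valid
    after it is encoded to bytes.
    """
    return f"{len(text.encode('utf-8'))}:{text}"


word = "café"
print(len(word))           # 4 (characters)
print(bencode_text(word))  # 5:café ('é' occupies two bytes in UTF-8)
```

A hand-written `"4:café"` would consume only the first byte of `é`, leaving a stray byte for the parser to trip over.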

### File Decoding

Decode bencoded files directly from disk.

```python { .api }
@classmethod
def read_file(cls, filepath: Union[str, Path], *, byte_keys: Set[str] = None) -> TypeEncodable:
    """
    Decode bencoded data from file.

    Reads entire file into memory and decodes the bencoded content.
    Commonly used for reading .torrent files.

    Parameters:
    - filepath (Union[str, Path]): Path to file containing bencoded data
    - byte_keys (Set[str], optional): Keys to preserve as bytes

    Returns:
    TypeEncodable: Decoded file contents as Python objects

    Raises:
    BencodeDecodingError: If file contains malformed bencoded data
    FileNotFoundError: If file does not exist
    """
```

## Usage Examples

### Basic Encoding and Decoding

```python
from torrentool.bencode import Bencode

# Encode various Python objects
data = {
    'announce': 'http://tracker.example.com/announce',
    'info': {
        'name': 'example.txt',
        'length': 12345,
        'pieces': b'\x01\x02\x03\x04\x05',  # Binary hash data
    },
    'trackers': ['http://t1.com', 'http://t2.com'],
    'created': 1234567890,
}

# Encode to bencoded bytes
encoded = Bencode.encode(data)
print(f"Encoded size: {len(encoded)} bytes")

# Decode back to Python objects.
# These 'pieces' bytes happen to be valid UTF-8, so list 'pieces' in
# byte_keys to keep the value as bytes rather than decode it to str.
decoded = Bencode.decode(encoded, byte_keys={'pieces'})
print(f"Decoded: {decoded}")

# Verify the round trip
assert decoded == data
```

### Working with Torrent Files

```python
from pathlib import Path

from torrentool.bencode import Bencode

# Read a .torrent file
torrent_path = Path('example.torrent')
torrent_data = Bencode.read_file(torrent_path, byte_keys={'pieces'})

print(f"Torrent name: {torrent_data['info']['name']}")
print(f"Announce URL: {torrent_data['announce']}")
print(f"Piece length: {torrent_data['info']['piece length']}")

# Modify and save back
torrent_data['comment'] = 'Modified by Python script'
encoded_data = Bencode.encode(torrent_data)

with open('modified.torrent', 'wb') as f:
    f.write(encoded_data)
```

### String and Bytes Handling

```python
from torrentool.bencode import Bencode

# Bencode stores only byte strings, so text and binary data share one
# representation. On decode, values that are valid UTF-8 come back as str
# and anything else comes back as bytes.
string_data = "Hello, world!"
bytes_data = b"\xff\xfe binary data"  # Not valid UTF-8

# Both can be encoded
encoded_string = Bencode.encode(string_data)
encoded_bytes = Bencode.encode(bytes_data)

decoded_string = Bencode.decode(encoded_string)  # Returns str
decoded_bytes = Bencode.decode(encoded_bytes)    # Returns bytes (UTF-8 decoding fails)

print(f"String: {decoded_string}")
print(f"Bytes: {decoded_bytes}")

# Complex structure with mixed types
mixed_data = {
    'text': 'This is text',
    'binary': b'\x89PNG\r\n\x1a\n',  # PNG header, not valid UTF-8
    'number': 42,
    'list': ['item1', 'item2', b'binary_item'],
}

encoded_mixed = Bencode.encode(mixed_data)
decoded_mixed = Bencode.decode(encoded_mixed)
# b'binary_item' is plain ASCII, so it decodes as the str 'binary_item';
# 'binary' stays bytes because it is not valid UTF-8.
```

### Error Handling

```python
from torrentool.bencode import Bencode, BencodeDecodingError, BencodeEncodingError

# Handle encoding errors
try:
    invalid_data = object()  # Arbitrary objects cannot be encoded
    Bencode.encode(invalid_data)
except BencodeEncodingError as e:
    print(f"Encoding failed: {e}")

# Handle decoding errors
try:
    malformed_data = b"not bencode"  # 'n' is not a valid bencode type marker
    Bencode.decode(malformed_data)
except BencodeDecodingError as e:
    print(f"Decoding failed: {e}")

# Handle missing or corrupted files
try:
    corrupted_torrent = Bencode.read_file('corrupted.torrent')
except BencodeDecodingError:
    print("Torrent file is corrupted or is not valid bencode")
except FileNotFoundError:
    print("Torrent file not found")
```
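
## Bencode Format Details

The bencode format uses the following encoding rules:

- **Strings**: `<length>:<content>`, where `<length>` is the content's length in bytes, written in decimal (e.g., `4:spam` for "spam")
- **Integers**: `i<number>e` in decimal, optionally negative (e.g., `i42e` for 42, `i-3e` for -3)
- **Lists**: `l<contents>e`, where the contents are the encoded elements in order (e.g., `l4:spam4:eggse` for ['spam', 'eggs'])
- **Dictionaries**: `d<contents>e`, where the contents are alternating keys and values; keys must be strings, sorted by their raw bytes (e.g., `d3:key5:valuee` for {'key': 'value'})
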
The implementation handles edge cases including:

- Empty strings and containers
- Negative integers
- Binary data mixed with text
- Nested structures of arbitrary depth
- UTF-8 decoding of byte strings, falling back to raw bytes when a value is not valid UTF-8
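
The format rules and edge cases above can be illustrated with a small recursive decoder. This is a minimal sketch, not torrentool's implementation: `bdecode` and `DecodeError` are hypothetical names, it follows the UTF-8-with-bytes-fallback policy described here but has no `byte_keys` option, and it does not enforce every rule of the specification (for example, it accepts unsorted dictionary keys and leading zeros in integers).

```python
from typing import List, Tuple, Union

Value = Union[int, str, bytes, list, dict]


class DecodeError(ValueError):
    """Hypothetical stand-in for BencodeDecodingError."""


def _decode_at(data: bytes, pos: int) -> Tuple[Value, int]:
    """Decode the value starting at data[pos]; return (value, next_pos)."""
    if pos >= len(data):
        raise DecodeError("unexpected end of data")
    lead = data[pos:pos + 1]

    if lead == b"i":  # Integer: i<number>e (negative values allowed)
        end = data.find(b"e", pos)
        if end == -1:
            raise DecodeError("unterminated integer")
        try:
            return int(data[pos + 1:end]), end + 1
        except ValueError:
            raise DecodeError(f"invalid integer at offset {pos}") from None

    if lead in (b"l", b"d"):  # List l...e or dictionary d...e
        pos += 1
        items: List[Value] = []
        while data[pos:pos + 1] != b"e":  # Empty containers skip the loop
            item, pos = _decode_at(data, pos)  # Recursion handles nesting
            items.append(item)
        pos += 1  # Skip the closing 'e'
        if lead == b"l":
            return items, pos
        keys, values = items[0::2], items[1::2]
        if len(keys) != len(values) or not all(isinstance(k, (str, bytes)) for k in keys):
            raise DecodeError("dictionary needs string keys paired with values")
        return dict(zip(keys, values)), pos

    if lead.isdigit():  # Byte string: <length>:<content>
        colon = data.find(b":", pos)
        length_field = data[pos:colon] if colon != -1 else b""
        if not length_field.isdigit():
            raise DecodeError(f"invalid string length at offset {pos}")
        start = colon + 1
        end = start + int(length_field)
        if end > len(data):
            raise DecodeError("string runs past the end of the data")
        raw = data[start:end]
        try:
            return raw.decode("utf-8"), end  # Text when valid UTF-8
        except UnicodeDecodeError:
            return raw, end  # Raw bytes otherwise (binary data)

    raise DecodeError(f"unexpected byte {lead!r} at offset {pos}")


def bdecode(data: bytes) -> Value:
    """Decode one complete bencoded value (illustrative sketch)."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise DecodeError("trailing data after the top-level value")
    return value


print(bdecode(b"d4:spaml1:a1:bee"))  # {'spam': ['a', 'b']}
```

Because this sketch recurses once per nesting level, Python's recursion limit caps the depth it can handle; an iterative, stack-based parser avoids that limit.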