Tessl Tile for pypi/pyld@2.0.0

# JSON Canonicalization

RFC 8785-compliant JSON canonicalization for consistent serialization, hashing, and digital signatures. The c14n module produces a deterministic string representation of JSON data.

## Capabilities

### JSON Canonicalization

Produces the canonical JSON string representation defined by RFC 8785.

```python { .api }
def canonicalize(obj, utf8=True):
    """
    Canonicalizes a JSON object according to RFC 8785.

    Produces a deterministic string representation of JSON data by:
    - Sorting object keys lexicographically
    - Using minimal whitespace (no extra spaces)
    - Consistent number formatting
    - Proper Unicode escape sequences

    Args:
        obj: The JSON-serializable object to canonicalize (dict, list, str,
             int, float, bool, None)
        utf8 (bool): If True, return bytes encoded as UTF-8; if False,
                    return Unicode string (default: True)

    Returns:
        bytes or str: Canonical JSON representation (bytes if utf8=True,
                     str if utf8=False)

    Raises:
        TypeError: If obj contains non-JSON-serializable types
        ValueError: If obj contains circular references
    """
```

#### Examples

```python
from c14n import canonicalize
import json

# Basic canonicalization
data = {"name": "Alice", "age": 30, "city": "New York"}
canonical_bytes = canonicalize(data)
print(canonical_bytes)  # b'{"age":30,"city":"New York","name":"Alice"}'

# Get string instead of bytes
canonical_str = canonicalize(data, utf8=False)
print(canonical_str)    # '{"age":30,"city":"New York","name":"Alice"}'

# Complex nested structure
complex_data = {
    "users": [
        {"id": 2, "name": "Bob"},
        {"id": 1, "name": "Alice"}
    ],
    "metadata": {
        "version": "1.0",
        "created": "2023-01-01"
    }
}

canonical = canonicalize(complex_data, utf8=False)
print(canonical)
# Output: {"metadata":{"created":"2023-01-01","version":"1.0"},"users":[{"id":2,"name":"Bob"},{"id":1,"name":"Alice"}]}
```

### JSON Serialization

Alternative serialization function without key sorting (non-canonical).

```python { .api }
def serialize(obj, utf8=True):
    """
    Serializes JSON object without canonicalization (preserves key order).

    Args:
        obj: The JSON-serializable object to serialize
        utf8 (bool): If True, return bytes encoded as UTF-8; if False,
                    return Unicode string (default: True)

    Returns:
        bytes or str: JSON representation without key reordering

    Raises:
        TypeError: If obj contains non-JSON-serializable types
        ValueError: If obj contains circular references
    """
```

#### Example

```python
from c14n import canonicalize, serialize

data = {"name": "Alice", "age": 30, "city": "New York"}

# Serialize preserving original key order
serialized = serialize(data, utf8=False)
print(serialized)  # '{"name":"Alice","age":30,"city":"New York"}'

# Compare with canonicalization (keys sorted)
canonical = canonicalize(data, utf8=False)
print(canonical)   # '{"age":30,"city":"New York","name":"Alice"}'
```

## Canonicalization Rules

### Key Ordering

Object keys are sorted lexicographically. RFC 8785 specifies that the comparison uses UTF-16 code units, which gives the same order as Unicode code points for keys that contain only Basic Multilingual Plane characters:

```python
data = {
    "zebra": 1,
    "apple": 2,
    "banana": 3,
    "Apple": 4  # Capital A comes before lowercase a
}

canonical = canonicalize(data, utf8=False)
# Result: {"Apple":4,"apple":2,"banana":3,"zebra":1}
```
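
The two orderings only diverge when a key contains a supplementary character (one above U+FFFF). The sketch below shows the difference with the standard library alone; it does not call c14n, and `utf16_sort_key` is a hypothetical helper name used for illustration.

```python
def utf16_sort_key(key: str) -> bytes:
    """Sort key that compares strings by UTF-16 code units.

    Big-endian UTF-16 bytes compare in the same order as the code units.
    """
    return key.encode("utf-16-be")


keys = ["\ufb33", "\U0001F600", "a"]

by_code_point = sorted(keys)                 # Python's default string order
by_utf16 = sorted(keys, key=utf16_sort_key)  # order specified by RFC 8785

print([hex(ord(k)) for k in by_code_point])  # ['0x61', '0xfb33', '0x1f600']
print([hex(ord(k)) for k in by_utf16])       # ['0x61', '0x1f600', '0xfb33']
```

U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00, and D83D sorts before FB33, which is why the two lists differ.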

### Number Formatting

Numbers are serialized in their shortest round-trip form, following the ECMAScript number-to-string rules that RFC 8785 adopts:

```python
numbers = {
    "integer": 42,
    "float": 3.14159,
    "zero": 0,
    "negative": -123,
    "scientific": 1.23e-4
}

canonical = canonicalize(numbers, utf8=False)
# Numbers formatted without unnecessary precision or notation
```
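
The ECMAScript number format that RFC 8785 relies on is not what Python's standard `json` module emits for floats, which is one reason a dedicated number formatter is needed. The sketch below prints only the standard-library output; the RFC 8785 forms listed in the comments are derived from the ECMAScript rules, and the code does not call c14n.

```python
import json

samples = [100.0, 1e-05, 1e16, 1.23e-4]

# Standard-library output for each float.
stdlib_forms = [json.dumps(x) for x in samples]
print(stdlib_forms)  # ['100.0', '1e-05', '1e+16', '0.000123']

# Forms expected under the ECMAScript rules that RFC 8785 adopts:
#   100.0    -> 100
#   1e-05    -> 0.00001
#   1e16     -> 10000000000000000
#   1.23e-4  -> 0.000123
```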

### String Handling

Strings are escaped with the minimal set of escape sequences; only the double quote, the backslash, and control characters are escaped:

```python
strings = {
    "quote": 'He said "Hello"',
    "newline": "Line 1\nLine 2",
    "unicode": "café",
    "control": "tab\there"
}

canonical = canonicalize(strings, utf8=False)
# Proper JSON string escaping applied
```
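
For the cases above, the standard library's `json.dumps` with `ensure_ascii=False` applies the same escaping pattern, so it is used here as a stand-in to make the escapes visible. This is an illustration under that assumption, not c14n's implementation.

```python
import json

strings = {
    "quote": 'He said "Hello"',
    "newline": "Line 1\nLine 2",
    "unicode": "café",
    "control": "tab\there",
    "bell": "\x07",
}

# Print each value as a JSON string literal.
for name, value in strings.items():
    print(name, json.dumps(value, ensure_ascii=False))
# quote "He said \"Hello\""
# newline "Line 1\nLine 2"
# unicode "café"
# control "tab\there"
# bell "\u0007"
```

Non-ASCII characters such as é stay literal, and control characters without a short escape fall back to a lowercase `\u00xx` sequence.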

### Array Preservation

Array element order is preserved (not sorted):

```python
data = {
    "numbers": [3, 1, 4, 1, 5],
    "mixed": ["zebra", "apple", "banana"]
}

canonical = canonicalize(data, utf8=False)
# Array order maintained: {"mixed":["zebra","apple","banana"],"numbers":[3,1,4,1,5]}
```

## Use Cases

### Digital Signatures

```python
from c14n import canonicalize
import hashlib
import hmac

def sign_json(data, secret_key):
    """Compute an HMAC-SHA256 signature over the canonical form of JSON data."""
    canonical_bytes = canonicalize(data)
    signature = hmac.new(secret_key, canonical_bytes, hashlib.sha256).hexdigest()
    return signature

def verify_json(data, signature, secret_key):
    """Verify an HMAC-SHA256 signature using a constant-time comparison."""
    canonical_bytes = canonicalize(data)
    expected_signature = hmac.new(secret_key, canonical_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected_signature)

# Example usage
document = {"user": "alice", "action": "login", "timestamp": "2023-01-01T12:00:00Z"}
secret = b"my-secret-key"

signature = sign_json(document, secret)
is_valid = verify_json(document, signature, secret)
```

### Content Hashing

```python
import hashlib
from c14n import canonicalize

def hash_json(data):
    """Create deterministic hash of JSON data."""
    canonical_bytes = canonicalize(data)
    return hashlib.sha256(canonical_bytes).hexdigest()

# Same data in different orders produces same hash
data1 = {"name": "Alice", "age": 30}
data2 = {"age": 30, "name": "Alice"}

hash1 = hash_json(data1)
hash2 = hash_json(data2)
print(hash1 == hash2)  # True - same canonical representation
```

### Data Deduplication

```python
from c14n import canonicalize

def deduplicate_json(json_objects):
    """Remove duplicate JSON objects based on canonical form."""
    seen = set()
    unique_objects = []

    for obj in json_objects:
        canonical = canonicalize(obj)
        if canonical not in seen:
            seen.add(canonical)
            unique_objects.append(obj)

    return unique_objects

# Example with duplicate data in different order
objects = [
    {"name": "Alice", "age": 30},
    {"age": 30, "name": "Alice"},  # Duplicate in different order
    {"name": "Bob", "age": 25}
]

unique = deduplicate_json(objects)
# Returns: [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
```

### JSON-LD Integration

```python
import hashlib

from pyld import jsonld
from c14n import canonicalize

def canonical_json_ld_hash(doc):
    """Create hash of JSON-LD document after normalization and canonicalization."""
    # First normalize with JSON-LD
    normalized = jsonld.normalize(doc, {
        'algorithm': 'URDNA2015',
        'format': 'application/n-quads'
    })

    # Then canonicalize the normalized form
    canonical = canonicalize(normalized)
    return hashlib.sha256(canonical).hexdigest()
```

## RFC 8785 Compliance

Canonicalization follows the RFC 8785 rules:

1. **Object Key Ordering**: Keys sorted by UTF-16 code unit values
2. **Whitespace**: No unnecessary whitespace (compact representation)
3. **String Escaping**: Minimal required escape sequences
4. **Number Representation**: Minimal numeric representation
5. **Array Ordering**: Preserve original array element order
6. **Unicode Handling**: Proper UTF-8 encoding and escape sequences
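
For simple data (strings, small integers, booleans, null, and keys within the Basic Multilingual Plane), these rules can be approximated with the standard library alone. The `approx_canonicalize` helper below is a hypothetical stand-in for illustration, not the c14n implementation; it diverges from RFC 8785 for floats and for keys that contain supplementary characters.

```python
import json


def approx_canonicalize(obj) -> bytes:
    """Approximate RFC 8785 output for simple data (see the caveats above)."""
    text = json.dumps(
        obj,
        sort_keys=True,         # code point order; equals UTF-16 order for BMP keys
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters literal
        allow_nan=False,        # reject NaN and Infinity
    )
    return text.encode("utf-8")


a = approx_canonicalize({"name": "Alice", "tags": ["x", "y"], "age": 30})
b = approx_canonicalize({"age": 30, "tags": ["x", "y"], "name": "Alice"})

print(a)       # b'{"age":30,"name":"Alice","tags":["x","y"]}'
print(a == b)  # True
```

Key order in the input does not affect the output, while the order of the `tags` array is kept as given.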

## Performance Considerations

### Memory Usage

```python
from c14n import canonicalize

# For large objects, the whole canonical form is built in memory
large_data = {"items": list(range(100000))}
canonical = canonicalize(large_data)  # Large bytes object (utf8=True by default)
```

### Caching Canonical Forms

```python
import json
from functools import lru_cache

from c14n import canonicalize

@lru_cache(maxsize=1000)
def cached_canonicalize(data_str):
    """Cache canonical forms for frequently used data."""
    data = json.loads(data_str)
    return canonicalize(data, utf8=False)

# Use with JSON string input for caching
data_json = '{"name": "Alice", "age": 30}'
canonical = cached_canonicalize(data_json)
```
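
Because `lru_cache` keys on the exact input string, two JSON texts that differ only in whitespace or key order take separate cache entries even though they canonicalize to the same output. The sketch below shows this with a standard-library stand-in for `canonicalize`, an assumption made so the example runs without c14n; `cached_form` is a hypothetical name.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=1000)
def cached_form(data_str: str) -> str:
    # Stand-in for canonicalize(json.loads(data_str), utf8=False).
    return json.dumps(json.loads(data_str), sort_keys=True, separators=(",", ":"))


first = cached_form('{"name": "Alice", "age": 30}')
second = cached_form('{"age":30,"name":"Alice"}')    # same data, different text
third = cached_form('{"name": "Alice", "age": 30}')  # identical text: cache hit

info = cached_form.cache_info()
print(first == second)         # True
print(info.hits, info.misses)  # 1 2
```

The cache therefore pays off mainly when the same serialized text recurs verbatim.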

## Error Handling

Canonicalization functions may raise standard Python JSON errors:

- **TypeError**: Non-serializable objects (functions, custom classes)
- **ValueError**: Circular references in nested structures
- **UnicodeEncodeError**: Invalid Unicode characters

```python
from c14n import canonicalize
import json

try:
    # This will fail - functions aren't JSON serializable
    invalid_data = {"func": lambda x: x}
    canonical = canonicalize(invalid_data)
except TypeError as e:
    print(f"Serialization error: {e}")

try:
    # This will fail - circular reference
    circular = {}
    circular["self"] = circular
    canonical = canonicalize(circular)
except ValueError as e:
    print(f"Circular reference error: {e}")
```
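
RFC 8785 also forbids NaN and Infinity, which Python floats can hold. How c14n reports such values is not covered here, so one option is to check inputs before canonicalizing. A minimal sketch, where `find_non_finite` is a hypothetical helper name and the input is assumed to be free of circular references:

```python
import math


def find_non_finite(obj, path="$"):
    """Return the path of the first NaN or Infinity float found, or None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return path
    if isinstance(obj, dict):
        for key, value in obj.items():
            found = find_non_finite(value, f"{path}.{key}")
            if found:
                return found
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            found = find_non_finite(value, f"{path}[{index}]")
            if found:
                return found
    return None


print(find_non_finite({"ok": 1.5, "items": [0.0, float("nan")]}))  # $.items[1]
print(find_non_finite({"ok": 1.5}))                                # None
```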

## Integration with PyLD

The c14n module is used internally by PyLD for JSON-LD processing:

```python
# PyLD uses canonicalization in normalization algorithms
from pyld import jsonld

doc = {"@context": {...}, "@id": "example:1", "name": "Test"}
normalized = jsonld.normalize(doc, {
    'algorithm': 'URDNA2015',
    'format': 'application/n-quads'
})
# Internally uses canonicalization for consistent RDF representation
```
