
# JSON Canonicalization

RFC 8785 compliant JSON canonicalization for consistent JSON serialization, hashing, and digital signatures. The `c14n` module provides a deterministic JSON string representation.

## Capabilities

### JSON Canonicalization

Produces the canonical JSON string representation defined by RFC 8785.

```python { .api }
def canonicalize(obj, utf8=True):
    """
    Canonicalizes a JSON object according to RFC 8785.

    Produces a deterministic string representation of JSON data by:
    - Sorting object keys by their UTF-16 code units
    - Omitting all insignificant whitespace
    - Formatting numbers with the ECMAScript number-to-string rules
    - Escaping only the characters JSON requires; non-ASCII characters
      are emitted as-is

    Args:
        obj: The JSON-serializable object to canonicalize (dict, list, str,
            int, float, bool, None)
        utf8 (bool): If True, return bytes encoded as UTF-8; if False,
            return a Unicode string (default: True)

    Returns:
        bytes or str: Canonical JSON representation (bytes if utf8=True,
        str if utf8=False)

    Raises:
        TypeError: If obj contains non-JSON-serializable types
        ValueError: If obj contains circular references
    """
```

#### Examples

```python
from c14n import canonicalize

# Basic canonicalization
data = {"name": "Alice", "age": 30, "city": "New York"}
canonical_bytes = canonicalize(data)
print(canonical_bytes)  # b'{"age":30,"city":"New York","name":"Alice"}'

# Get a str instead of bytes
canonical_str = canonicalize(data, utf8=False)
print(canonical_str)  # {"age":30,"city":"New York","name":"Alice"}

# Complex nested structure
complex_data = {
    "users": [
        {"id": 2, "name": "Bob"},
        {"id": 1, "name": "Alice"},
    ],
    "metadata": {
        "version": "1.0",
        "created": "2023-01-01",
    },
}

canonical = canonicalize(complex_data, utf8=False)
print(canonical)
# Output: {"metadata":{"created":"2023-01-01","version":"1.0"},"users":[{"id":2,"name":"Bob"},{"id":1,"name":"Alice"}]}
```

### JSON Serialization

Alternative, non-canonical serialization: compact output like `canonicalize`, but object keys keep their insertion order instead of being sorted.

```python { .api }
def serialize(obj, utf8=True):
    """
    Serializes a JSON object without canonicalization: object keys keep
    their insertion order instead of being sorted.

    Args:
        obj: The JSON-serializable object to serialize
        utf8 (bool): If True, return bytes encoded as UTF-8; if False,
            return a Unicode string (default: True)

    Returns:
        bytes or str: JSON representation without key reordering

    Raises:
        TypeError: If obj contains non-JSON-serializable types
        ValueError: If obj contains circular references
    """
```

#### Example

```python
from c14n import canonicalize, serialize

data = {"name": "Alice", "age": 30, "city": "New York"}

# Serialize preserving the dict's insertion order
serialized = serialize(data, utf8=False)
print(serialized)  # {"name":"Alice","age":30,"city":"New York"}

# Compare with canonicalization (keys sorted)
canonical = canonicalize(data, utf8=False)
print(canonical)  # {"age":30,"city":"New York","name":"Alice"}
```
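For intuition, both functions behave much like the standard library's `json.dumps` with compact separators and `ensure_ascii=False`, differing only in `sort_keys`. The sketch below is an approximation, not the `c14n` implementation: `approx_canonicalize` and `approx_serialize` are hypothetical names, keys are sorted by code point rather than by UTF-16 code unit, and floats use Python's formatting instead of the ECMAScript rules RFC 8785 requires. It matches the real output only for data like the examples above (ASCII keys, integers, strings).

```python
import json


def approx_canonicalize(obj, utf8=True):
    """Approximate canonicalize(): sorted keys, no whitespace, literal non-ASCII.

    Not fully RFC 8785: keys are sorted by code point (not UTF-16 code units)
    and floats use Python's repr() rather than ECMAScript formatting.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") if utf8 else text


def approx_serialize(obj, utf8=True):
    """Approximate serialize(): same compact output, insertion order kept."""
    text = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8") if utf8 else text


data = {"name": "Alice", "age": 30, "city": "New York"}
print(approx_serialize(data, utf8=False))  # {"name":"Alice","age":30,"city":"New York"}
print(approx_canonicalize(data))  # b'{"age":30,"city":"New York","name":"Alice"}'
```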

## Canonicalization Rules

### Key Ordering

Object keys are sorted by comparing their UTF-16 code units, as RFC 8785 requires. For keys made only of characters in the Basic Multilingual Plane (BMP) this is the same as sorting by Unicode code point; for ASCII keys, uppercase letters sort before lowercase:

```python
from c14n import canonicalize

data = {
    "zebra": 1,
    "apple": 2,
    "banana": 3,
    "Apple": 4,  # "A" (U+0041) sorts before "a" (U+0061)
}

canonical = canonicalize(data, utf8=False)
# Result: {"Apple":4,"apple":2,"banana":3,"zebra":1}
```
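Code-unit order and code-point order differ only when a key contains a character outside the BMP. UTF-16 stores such characters as surrogate pairs (code units 0xD800 to 0xDFFF), which sort below BMP characters from U+E000 upward. The sketch below uses a hypothetical helper, `jcs_sort_keys` (not part of `c14n`); in Python, encoding each key as UTF-16BE and comparing the bytes gives code-unit order.

```python
def jcs_sort_keys(keys):
    """Sort keys by UTF-16 code units, as RFC 8785 specifies.

    Big-endian UTF-16 bytes compare in the same order as the 16-bit
    code units they encode, so the encoded bytes work as a sort key.
    """
    return sorted(keys, key=lambda k: k.encode("utf-16-be"))


keys = ["\ufb01", "\U0001f600"]  # U+FB01 (ligature "fi"), U+1F600 (emoji, a surrogate pair in UTF-16)
print([hex(ord(k)) for k in sorted(keys)])         # code point order: ['0xfb01', '0x1f600']
print([hex(ord(k)) for k in jcs_sort_keys(keys)])  # UTF-16 order: ['0x1f600', '0xfb01']
```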

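### Number Formatting

Numbers are serialized with the ECMAScript number-to-string algorithm that RFC 8785 adopts: integral values print without a decimal point, and other values use the shortest digit string that round-trips, in plain notation for magnitudes from 1e-6 up to (but not including) 1e21 and in exponent notation otherwise:

```python
from c14n import canonicalize

numbers = {
    "integer": 42,
    "float": 3.14159,
    "zero": 0,
    "negative": -123,
    "scientific": 1.23e-4,
}

canonical = canonicalize(numbers, utf8=False)
# Result: {"float":3.14159,"integer":42,"negative":-123,"scientific":0.000123,"zero":0}
```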
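The placement rules can be sketched in a few lines. `es6_number` below is a hypothetical helper, not part of `c14n`; it assumes Python's `repr()` supplies the shortest digit string that round-trips a double (true since Python 3.1) and only decides where the decimal point or exponent goes. NaN and Infinity have no JSON representation, so the sketch rejects them.

```python
import math
from decimal import Decimal


def es6_number(x: float) -> str:
    """Format a finite float like ECMAScript's Number-to-string conversion (sketch)."""
    if not math.isfinite(x):
        raise ValueError("NaN and Infinity are not valid JSON numbers")
    if x == 0:
        return "0"  # also covers -0.0, which serializes as 0
    sign = "-" if x < 0 else ""
    # repr() gives the shortest round-trip digits; Decimal splits them from the exponent
    _, digit_tuple, exp = Decimal(repr(abs(x))).as_tuple()
    digits = "".join(map(str, digit_tuple))
    s = digits.rstrip("0")            # significant digits
    exp += len(digits) - len(s)       # value == int(s) * 10**exp
    k, n = len(s), len(s) + exp       # value == 0.s * 10**n
    if k <= n <= 21:
        out = s + "0" * (n - k)       # integral: 100, 10000000000000000
    elif 0 < n <= 21:
        out = s[:n] + "." + s[n:]     # 3.14159
    elif -6 < n <= 0:
        out = "0." + "0" * -n + s     # 0.000123
    else:
        e = n - 1
        mantissa = s if k == 1 else s[0] + "." + s[1:]
        out = f"{mantissa}e{'+' if e >= 0 else '-'}{abs(e)}"  # 1e+21, 1.5e-7
    return sign + out


print([es6_number(x) for x in (42.0, 3.14159, 1.23e-4, 1e21, 1e-7)])
# ['42', '3.14159', '0.000123', '1e+21', '1e-7']
```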

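### String Handling

Strings use the minimal escaping RFC 8785 requires: `"` and `\` are backslash-escaped, backspace, tab, line feed, form feed, and carriage return use their short forms (`\b`, `\t`, `\n`, `\f`, `\r`), the remaining control characters below U+0020 become `\u00xx` with lowercase hex digits, and every other character, including non-ASCII, is written as-is:

```python
from c14n import canonicalize

strings = {
    "quote": 'He said "Hello"',
    "newline": "Line 1\nLine 2",
    "unicode": "café",
    "control": "tab\there",
}

canonical = canonicalize(strings, utf8=False)
# Result: {"control":"tab\there","newline":"Line 1\nLine 2","quote":"He said \"Hello\"","unicode":"café"}
```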
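These escaping rules fit in a short function. The sketch below is illustrative only: `jcs_string` is a hypothetical helper, not part of `c14n`. Python's `json.dumps(s, ensure_ascii=False)` applies the same rules, so it serves as a cross-check.

```python
import json

# Characters that get a two-character escape in RFC 8785 output
SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
}


def jcs_string(s: str) -> str:
    """Serialize a string with the minimal escaping RFC 8785 specifies (sketch)."""
    out = ['"']
    for ch in s:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            out.append(f"\\u{ord(ch):04x}")  # other control characters, lowercase hex
        else:
            out.append(ch)  # everything else, including non-ASCII, is copied as-is
    out.append('"')
    return "".join(out)


print(jcs_string('He said "Hello"'))  # "He said \"Hello\""
print(jcs_string("café\x01"))         # "café\u0001"
print(jcs_string("tab\there") == json.dumps("tab\there", ensure_ascii=False))  # True
```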

### Array Preservation

Array element order is preserved (arrays are not sorted):

```python
from c14n import canonicalize

data = {
    "numbers": [3, 1, 4, 1, 5],
    "mixed": ["zebra", "apple", "banana"],
}

canonical = canonicalize(data, utf8=False)
# Array order maintained: {"mixed":["zebra","apple","banana"],"numbers":[3,1,4,1,5]}
```

## Use Cases

### Digital Signatures

The example computes an HMAC-SHA256 tag, a shared-secret message authentication code, over the canonical bytes. Public-key signature schemes follow the same pattern: sign and verify the bytes returned by `canonicalize`.

```python
import hashlib
import hmac

from c14n import canonicalize


def sign_json(data, secret_key):
    """Compute an HMAC-SHA256 tag over the canonical form of data."""
    canonical_bytes = canonicalize(data)
    signature = hmac.new(secret_key, canonical_bytes, hashlib.sha256).hexdigest()
    return signature


def verify_json(data, signature, secret_key):
    """Verify an HMAC-SHA256 tag, comparing in constant time."""
    canonical_bytes = canonicalize(data)
    expected_signature = hmac.new(secret_key, canonical_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected_signature)


# Example usage
document = {"user": "alice", "action": "login", "timestamp": "2023-01-01T12:00:00Z"}
secret = b"my-secret-key"

signature = sign_json(document, secret)
is_valid = verify_json(document, signature, secret)  # True
```

### Content Hashing

```python
import hashlib

from c14n import canonicalize


def hash_json(data):
    """Create a deterministic SHA-256 hash of JSON data."""
    canonical_bytes = canonicalize(data)
    return hashlib.sha256(canonical_bytes).hexdigest()


# The same data with keys in a different order produces the same hash
data1 = {"name": "Alice", "age": 30}
data2 = {"age": 30, "name": "Alice"}

hash1 = hash_json(data1)
hash2 = hash_json(data2)
print(hash1 == hash2)  # True: same canonical representation
```

### Data Deduplication

```python
from c14n import canonicalize


def deduplicate_json(json_objects):
    """Remove duplicate JSON objects based on their canonical form."""
    seen = set()
    unique_objects = []

    for obj in json_objects:
        canonical = canonicalize(obj)
        if canonical not in seen:
            seen.add(canonical)
            unique_objects.append(obj)

    return unique_objects


# Example with duplicate data in different key order
objects = [
    {"name": "Alice", "age": 30},
    {"age": 30, "name": "Alice"},  # Duplicate in a different key order
    {"name": "Bob", "age": 25},
]

unique = deduplicate_json(objects)
# Returns: [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
```

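### JSON-LD Integration

JSON canonicalization and JSON-LD normalization compare documents at different levels: `canonicalize` treats two documents as equal only if they contain the same JSON data, while URDNA2015 treats them as equal if they express the same RDF dataset, even with different contexts or compaction. URDNA2015 already returns a canonical N-Quads string, so it can be hashed directly; passing that string to `canonicalize` would only wrap it in a JSON string literal.

```python
import hashlib

from pyld import jsonld
from c14n import canonicalize


def canonical_json_ld_hash(doc):
    """Hash a JSON-LD document by its RDF content (URDNA2015 N-Quads)."""
    normalized = jsonld.normalize(doc, {
        'algorithm': 'URDNA2015',
        'format': 'application/n-quads'
    })
    # The N-Quads output is already canonical; hash its UTF-8 bytes
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()


def canonical_json_hash(doc):
    """Hash a JSON-LD document by its JSON content (RFC 8785)."""
    return hashlib.sha256(canonicalize(doc)).hexdigest()
```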

## RFC 8785 Compliance

The canonicalization follows the RFC 8785 rules:

1. **Object Key Ordering**: Keys sorted by their UTF-16 code units
2. **Whitespace**: No insignificant whitespace (compact representation)
3. **String Escaping**: Only the escapes JSON requires, in their shortest form
4. **Number Representation**: ECMAScript number-to-string formatting
5. **Array Ordering**: Original array element order preserved
6. **Unicode Handling**: UTF-8 output, with non-ASCII characters written as-is rather than as `\u` escapes

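## Performance Considerations

### Memory Usage

`canonicalize` builds the complete canonical representation in memory, and with `utf8=True` the result is additionally encoded to `bytes`, so peak memory grows with the size of the input:

```python
from c14n import canonicalize

# For large objects, the full canonical representation is held in memory
large_data = {"items": list(range(100000))}
canonical = canonicalize(large_data)  # Creates a large bytes object in memory
```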
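When only a hash is needed, one way to avoid holding the whole output is to feed the encoder's output to the hash chunk by chunk. The sketch below uses the standard library's `json.JSONEncoder.iterencode`, not `c14n`, so it is only an approximation of RFC 8785: keys are ordered by code point rather than UTF-16 code unit, and floats use Python's formatting rather than the ECMAScript rules. `streaming_json_hash` is a hypothetical name.

```python
import hashlib
import json


def streaming_json_hash(obj):
    """SHA-256 over a compact, key-sorted JSON encoding, fed chunk by chunk.

    Approximates RFC 8785 only: keys sort by code point, floats use Python's repr().
    """
    encoder = json.JSONEncoder(sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256()
    for chunk in encoder.iterencode(obj):
        digest.update(chunk.encode("utf-8"))  # only one small chunk is encoded at a time
    return digest.hexdigest()


large_data = {"items": list(range(100000))}
print(streaming_json_hash(large_data))
```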

### Caching Canonical Forms

```python
import json
from functools import lru_cache

from c14n import canonicalize


@lru_cache(maxsize=1000)
def cached_canonicalize(data_str):
    """Cache canonical forms for frequently used JSON strings.

    lru_cache needs a hashable argument, so the input is JSON text rather
    than a dict. The cache key is the raw string, so the same data written
    with a different key order or spacing is cached separately.
    """
    data = json.loads(data_str)
    return canonicalize(data, utf8=False)


# Use with JSON string input for caching
data_json = '{"name": "Alice", "age": 30}'
canonical = cached_canonicalize(data_json)
```

## Error Handling

Canonicalization functions may raise the same exceptions as Python's standard JSON encoder, plus a UTF-8 encoding error:

- **TypeError**: Non-serializable objects (functions, instances of custom classes)
- **ValueError**: Circular references in nested structures
- **UnicodeEncodeError**: Strings containing unpaired surrogate code points, when encoding to UTF-8 (`utf8=True`)

```python
from c14n import canonicalize

try:
    # This fails: functions aren't JSON serializable
    invalid_data = {"func": lambda x: x}
    canonical = canonicalize(invalid_data)
except TypeError as e:
    print(f"Serialization error: {e}")

try:
    # This fails: the dict contains itself
    circular = {}
    circular["self"] = circular
    canonical = canonicalize(circular)
except ValueError as e:
    print(f"Circular reference error: {e}")
```
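Unpaired surrogates usually come from decoding untrusted input with `surrogateescape` or from `"\ud800"`-style escapes. A Python `str` can hold them, but strict UTF-8 encoding rejects them. The sketch below locates them before canonicalizing, using only the standard library; `find_lone_surrogates` and its path format are hypothetical, not part of `c14n`.

```python
def find_lone_surrogates(obj, path="$"):
    """Return locations of strings (keys or values) that cannot be encoded as UTF-8.

    In a Python str, any code point in U+D800..U+DFFF is an unpaired
    surrogate, and strict UTF-8 encoding rejects it.
    """
    def bad(s):
        return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

    found = []
    if isinstance(obj, str):
        if bad(obj):
            found.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}[{key!r}]"  # repr() escapes surrogates, so paths stay printable
            if bad(key):
                found.append(f"{child} (key)")
            found.extend(find_lone_surrogates(value, child))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            found.extend(find_lone_surrogates(item, f"{path}[{i}]"))
    return found


data = {"ok": "café", "bad": "x\ud800", "list": ["fine", "\udfff"]}
print(find_lone_surrogates(data))  # ["$['bad']", "$['list'][1]"]
```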

## Integration with PyLD

PyLD uses the `c14n` module internally: when converting JSON-LD to RDF, which normalization does before running URDNA2015, JSON literal values (`"@type": "@json"`) are serialized with `canonicalize`, so equal JSON values produce identical RDF literals.

```python
from pyld import jsonld

doc = {"@context": {...}, "@id": "example:1", "name": "Test"}  # {...}: your JSON-LD context
normalized = jsonld.normalize(doc, {
    'algorithm': 'URDNA2015',
    'format': 'application/n-quads'
})
# Any "@json" literal values in doc were serialized with canonicalize during RDF conversion
```