# Serialization and Formats

Serialization support for multiple PROV formats, including PROV-JSON, PROV-XML, PROV-O (RDF), and PROV-N, with automatic format detection and a pluggable serializer architecture.

## Capabilities

### Serializer Base Class

Abstract base class for all format-specific serializers.

```python { .api }
class Serializer:
    def __init__(self, document=None):
        """
        Create a serializer for PROV documents.

        Args:
            document (ProvDocument, optional): Document to serialize
        """

    def serialize(self, stream, **args):
        """
        Abstract method for serializing a document.

        Args:
            stream (file-like): Stream object to serialize the document into
            **args: Format-specific serialization arguments
        """

    def deserialize(self, stream, **args):
        """
        Abstract method for deserializing a document.

        Args:
            stream (file-like): Stream object to deserialize the document from
            **args: Format-specific deserialization arguments

        Returns:
            ProvDocument: Deserialized document
        """
```
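
To make the contract above concrete, the following sketch defines a stand-in base class with the same three methods and a toy subclass that writes and reads a line-oriented text format. `BaseSerializer`, `LineSerializer`, the `newline` argument, and the use of a plain list as the "document" are all assumptions made for illustration; none of them are part of the `prov` package.

```python
import io
from abc import ABC, abstractmethod


class BaseSerializer(ABC):
    """Stand-in for the documented Serializer interface (hypothetical)."""

    def __init__(self, document=None):
        self.document = document

    @abstractmethod
    def serialize(self, stream, **args):
        """Write self.document into a file-like stream."""

    @abstractmethod
    def deserialize(self, stream, **args):
        """Read a document from a file-like stream and return it."""


class LineSerializer(BaseSerializer):
    """Toy format: one record per line; the 'document' is a list of strings."""

    def serialize(self, stream, **args):
        sep = args.get("newline", "\n")  # format-specific argument
        for record in self.document:
            stream.write(record + sep)

    def deserialize(self, stream, **args):
        # Skip blank lines and drop the trailing newline of each record.
        return [line.rstrip("\n") for line in stream if line.strip()]


buffer = io.StringIO()
LineSerializer(["entity(ex:e1)", "activity(ex:a1)"]).serialize(buffer)
text = buffer.getvalue()
restored = LineSerializer().deserialize(io.StringIO(text))
```

A format-specific serializer in this style only has to know how to move one document through one stream, which is what lets a registry treat all formats uniformly.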

### Serializer Registry

Registry system for managing available serializers.

```python { .api }
class Registry:
    serializers: dict[str, type[Serializer]] = None
    """Dictionary mapping format names to serializer classes."""

    @staticmethod
    def load_serializers():
        """
        Load all available serializers into the registry.

        Registers serializers for:
        - 'json': PROV-JSON format
        - 'xml': PROV-XML format
        - 'rdf': PROV-O (RDF) format
        - 'provn': PROV-N format
        """

def get(format_name: str) -> type[Serializer]:
    """
    Get the serializer class for the specified format.

    Args:
        format_name (str): Format name ('json', 'xml', 'rdf', 'provn')

    Returns:
        type[Serializer]: Serializer class for the format

    Raises:
        DoNotExist: If no serializer is available for the format
    """

class DoNotExist(Exception):
    """Exception raised when a serializer is not available for a format."""
```
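
The lookup behaviour described above can be sketched with a small, self-contained registry. This is an illustration of the pattern under stated assumptions, not the library's source: the placeholder classes `JsonLike` and `XmlLike` and the exact error message are hypothetical.

```python
class DoNotExist(Exception):
    """Raised when no serializer is registered under a format name."""


class JsonLike:  # hypothetical placeholder serializer class
    pass


class XmlLike:  # hypothetical placeholder serializer class
    pass


class Registry:
    serializers = None  # populated on first use

    @staticmethod
    def load_serializers():
        # A real registry would import the serializer modules here.
        Registry.serializers = {"json": JsonLike, "xml": XmlLike}


def get(format_name):
    """Return the class registered for format_name, loading lazily."""
    if Registry.serializers is None:
        Registry.load_serializers()
    try:
        return Registry.serializers[format_name]
    except KeyError:
        raise DoNotExist(
            f'No serializer available for the format "{format_name}"'
        ) from None


serializer_cls = get("json")
try:
    get("yaml")
    missing = None
except DoNotExist as exc:
    missing = str(exc)
```

Loading on first use keeps optional dependencies out of the import path until a format that needs them is actually requested.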

### Document Serialization Methods

ProvDocument provides high-level serialization methods.

```python { .api }
class ProvDocument:
    def serialize(self, destination=None, format='json', **args):
        """
        Serialize this document to various formats.

        Args:
            destination (str or file-like, optional): Output destination
            format (str): Output format ('json', 'xml', 'rdf', 'provn')
            **args: Format-specific arguments

        Returns:
            str: Serialized content if no destination specified
        """

    @staticmethod
    def deserialize(source, format=None, **args):
        """
        Deserialize a document from various formats.

        Args:
            source (str or file-like): Input source
            format (str, optional): Input format, auto-detected if None
            **args: Format-specific arguments

        Returns:
            ProvDocument: Deserialized document
        """
```

### Format-Specific Serializers

Individual serializer classes for each supported format.

```python { .api }
class ProvJSONSerializer(Serializer):
    """
    Serializer for PROV-JSON format.

    PROV-JSON represents provenance as JSON objects with arrays for
    different record types and attributes.
    """

class ProvXMLSerializer(Serializer):
    """
    Serializer for PROV-XML format.

    PROV-XML represents provenance as XML documents following the
    W3C PROV-XML schema.

    Requirements:
        lxml>=3.3.5 (install with: pip install prov[xml])
    """

class ProvRDFSerializer(Serializer):
    """
    Serializer for PROV-O (RDF) format.

    PROV-O represents provenance as RDF triples using the W3C PROV
    ontology vocabulary.

    Requirements:
        rdflib>=4.2.1,<7 (install with: pip install prov[rdf])
    """

class ProvNSerializer(Serializer):
    """
    Serializer for PROV-N format.

    PROV-N is the human-readable textual notation for PROV defined
    by the W3C specification.
    """
```

### Convenience Functions

High-level functions for easy serialization/deserialization.

```python { .api }
def read(source, format=None):
    """
    Convenience function for reading PROV documents with automatic format detection.

    Args:
        source (str or PathLike or file-like): Source to read from
        format (str, optional): Format hint for parsing

    Returns:
        ProvDocument: Loaded document or None

    Raises:
        TypeError: If format cannot be detected and parsing fails
    """
```
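
One plausible way to implement detection, consistent with the `TypeError` documented above, is to try each known format until one parses. The sketch below shows that idea with standard-library parsers only; it is an assumption about the mechanism rather than a description of how `read` is written, and `PARSERS` and `read_text` are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical parsers keyed by format name; each raises on malformed input.
PARSERS = {
    "json": json.loads,
    "xml": ET.fromstring,
}


def read_text(content, format=None):
    """Parse content with the named format, or try each known format in turn."""
    if format is not None:
        name = format.lower()
        return name, PARSERS[name](content)
    for name, parse in PARSERS.items():
        try:
            return name, parse(content)
        except Exception:
            continue  # this format did not fit; try the next one
    raise TypeError("Could not detect the format; pass format= explicitly")


detected_json, _ = read_text('{"entity": {}}')
detected_xml, _ = read_text("<document/>")
try:
    read_text("not a known format")
    failed = False
except TypeError:
    failed = True
```

Trial parsing hides the underlying parser error, so passing an explicit format is the better choice when a useful error message matters.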

## Supported Formats

### PROV-JSON

JSON representation of PROV documents with structured objects for each record type.

```python
# Serialize to PROV-JSON
doc.serialize('output.json', format='json')
doc.serialize('output.json', format='json', indent=2)  # Pretty-printed

# Deserialize from PROV-JSON
doc = ProvDocument.deserialize('input.json', format='json')
```

### PROV-XML

XML representation following the W3C PROV-XML schema.

```python
# Serialize to PROV-XML (requires lxml)
doc.serialize('output.xml', format='xml')

# Deserialize from PROV-XML
doc = ProvDocument.deserialize('input.xml', format='xml')
```

### PROV-O (RDF)

RDF representation using the W3C PROV ontology.

```python
# Serialize to RDF (requires rdflib)
doc.serialize('output.rdf', format='rdf')
doc.serialize('output.ttl', format='rdf', rdf_format='turtle')
doc.serialize('output.n3', format='rdf', rdf_format='n3')

# Deserialize from RDF
doc = ProvDocument.deserialize('input.rdf', format='rdf')
doc = ProvDocument.deserialize('input.ttl', format='rdf', rdf_format='turtle')
```

### PROV-N

Human-readable textual notation defined by W3C.

```python
# Serialize to PROV-N
doc.serialize('output.provn', format='provn')

# Get PROV-N as string
provn_string = doc.get_provn()

# Deserialize from PROV-N
# Note: depending on the installed prov version, the PROV-N serializer may not
# implement deserialization and can raise NotImplementedError here.
doc = ProvDocument.deserialize('input.provn', format='provn')
```

## Usage Examples

### Basic Serialization

```python
import prov
from prov.model import ProvDocument

# Create a document with some content
doc = ProvDocument()
doc.add_namespace('ex', 'http://example.org/')

entity = doc.entity('ex:entity1', {'prov:label': 'Example Entity'})
activity = doc.activity('ex:activity1')
doc.generation(entity, activity)

# Serialize to different formats
doc.serialize('output.json', format='json')
doc.serialize('output.xml', format='xml')
doc.serialize('output.rdf', format='rdf')
doc.serialize('output.provn', format='provn')

# Serialize to string
json_string = doc.serialize(format='json')
xml_string = doc.serialize(format='xml')
```

### Reading Documents

```python
import prov

# Read with automatic format detection
doc1 = prov.read('document.json')  # Auto-detects JSON
doc2 = prov.read('document.xml')   # Auto-detects XML
doc3 = prov.read('document.rdf')   # Auto-detects RDF

# Read with explicit format
doc4 = prov.read('document.txt', format='provn')

# Read from file-like objects
with open('document.json', 'r') as f:
    doc5 = prov.read(f, format='json')
```

### Advanced Serialization Options

```python
# PROV-JSON with pretty printing
doc.serialize('pretty.json', format='json', indent=4)

# RDF with specific format
doc.serialize('output.ttl', format='rdf', rdf_format='turtle')
doc.serialize('output.nt', format='rdf', rdf_format='nt')

# Using serializer classes directly
from prov.serializers import get

json_serializer = get('json')(doc)
with open('output.json', 'w') as f:
    json_serializer.serialize(f, indent=2)
```

### Format Detection and Error Handling

```python
import prov
from prov.serializers import DoNotExist, get

try:
    # Attempt to read with format detection
    doc = prov.read('unknown_format.dat')
except TypeError as e:
    print(f"Format detection failed: {e}")
    # Try with explicit format
    doc = prov.read('unknown_format.dat', format='json')

try:
    # Attempt to get unavailable serializer
    serializer = get('unsupported_format')
except DoNotExist as e:
    print(f"Serializer not available: {e}")
```

### Working with Streams

```python
import io

from prov.model import ProvDocument

# Serialize to string buffer
buffer = io.StringIO()
doc.serialize(buffer, format='json')
json_content = buffer.getvalue()

# Deserialize from string buffer
input_buffer = io.StringIO(json_content)
loaded_doc = ProvDocument.deserialize(input_buffer, format='json')

# Binary formats (for some RDF serializations)
binary_buffer = io.BytesIO()
doc.serialize(binary_buffer, format='rdf', rdf_format='xml')
```
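
The difference between text and binary buffers matters to any code that writes serialized output. As a minimal sketch of one way to support both, the hypothetical helper `write_text` below checks the stream type and encodes only when needed; the helper name and its `encoding` parameter are assumptions for illustration.

```python
import io


def write_text(stream, content, encoding="utf-8"):
    """Write a str to either a text stream or a binary stream."""
    if isinstance(stream, io.TextIOBase):
        stream.write(content)  # text streams accept str directly
    else:
        stream.write(content.encode(encoding))  # binary streams need bytes


text_buffer = io.StringIO()
binary_buffer = io.BytesIO()
write_text(text_buffer, "document\nendDocument\n")
write_text(binary_buffer, "document\nendDocument\n")
```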

### Batch Processing

```python
import prov

# Serialize document to multiple formats
formats = ['json', 'xml', 'rdf', 'provn']
base_name = 'provenance'

for fmt in formats:
    filename = f"{base_name}.{fmt}"
    try:
        doc.serialize(filename, format=fmt)
        print(f"Saved {filename}")
    except Exception as e:
        print(f"Failed to save {filename}: {e}")

# Read and convert between formats
def convert_format(input_file, output_file, output_format):
    """Convert PROV document between formats."""
    doc = prov.read(input_file)
    doc.serialize(output_file, format=output_format)

# Convert JSON to XML
convert_format('input.json', 'output.xml', 'xml')
```
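
When converting many files, it can be convenient to derive the format name from the file suffix instead of passing it by hand. The sketch below is one way to do that; the mapping `EXTENSION_TO_FORMAT` and the helpers `format_for` and `output_names` are hypothetical and reflect common suffix conventions, not anything the library defines.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the format names used above.
EXTENSION_TO_FORMAT = {
    ".json": "json",
    ".xml": "xml",
    ".rdf": "rdf",
    ".ttl": "rdf",
    ".provn": "provn",
}


def format_for(path):
    """Return the format name implied by a path's suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"No known PROV format for suffix {suffix!r}") from None


def output_names(base_name, formats):
    """Build one output filename per requested format."""
    return [f"{base_name}.{fmt}" for fmt in formats]
```

With a helper like this, a conversion call could pass `format_for(output_file)` as the output format, failing early with a clear error for unknown suffixes.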

### Handling Large Documents

```python
from prov.model import ProvActivity, ProvDocument, ProvEntity

# For large documents, serialize directly to file
with open('large_document.json', 'w') as f:
    doc.serialize(f, format='json')

# Load a large RDF document, then work with one record type at a time
def process_large_rdf(filename):
    """Load an RDF document and count its entities and activities."""
    # deserialize() builds the whole document in memory before returning
    doc = ProvDocument.deserialize(filename, format='rdf')

    # Restrict processing to specific record types
    entities = doc.get_records(ProvEntity)
    activities = doc.get_records(ProvActivity)

    print(f"Found {len(entities)} entities and {len(activities)} activities")
```
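
If counts for every record type are needed, a single pass over all records avoids filtering the document once per type. The sketch below assumes only that the records form an iterable of objects, such as whatever `doc.get_records()` returns; `count_by_type` and the stand-in classes are hypothetical.

```python
from collections import Counter


def count_by_type(records):
    """Tally records by class name in a single pass over any iterable."""
    return Counter(type(record).__name__ for record in records)


# Stand-in record classes so the sketch runs without a real document.
class Entity:
    pass


class Activity:
    pass


counts = count_by_type([Entity(), Activity(), Entity()])
```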

### Custom Serialization Parameters

```python
# JSON serialization options
doc.serialize('compact.json', format='json', separators=(',', ':'))
doc.serialize('readable.json', format='json', indent=4, sort_keys=True)

# RDF serialization with base URI
doc.serialize('output.rdf', format='rdf',
              rdf_format='turtle',
              base='http://example.org/')

# XML serialization with encoding
doc.serialize('output.xml', format='xml', encoding='utf-8')
```