# Document Loading

Configurable document loaders for fetching remote JSON-LD contexts and documents via HTTP. PyLD supports both synchronous and asynchronous loading with pluggable HTTP client implementations.

## Capabilities

### Document Loader Management

Global document loader configuration for all JSON-LD processing operations.

```python { .api }
def set_document_loader(load_document_):
    """
    Sets the global default JSON-LD document loader.

    Args:
        load_document_: Document loader function that takes (url, options)
            and returns RemoteDocument
    """

def get_document_loader():
    """
    Gets the current global document loader.

    Returns:
        function: Current document loader function
    """

def load_document(url, options, base=None, profile=None, requestProfile=None):
    """
    Loads a document from a URL using the current document loader.

    Args:
        url (str): The URL (relative or absolute) of the remote document
        options (dict): Loading options including documentLoader
        base (str): The absolute URL to use for making url absolute
        profile (str): Profile for selecting JSON-LD script elements from HTML
        requestProfile (str): One or more IRIs for request profile parameter

    Returns:
        RemoteDocument: Loaded document with content and metadata

    Raises:
        JsonLdError: If document loading fails
    """
```
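
Because a document loader is just a callable that takes `(url, options)` and returns a RemoteDocument, it can be wrapped before it is registered. The following is a minimal sketch of an in-memory caching wrapper. `caching_loader` and `fake_loader` are hypothetical names used for illustration and are not part of PyLD; the stand-in loader avoids any network access.

```python
import copy

def caching_loader(inner_loader):
    """Wrap a loader so each URL is fetched from inner_loader only once."""
    cache = {}

    def loader(url, options=None):
        if url not in cache:
            cache[url] = inner_loader(url, options or {})
        # Return a copy so callers cannot mutate the cached entry
        return copy.deepcopy(cache[url])

    return loader

# Hypothetical stand-in for a real HTTP loader, used only for this demo
calls = []

def fake_loader(url, options):
    calls.append(url)
    return {'document': {'@context': {}}, 'documentUrl': url, 'contextUrl': None}

cached = caching_loader(fake_loader)
first = cached('https://example.org/context.jsonld')
second = cached('https://example.org/context.jsonld')  # served from the cache
```

With PyLD installed, the same wrapper could presumably be registered as `jsonld.set_document_loader(caching_loader(jsonld.requests_document_loader()))`.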

### Requests-based Document Loader

Synchronous HTTP document loader using the popular Requests library.

```python { .api }
def requests_document_loader(secure=False, **kwargs):
    """
    Creates a document loader using the Requests library.

    Args:
        secure (bool): Require all requests to use HTTPS (default: False)
        **kwargs: Additional keyword arguments passed to requests.get()

        Common kwargs:
            timeout (float or tuple): Request timeout in seconds
            verify (bool or str): SSL certificate verification
            cert (str or tuple): Client certificate for authentication
            headers (dict): Custom HTTP headers
            proxies (dict): Proxy configuration
            allow_redirects (bool): Follow redirects (default: True)
            stream (bool): Stream download (default: False)

    Returns:
        function: Document loader function compatible with PyLD

    Raises:
        ImportError: If requests library is not available
    """
```

#### Example

```python
from pyld import jsonld

# Basic requests loader with timeout
loader = jsonld.requests_document_loader(timeout=10)
jsonld.set_document_loader(loader)

# Advanced requests loader with SSL and authentication
secure_loader = jsonld.requests_document_loader(
    secure=True,                   # Force HTTPS
    timeout=(5, 30),               # 5s connect, 30s read timeout
    verify='/path/to/cacert.pem',  # Custom CA bundle
    cert=('/path/to/client.crt', '/path/to/client.key'),  # Client cert
    headers={'User-Agent': 'MyApp/1.0'},
    proxies={'https': 'https://proxy.example.com:8080'}
)
jsonld.set_document_loader(secure_loader)

# Use in JSON-LD processing
doc = jsonld.expand('https://example.org/context.jsonld')
```

### Aiohttp-based Document Loader

HTTP document loader that fetches documents with aiohttp. Only the HTTP fetch is asynchronous; the JSON-LD processing that invokes the loader remains synchronous.

```python { .api }
def aiohttp_document_loader(loop=None, secure=False, **kwargs):
    """
    Creates an asynchronous document loader using aiohttp.

    Args:
        loop: Event loop for async operations (default: current loop)
        secure (bool): Require all requests to use HTTPS (default: False)
        **kwargs: Additional keyword arguments passed to aiohttp session

        Common kwargs:
            timeout (aiohttp.ClientTimeout): Request timeout configuration
            connector (aiohttp.BaseConnector): Custom connector for connection pooling
            headers (dict): Default headers for all requests
            cookies (dict): Default cookies
            auth (aiohttp.BasicAuth): Authentication credentials
            trust_env (bool): Use environment proxy settings
            connector_kwargs: Additional arguments for TCPConnector

    Returns:
        function: Async document loader function compatible with PyLD

    Raises:
        ImportError: If aiohttp library is not available
    """
```

#### Example

```python
from pyld import jsonld
import aiohttp

# Basic aiohttp loader
loader = jsonld.aiohttp_document_loader()
jsonld.set_document_loader(loader)

# Advanced aiohttp loader with custom configuration
timeout = aiohttp.ClientTimeout(total=30, connect=5)
connector = aiohttp.TCPConnector(
    limit=100,          # Total connection pool size
    ttl_dns_cache=300,  # DNS cache TTL
    use_dns_cache=True
)

advanced_loader = jsonld.aiohttp_document_loader(
    secure=True,
    timeout=timeout,
    connector=connector,
    headers={'User-Agent': 'MyApp/1.0'},
    auth=aiohttp.BasicAuth('user', 'pass')
)
jsonld.set_document_loader(advanced_loader)

# Process documents.
# Note: the aiohttp loader only provides async loading;
# JSON-LD processing itself remains synchronous, so these are
# ordinary blocking calls and the function is a plain def.
def process_documents():
    doc1 = jsonld.expand('https://example.org/doc1.jsonld')
    doc2 = jsonld.expand('https://example.org/doc2.jsonld')
    return doc1, doc2
```

### Dummy Document Loader

Fallback loader that raises exceptions for all requests, used when no HTTP libraries are available.

```python { .api }
def dummy_document_loader(**kwargs):
    """
    Creates a dummy document loader that raises exceptions on use.

    Args:
        **kwargs: Extra keyword arguments (ignored)

    Returns:
        function: Document loader that always fails

    Raises:
        JsonLdError: Always raises with 'loading document failed' error
    """
```

## RemoteDocument Structure

Document loaders return RemoteDocument objects with this structure:

```python { .api }
# RemoteDocument format
{
    "document": {...},        # The loaded JSON-LD document
    "documentUrl": "string",  # Final URL after redirects
    "contextUrl": "string"    # Context URL if Link header present
}
```

### RemoteDocument Fields

- **document**: The parsed JSON-LD document content
- **documentUrl**: The final URL after following redirects
- **contextUrl**: Context URL extracted from HTTP Link header (optional)
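
For custom loaders it can help to build the RemoteDocument in one place so that no field is forgotten. The sketch below describes the three fields listed above with a `TypedDict`. The class and the `make_remote_document` helper are illustrative assumptions; the structure shown in this document is a plain dictionary.

```python
from typing import Any, Optional, TypedDict

class RemoteDocument(TypedDict):
    """Type hint for the dictionary shape described above (illustrative only)."""
    document: Any
    documentUrl: str
    contextUrl: Optional[str]

def make_remote_document(document, url, context_url=None) -> RemoteDocument:
    """Build a RemoteDocument dict; contextUrl defaults to None when absent."""
    return {
        'document': document,
        'documentUrl': url,
        'contextUrl': context_url,
    }

remote = make_remote_document(
    {'@id': 'https://example.org/thing'},
    'https://example.org/thing.jsonld',
)
```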

## HTTP Link Header Processing

PyLD automatically processes HTTP Link headers to discover JSON-LD contexts:

```python { .api }
def parse_link_header(header):
    """
    Parses HTTP Link header for JSON-LD context discovery.

    Args:
        header (str): HTTP Link header value

    Returns:
        dict: Parsed links keyed by their "rel" value; each entry holds
            the target URL and any other link attributes
    """
```

#### Example

```python
from pyld import jsonld

# Link header parsing
header = '<https://example.org/context.jsonld>; rel="http://www.w3.org/ns/json-ld#context"'
links = jsonld.parse_link_header(header)
# Result (keyed by rel):
# {"http://www.w3.org/ns/json-ld#context": {
#     "target": "https://example.org/context.jsonld",
#     "rel": "http://www.w3.org/ns/json-ld#context"}}
```
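
Once the header is parsed, the context URL is the target of the entry whose `rel` is `http://www.w3.org/ns/json-ld#context`. The sketch below works on plain dictionaries and accepts either a mapping keyed by `rel` or a flat list of entries, since the exact return shape should be checked against the installed PyLD version. `context_url_from_links` is a hypothetical helper, not a PyLD function.

```python
LINK_CONTEXT = 'http://www.w3.org/ns/json-ld#context'

def context_url_from_links(links):
    """Return the target of the single JSON-LD context link, or None."""
    if isinstance(links, dict):
        # Mapping keyed by rel: one entry, or a list when the rel repeats
        entries = links.get(LINK_CONTEXT, [])
    else:
        # Flat list of entries, each carrying its own 'rel'
        entries = [entry for entry in links if entry.get('rel') == LINK_CONTEXT]
    if isinstance(entries, dict):
        entries = [entries]
    if len(entries) > 1:
        # More than one context link is ambiguous, so refuse to guess
        raise ValueError('multiple JSON-LD context links')
    return entries[0]['target'] if entries else None

keyed = {LINK_CONTEXT: {'target': 'https://example.org/context.jsonld',
                        'rel': LINK_CONTEXT}}
flat = [{'target': 'https://example.org/context.jsonld', 'rel': LINK_CONTEXT}]

from_keyed = context_url_from_links(keyed)
from_flat = context_url_from_links(flat)
from_empty = context_url_from_links({})
```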

## Custom Document Loaders

Create custom document loaders for specialized requirements:

```python
import json

from pyld import jsonld
from pyld.jsonld import JsonLdError

# Loader used as the fallback for ordinary http(s) URLs
default_http_loader = jsonld.requests_document_loader()

def get_from_cache(url):
    """Placeholder cache lookup; replace with your own cache backend."""
    raise KeyError(url)

def custom_document_loader(url, options=None):
    """
    Custom document loader implementation.

    Args:
        url (str): Document URL to load
        options (dict): Loading options

    Returns:
        dict: RemoteDocument with document, documentUrl, contextUrl
    """
    try:
        # Custom loading logic
        if url.startswith('file://'):
            # Handle file:// URLs by stripping the scheme prefix
            with open(url[7:], 'r', encoding='utf-8') as f:
                document = json.load(f)
            return {
                'document': document,
                'documentUrl': url,
                'contextUrl': None
            }
        elif url.startswith('cache://'):
            # Handle cached documents
            document = get_from_cache(url)
            return {
                'document': document,
                'documentUrl': url,
                'contextUrl': None
            }
        else:
            # Fallback to default HTTP loading
            return default_http_loader(url, options or {})

    except Exception as e:
        # Positional arguments are message, error type, details;
        # the JSON-LD error code is passed as a keyword
        raise JsonLdError(
            f'Could not load document: {url}',
            'jsonld.LoadDocumentError',
            {'url': url},
            code='loading document failed',
            cause=e
        )

# Register custom loader
jsonld.set_document_loader(custom_document_loader)
```
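
A common specialized requirement is serving known contexts without any network access, for example in tests. The following sketch builds a loader from an in-memory mapping. `make_static_loader` and `PRELOADED` are hypothetical names, and the sketch raises `LookupError` to stay dependency-free where a real PyLD loader would more likely raise `JsonLdError`.

```python
def make_static_loader(documents):
    """Build a loader that serves documents from an in-memory mapping."""
    def loader(url, options=None):
        if url not in documents:
            # A loader registered with PyLD would typically raise JsonLdError here
            raise LookupError(f'no preloaded document for {url}')
        return {
            'document': documents[url],
            'documentUrl': url,
            'contextUrl': None,
        }
    return loader

# Hypothetical preloaded context used only for this demo
PRELOADED = {
    'https://example.org/context.jsonld': {
        '@context': {'name': 'http://schema.org/name'}
    }
}

static_loader = make_static_loader(PRELOADED)
remote = static_loader('https://example.org/context.jsonld')
```

Because unknown URLs fail immediately, a loader like this also acts as a guard against accidental network requests.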

## Security Considerations

### HTTPS Enforcement

```python
# Force HTTPS for all requests
loader = jsonld.requests_document_loader(secure=True)
jsonld.set_document_loader(loader)
```
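
The `secure=True` flag belongs to the built-in loader factories. For a custom loader, a comparable check can be added with a small wrapper. This is a sketch under assumptions: `require_https` and `fake_loader` are hypothetical names, and `ValueError` stands in for the `JsonLdError` a PyLD loader would normally raise.

```python
from urllib.parse import urlparse

def require_https(inner_loader):
    """Wrap a loader so that only https:// URLs reach inner_loader."""
    def loader(url, options=None):
        if urlparse(url).scheme != 'https':
            raise ValueError(f'refusing non-HTTPS URL: {url}')
        return inner_loader(url, options or {})
    return loader

# Hypothetical stand-in loader used only for this demo
def fake_loader(url, options):
    return {'document': {}, 'documentUrl': url, 'contextUrl': None}

https_only = require_https(fake_loader)
ok = https_only('https://example.org/context.jsonld')

try:
    https_only('http://example.org/context.jsonld')
    rejected = False
except ValueError:
    rejected = True
```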

### Certificate Verification

```python
# Custom CA bundle
loader = jsonld.requests_document_loader(
    verify='/path/to/custom-cacert.pem'
)

# Disable verification (not recommended for production)
loader = jsonld.requests_document_loader(verify=False)
```

### Request Timeouts

```python
# Requests timeouts
loader = jsonld.requests_document_loader(
    timeout=(5, 30)  # 5s connect, 30s read
)

# Aiohttp timeouts
import aiohttp
timeout = aiohttp.ClientTimeout(total=30, connect=5)
loader = jsonld.aiohttp_document_loader(timeout=timeout)
```

### URL Filtering

```python
import urllib.parse

from pyld import jsonld
from pyld.jsonld import JsonLdError

# Loader used for URLs that pass the filter
standard_loader = jsonld.requests_document_loader()

def filtered_document_loader(url, options=None):
    """Document loader with URL filtering."""
    # hostname drops any port or credentials that netloc would include
    host = urllib.parse.urlparse(url).hostname or ''

    # Block private networks, regardless of URL scheme
    if host.startswith(('192.168.', '10.')):
        raise JsonLdError('Private network access denied',
                          'jsonld.LoadDocumentError', code='loading document failed')

    # Allow only specific domains
    allowed_domains = ['example.org', 'w3.org', 'schema.org']
    if host not in allowed_domains:
        raise JsonLdError('Domain not allowed',
                          'jsonld.LoadDocumentError', code='loading document failed')

    # Use standard loader for allowed URLs
    return standard_loader(url, options or {})

jsonld.set_document_loader(filtered_document_loader)
```
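
An exact-match allowlist rejects subdomains such as `www.w3.org`. If subdomains should be accepted, one option is to match on the domain suffix at a label boundary. The helper below is a sketch; `is_allowed_host` is a hypothetical name, not part of PyLD.

```python
def is_allowed_host(host, allowed_domains):
    """True if host equals an allowed domain or is a subdomain of one."""
    host = (host or '').lower().rstrip('.')
    # The leading dot prevents 'evilw3.org' from matching 'w3.org'
    return any(host == domain or host.endswith('.' + domain)
               for domain in allowed_domains)

ALLOWED = ['example.org', 'w3.org', 'schema.org']

checks = {
    'w3.org': is_allowed_host('w3.org', ALLOWED),
    'www.w3.org': is_allowed_host('www.w3.org', ALLOWED),
    'evilw3.org': is_allowed_host('evilw3.org', ALLOWED),
    'w3.org.evil.com': is_allowed_host('w3.org.evil.com', ALLOWED),
}
```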

## Default Loader Selection

PyLD automatically selects document loaders in this priority order:

1. **Requests** - If requests library is available (default)
2. **Aiohttp** - If aiohttp is available and requests is not
3. **Dummy** - Fallback that always fails
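
The priority order above can be sketched as a simple availability check. This mirrors the listed order only and is not PyLD's actual selection code; `pick_default_loader_name` is a hypothetical helper that reports which loader would be chosen in the current environment.

```python
import importlib.util

def pick_default_loader_name():
    """Return 'requests', 'aiohttp', or 'dummy' following the order above."""
    for name in ('requests', 'aiohttp'):
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return 'dummy'

choice = pick_default_loader_name()
print(f'default loader would be: {choice}')
```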

Override with explicit loader selection:

```python
# Force aiohttp even if requests is available
jsonld.set_document_loader(jsonld.aiohttp_document_loader())

# Or force requests
jsonld.set_document_loader(jsonld.requests_document_loader())
```

## Error Handling

Loading a document or remote context may raise `JsonLdError` with one of these error codes (available as `e.code`):

- **loading document failed**: Network errors, timeouts, HTTP errors
- **invalid remote context**: Invalid JSON-LD context documents
- **recursive context inclusion**: Context import loops

Handle loading errors gracefully:

```python
from pyld import jsonld
from pyld.jsonld import JsonLdError

try:
    result = jsonld.expand('https://example.org/doc.jsonld')
except JsonLdError as e:
    if e.code == 'loading document failed':
        # Handle network error
        print(f"Could not load document: {e.details}")
    else:
        # Handle other JSON-LD errors
        raise
```
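
Transient network failures are often worth retrying before giving up. The sketch below wraps any loader with a bounded retry and exponential backoff. `with_retries` and `flaky_loader` are hypothetical names, and the demo uses `ConnectionError` so that it runs without PyLD or network access.

```python
import time

def with_retries(inner_loader, attempts=3, delay=0.5, retry_on=(Exception,)):
    """Wrap a loader so failures are retried up to `attempts` times."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')

    def loader(url, options=None):
        last_error = None
        for attempt in range(attempts):
            try:
                return inner_loader(url, options or {})
            except retry_on as exc:
                last_error = exc
                if attempt < attempts - 1:
                    # Exponential backoff: delay, 2*delay, 4*delay, ...
                    time.sleep(delay * (2 ** attempt))
        raise last_error

    return loader

# Hypothetical loader that fails twice before succeeding
attempts_seen = []

def flaky_loader(url, options):
    attempts_seen.append(url)
    if len(attempts_seen) < 3:
        raise ConnectionError('temporary failure')
    return {'document': {}, 'documentUrl': url, 'contextUrl': None}

reliable = with_retries(flaky_loader, attempts=3, delay=0.0)
result = reliable('https://example.org/doc.jsonld')
```

With PyLD, `retry_on=(JsonLdError,)` would be the natural choice, ideally retrying only when `e.code` is `'loading document failed'`.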