# URL and IRI Utilities

Utility functions for URL parsing, IRI manipulation, and base URL resolution following RFC 3986 standards. These functions support JSON-LD's IRI processing requirements and URL normalization.

## Capabilities

### Base IRI Resolution

Functions for resolving relative IRIs against base IRIs and converting absolute IRIs back to relative form.

```python { .api }
def prepend_base(base, iri):
    """
    Prepends a base IRI to a relative IRI to create an absolute IRI.

    Args:
        base (str): The base IRI to resolve against
        iri (str): The relative IRI to resolve

    Returns:
        str: The absolute IRI

    Raises:
        JsonLdError: If base IRI is invalid or resolution fails
    """

def remove_base(base, iri):
    """
    Removes a base IRI from an absolute IRI to create a relative IRI.

    Args:
        base (str): The base IRI to remove
        iri (str): The absolute IRI to make relative

    Returns:
        str: The relative IRI if the IRI starts with base, otherwise the
            original absolute IRI

    Raises:
        JsonLdError: If base IRI is invalid
    """
```

#### Examples

```python
from pyld import jsonld

# Resolve relative IRI against base
base = "https://example.org/data/"
relative = "document.jsonld"
absolute = jsonld.prepend_base(base, relative)
print(absolute)  # "https://example.org/data/document.jsonld"

# Resolve with path traversal
relative_path = "../other/doc.jsonld"
resolved = jsonld.prepend_base(base, relative_path)
print(resolved)  # "https://example.org/other/doc.jsonld"

# Make absolute IRI relative to base
absolute_iri = "https://example.org/data/context.jsonld"
relative_result = jsonld.remove_base(base, absolute_iri)
print(relative_result)  # "context.jsonld"

# IRI not relative to base remains absolute
other_iri = "https://other.org/context.jsonld"
unchanged = jsonld.remove_base(base, other_iri)
print(unchanged)  # "https://other.org/context.jsonld"
```
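
For comparison, the same RFC 3986 reference resolution can be sketched with the standard library's `urllib.parse.urljoin`. This is only an illustration of the resolution rules, not PyLD's implementation, and the two may differ on edge cases.

```python
from urllib.parse import urljoin

base = "https://example.org/data/"

# A relative reference is merged with the base path.
absolute = urljoin(base, "document.jsonld")

# Dot segments are resolved while merging.
resolved = urljoin(base, "../other/doc.jsonld")

# A reference that is already absolute ignores the base.
unchanged = urljoin(base, "https://other.org/context.jsonld")

print(absolute)   # https://example.org/data/document.jsonld
print(resolved)   # https://example.org/other/doc.jsonld
print(unchanged)  # https://other.org/context.jsonld
```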

### URL Parsing and Construction

RFC 3986-compliant URL parsing and reconstruction utilities.

```python { .api }
def parse_url(url):
    """
    Parses a URL into its component parts following RFC 3986.

    Args:
        url (str): The URL to parse

    Returns:
        ParsedUrl: Named tuple with components (scheme, authority, path, query, fragment)

    Components:
        scheme (str): URL scheme (http, https, etc.)
        authority (str): Authority component (host:port)
        path (str): Path component
        query (str): Query string component
        fragment (str): Fragment identifier component
    """

def unparse_url(parsed):
    """
    Reconstructs a URL from its parsed components.

    Args:
        parsed (ParsedUrl, dict, list, or tuple): URL components

    Returns:
        str: The reconstructed URL

    Raises:
        TypeError: If parsed components are in invalid format
    """
```

#### Examples

```python
from pyld import jsonld

# Parse URL into components
url = "https://example.org:8080/path/to/doc.jsonld?param=value#section"
parsed = jsonld.parse_url(url)

print(parsed.scheme)     # "https"
print(parsed.authority)  # "example.org:8080" (non-default port is preserved)
print(parsed.path)       # "/path/to/doc.jsonld"
print(parsed.query)      # "param=value"
print(parsed.fragment)   # "section"

# Reconstruct URL from components
reconstructed = jsonld.unparse_url(parsed)
print(reconstructed)  # "https://example.org:8080/path/to/doc.jsonld?param=value#section"

# Modify components and reconstruct
modified_parsed = parsed._replace(path="/new/path.jsonld", query="new=param")
new_url = jsonld.unparse_url(modified_parsed)
print(new_url)  # "https://example.org:8080/new/path.jsonld?new=param#section"

# Parse URLs with missing components
simple_url = "https://example.org/doc"
simple_parsed = jsonld.parse_url(simple_url)
print(simple_parsed.query)     # None
print(simple_parsed.fragment)  # None
```
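
To make the five-component split concrete, here is a minimal standalone sketch built on the regular expression from RFC 3986, Appendix B. The names `Parts`, `split_uri`, and `join_uri` are hypothetical stand-ins chosen for this illustration; they are not PyLD functions, and this sketch does not strip default ports.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the ParsedUrl tuple described above.
Parts = namedtuple("Parts", ["scheme", "authority", "path", "query", "fragment"])

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI reference into its five RFC 3986 components."""
    m = URI_RE.match(uri)
    # Groups 2, 4, 5, 7 and 9 hold the components; absent ones are None.
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))

def join_uri(parts):
    """Recompose the components following RFC 3986, section 5.3."""
    out = ""
    if parts.scheme is not None:
        out += parts.scheme + ":"
    if parts.authority is not None:
        out += "//" + parts.authority
    out += parts.path
    if parts.query is not None:
        out += "?" + parts.query
    if parts.fragment is not None:
        out += "#" + parts.fragment
    return out

full = "https://example.org:8080/path/to/doc.jsonld?param=value#section"
parts = split_uri(full)
simple = split_uri("https://example.org/doc")
print(parts.authority)           # example.org:8080
print(simple.query)              # None
print(join_uri(parts) == full)   # True
```

Distinguishing `None` (component absent) from an empty string (component present but empty) is what lets the round trip reproduce the input exactly.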

### Path Normalization

Utility for normalizing URL paths by removing dot segments according to RFC 3986.

```python { .api }
def remove_dot_segments(path):
    """
    Removes dot segments from a URL path according to RFC 3986.

    Resolves '.' and '..' segments in URL paths to create normalized paths.

    Args:
        path (str): The path to normalize

    Returns:
        str: The normalized path with dot segments removed
    """
```

#### Examples

```python
from pyld import jsonld

# Remove current directory references
path1 = "/a/b/./c"
normalized1 = jsonld.remove_dot_segments(path1)
print(normalized1)  # "/a/b/c"

# Remove parent directory references
path2 = "/a/b/../c"
normalized2 = jsonld.remove_dot_segments(path2)
print(normalized2)  # "/a/c"

# Complex path with multiple dot segments
path3 = "/a/b/c/./../../g"
normalized3 = jsonld.remove_dot_segments(path3)
print(normalized3)  # "/a/g"

# Leading dot segments
path4 = "../../../g"
normalized4 = jsonld.remove_dot_segments(path4)
# "g" per RFC 3986 section 5.2.4; PyLD may prepend "/" and print "/g"
print(normalized4)
```
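
The dot-segment rules themselves can be sketched directly from RFC 3986, section 5.2.4. The function name `remove_dots` is hypothetical, and this is a literal transcription of the RFC steps rather than PyLD's code, so results may differ from `remove_dot_segments()` for paths without a leading slash.

```python
def remove_dots(path):
    """Apply the RFC 3986 section 5.2.4 algorithm to a path string."""
    output = []  # finished segments, each keeping its leading "/" if any
    while path:
        # Step A: drop a leading "../" or "./"
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step B: "/./" or a trailing "/." collapses to "/"
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # Step C: "/../" or a trailing "/.." also drops the last output segment
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step D: a lone "." or ".." is discarded
        elif path in (".", ".."):
            path = ""
        # Step E: move the first segment (up to the next "/") to the output
        else:
            cut = path.find("/", 1)
            if cut == -1:
                cut = len(path)
            output.append(path[:cut])
            path = path[cut:]
    return "".join(output)

print(remove_dots("/a/b/c/./../../g"))    # /a/g
print(remove_dots("mid/content=5/../6"))  # mid/6
```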

## ParsedUrl Structure

The `parse_url()` function returns a `ParsedUrl` named tuple with these fields:

```python { .api }
from collections import namedtuple

# ParsedUrl named tuple structure
ParsedUrl = namedtuple('ParsedUrl', ['scheme', 'authority', 'path', 'query', 'fragment'])

# Example ParsedUrl instance
ParsedUrl(
    scheme='https',
    authority='example.org:8080',
    path='/path/to/resource',
    query='param=value',
    fragment='section'
)
```

### ParsedUrl Fields

- **scheme**: Protocol scheme (http, https, ftp, etc.) or None
- **authority**: Host and optional port (example.org:8080) or None
- **path**: Path component (always present, may be empty string)
- **query**: Query string without leading '?' or None
- **fragment**: Fragment identifier without leading '#' or None

### Default Port Handling

`parse_url()` removes the scheme's default port (80 for `http`, 443 for `https`) from the authority component:

```python
from pyld import jsonld

# Default ports are removed
url1 = "https://example.org:443/path"
parsed1 = jsonld.parse_url(url1)
print(parsed1.authority)  # "example.org" (443 removed)

url2 = "http://example.org:80/path"
parsed2 = jsonld.parse_url(url2)
print(parsed2.authority)  # "example.org" (80 removed)

# Non-default ports are preserved
url3 = "https://example.org:8080/path"
parsed3 = jsonld.parse_url(url3)
print(parsed3.authority)  # "example.org:8080" (8080 preserved)
```
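
The rule shown above can be sketched without PyLD. The helper `strip_default_port` and the `DEFAULT_PORTS` table are hypothetical names for this illustration; PyLD's own implementation may be structured differently.

```python
# Assumed mapping of scheme to default port, covering only http and https.
DEFAULT_PORTS = {"http": "80", "https": "443"}

def strip_default_port(scheme, authority):
    """Drop ':port' from the authority when it is the scheme's default."""
    host, sep, port = authority.rpartition(":")
    if sep and port == DEFAULT_PORTS.get(scheme):
        return host
    return authority

print(strip_default_port("https", "example.org:443"))   # example.org
print(strip_default_port("http", "example.org:80"))     # example.org
print(strip_default_port("https", "example.org:8080"))  # example.org:8080
```

The port is compared as a string, so an authority such as `example.org:0443` is left untouched.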

## IRI vs URL Handling

These utilities work with both URLs and IRIs (Internationalized Resource Identifiers):

```python
from pyld import jsonld

# ASCII URLs
ascii_url = "https://example.org/path"
parsed_ascii = jsonld.parse_url(ascii_url)

# International IRIs
iri = "https://例え.テスト/パス"
parsed_iri = jsonld.parse_url(iri)

# Both work with the same parsing logic
```

## Common Use Cases

### Base Context Resolution

```python
from pyld import jsonld

# Resolve context relative to document base
document_url = "https://example.org/data/document.jsonld"
context_ref = "../contexts/main.jsonld"

# Extract base from document URL by dropping the last path segment
base = document_url.rsplit('/', 1)[0] + "/"
context_url = jsonld.prepend_base(base, context_ref)
print(context_url)  # "https://example.org/contexts/main.jsonld"
```

### URL Canonicalization

```python
from pyld import jsonld

def canonicalize_url(url):
    """Canonicalize URL by parsing and reconstructing."""
    parsed = jsonld.parse_url(url)
    # Normalize path
    normalized_path = jsonld.remove_dot_segments(parsed.path)
    canonical_parsed = parsed._replace(path=normalized_path)
    return jsonld.unparse_url(canonical_parsed)

# Canonicalize URLs for comparison
url1 = "https://example.org/a/b/../c"
url2 = "https://example.org/a/c"
canonical1 = canonicalize_url(url1)
canonical2 = canonicalize_url(url2)
print(canonical1 == canonical2)  # True
```

### Relative Link Resolution

```python
from pyld import jsonld

def resolve_links(base_url, links):
    """Resolve a list of relative links against a base URL."""
    return [jsonld.prepend_base(base_url, link) for link in links]

base = "https://example.org/docs/"
relative_links = ["intro.html", "../images/logo.png", "section/details.html"]
absolute_links = resolve_links(base, relative_links)
# Result: ["https://example.org/docs/intro.html",
#          "https://example.org/images/logo.png",
#          "https://example.org/docs/section/details.html"]
```

## RFC 3986 Compliance

These utilities follow the RFC 3986 rules for each URL component:

- **Scheme**: Case-insensitive protocol identifier
- **Authority**: Host and optional port with default port removal
- **Path**: Hierarchical path with dot segment normalization
- **Query**: Optional query parameters
- **Fragment**: Optional fragment identifier

The implementation handles edge cases such as:

- Empty path components
- Percent-encoded characters
- Unicode in IRIs
- Relative reference resolution
- Path traversal with '../' segments
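
As an illustration of the case-insensitivity noted for the scheme, the following standard-library sketch lowercases the scheme and host before comparing two URLs, as RFC 3986 section 6.2.2.1 suggests. The helper `normalize_case` is a hypothetical name, it assumes the authority carries no case-sensitive userinfo, and it does not describe what PyLD itself does.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url):
    """Lowercase the scheme and authority; leave path, query, fragment as-is."""
    parts = urlsplit(url)
    # Assumption: no userinfo in the authority, so lowercasing it is safe.
    parts = parts._replace(scheme=parts.scheme.lower(), netloc=parts.netloc.lower())
    return urlunsplit(parts)

normalized = normalize_case("HTTPS://Example.ORG/Path?Q=1#Frag")
print(normalized)  # https://example.org/Path?Q=1#Frag
```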

## Error Handling

URL utility functions may raise `JsonLdError` for:

- **Invalid base IRI**: Malformed base IRI in resolution functions
- **Invalid URL format**: URLs that don't conform to RFC 3986
- **Resolution errors**: Failed relative IRI resolution

Wrap resolution calls in a `try`/`except` block to handle these errors:

```python
from pyld import jsonld
from pyld.jsonld import JsonLdError

try:
    result = jsonld.prepend_base("invalid-base", "relative")
except JsonLdError as e:
    print(f"URL resolution failed: {e}")
    # Handle invalid base IRI
```