# Document Parsing

Parsing support for SPDX documents in Tag/Value, JSON, YAML, XML, and RDF/XML, with format detection by file extension, structured error reporting, and a configurable file encoding.

## Capabilities

### Universal File Parsing

Parse SPDX documents from files with automatic format detection based on file extension.

```python { .api }
def parse_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX file in any supported format.

    Automatically detects format from file extension:
    - .spdx, .tag -> Tag/Value format
    - .json -> JSON format
    - .yaml, .yml -> YAML format
    - .xml -> XML format
    - .rdf, .rdf.xml -> RDF/XML format

    Args:
        file_name: Path to SPDX file
        encoding: File encoding (default: utf-8, recommended)

    Returns:
        Document: Parsed SPDX document object

    Raises:
        SPDXParsingError: If parsing fails with detailed error messages
        FileNotFoundError: If file doesn't exist
    """
```
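
Because the format is chosen from the extension alone, a caller that walks a directory can pre-filter on the same extension list before handing paths to `parse_file`. The sketch below does only that filtering, using the standard library. `SUPPORTED_SUFFIXES` and `find_spdx_candidates` are hypothetical names introduced for illustration and are not part of spdx-tools.

```python
from pathlib import Path
from typing import List

# Extensions copied from the parse_file docstring; the tuple name is an assumption.
# ".rdf.xml" needs no separate entry because it already ends in ".xml".
SUPPORTED_SUFFIXES = (".spdx", ".tag", ".json", ".yaml", ".yml", ".xml", ".rdf")


def find_spdx_candidates(root: str) -> List[Path]:
    """Return files under root whose names end in a supported extension."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.name.endswith(SUPPORTED_SUFFIXES)
    )
```

Each returned path could then be passed to `parse_file`. A matching extension only means a parser will be selected, not that the file is a valid SPDX document.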

### JSON Parsing

Parse SPDX documents from JSON files with `json_parser.parse_from_file` (module `spdx_tools.spdx.parser.json`).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from JSON file.

    Args:
        file_name: Path to JSON file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If JSON parsing fails
        JSONDecodeError: If JSON is invalid
    """
```

### XML Parsing

Parse SPDX documents from XML files with `xml_parser.parse_from_file` (module `spdx_tools.spdx.parser.xml`).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from XML file.

    Args:
        file_name: Path to XML file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If XML parsing fails
        ExpatError: If XML is malformed
    """
```

### YAML Parsing

Parse SPDX documents from YAML files with `yaml_parser.parse_from_file` (module `spdx_tools.spdx.parser.yaml`).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from YAML file.

    Args:
        file_name: Path to YAML file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If YAML parsing fails
        ScannerError: If YAML is invalid
    """
```

### Tag-Value Parsing

Parse SPDX documents from Tag-Value files, the original SPDX format, with `tagvalue_parser.parse_from_file` (module `spdx_tools.spdx.parser.tagvalue`).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from Tag-Value file.

    Args:
        file_name: Path to Tag-Value file (.spdx or .tag extension)
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If Tag-Value parsing fails
    """
```

### RDF Parsing

Parse SPDX documents from RDF/XML files with `rdf_parser.parse_from_file` (module `spdx_tools.spdx.parser.rdf`).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from RDF/XML file.

    Args:
        file_name: Path to RDF file (.rdf or .rdf.xml extension)
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If RDF parsing fails
        SAXParseException: If RDF/XML is malformed
    """
```

### Format Detection

Determine SPDX file format from filename extension.

```python { .api }
def file_name_to_format(file_name: str) -> FileFormat:
    """
    Detect SPDX file format from filename extension.

    Supported extensions:
    - .rdf, .rdf.xml -> RDF_XML
    - .tag, .spdx -> TAG_VALUE
    - .json -> JSON
    - .xml -> XML
    - .yaml, .yml -> YAML

    Args:
        file_name: File path or name

    Returns:
        FileFormat: Detected format enum value

    Raises:
        SPDXParsingError: If file extension is not supported
    """
```
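
The extension table above has one subtlety: `.rdf.xml` also ends in `.xml`, so the RDF/XML check has to run before the plain XML check. The following is a minimal standalone sketch of that mapping, not the library's implementation. `SketchFormat` and `detect_format` are hypothetical stand-ins, and the sketch raises `ValueError` where the documented function raises `SPDXParsingError`, so that it runs without spdx-tools installed.

```python
from enum import Enum, auto


class SketchFormat(Enum):
    """Stand-in for FileFormat so the example runs without spdx-tools."""
    RDF_XML = auto()
    TAG_VALUE = auto()
    JSON = auto()
    XML = auto()
    YAML = auto()


# Order matters: ".rdf.xml" must be tested before the more general ".xml".
_SUFFIX_TABLE = [
    ((".rdf", ".rdf.xml"), SketchFormat.RDF_XML),
    ((".tag", ".spdx"), SketchFormat.TAG_VALUE),
    ((".json",), SketchFormat.JSON),
    ((".xml",), SketchFormat.XML),
    ((".yaml", ".yml"), SketchFormat.YAML),
]


def detect_format(file_name: str) -> SketchFormat:
    """Map a file name to a format by suffix; raise ValueError if unknown."""
    for suffixes, fmt in _SUFFIX_TABLE:
        if file_name.endswith(suffixes):
            return fmt
    raise ValueError(f"Unsupported SPDX file type: {file_name}")
```

With this ordering, `detect_format("sbom.rdf.xml")` resolves to the RDF/XML member and `detect_format("sbom.xml")` to the XML member.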

### Error Handling

Parsing failures raise `SPDXParsingError`, which collects a list of messages describing what went wrong.

```python { .api }
class SPDXParsingError(Exception):
    """
    Exception raised when SPDX parsing fails.

    Contains detailed error messages about parsing failures.
    """

    def get_messages(self) -> List[str]:
        """
        Get list of detailed parsing error messages.

        Returns:
            List of error message strings
        """
```
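
To show how an exception with a `get_messages()` method can be consumed, here is a small self-contained sketch. `MessageListError` is a hypothetical stand-in that mirrors the documented interface, and `summarize` is an illustrative helper. Neither is part of spdx-tools, and the constructor signature is an assumption.

```python
from typing import List


class MessageListError(Exception):
    """Hypothetical stand-in that mirrors the get_messages() interface."""

    def __init__(self, messages: List[str]):
        super().__init__("; ".join(messages))
        self.messages = messages

    def get_messages(self) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self.messages)


def summarize(error: MessageListError) -> str:
    """Format the collected messages as a numbered report."""
    lines = [f"{i}. {msg}" for i, msg in enumerate(error.get_messages(), start=1)]
    return "\n".join(lines)
```

Collecting messages in a list lets a parser report several problems in one pass instead of stopping at the first.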

## Usage Examples

### Basic File Parsing

```python
from spdx_tools.spdx.parser.parse_anything import parse_file

# Parse any supported format
try:
    document = parse_file("example.spdx")
    print(f"Parsed document: {document.creation_info.name}")
    print(f"SPDX version: {document.creation_info.spdx_version}")
    print(f"Packages: {len(document.packages)}")
    print(f"Files: {len(document.files)}")
except Exception as e:
    print(f"Parsing failed: {e}")
```

### Format-Specific Parsing

```python
from spdx_tools.spdx.parser.json import json_parser
from spdx_tools.spdx.parser.xml import xml_parser
from spdx_tools.spdx.parser.yaml import yaml_parser

# Parse specific formats
json_doc = json_parser.parse_from_file("document.json")
xml_doc = xml_parser.parse_from_file("document.xml")
yaml_doc = yaml_parser.parse_from_file("document.yaml")
```

### Error Handling

```python
from spdx_tools.spdx.parser.parse_anything import parse_file
from spdx_tools.spdx.parser.error import SPDXParsingError
from json import JSONDecodeError
from xml.parsers.expat import ExpatError
from xml.sax import SAXParseException
from yaml.scanner import ScannerError

try:
    document = parse_file("problematic.spdx")
except SPDXParsingError as e:
    print("SPDX parsing errors:")
    for message in e.get_messages():
        print(f"  - {message}")
except JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
except ExpatError as e:
    print(f"Invalid XML: {e}")
except SAXParseException as e:
    print(f"Invalid RDF/XML: {e}")
except ScannerError as e:
    print(f"Invalid YAML: {e}")
except FileNotFoundError as e:
    print(f"File not found: {e.filename}")
```
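
Three of the format-specific exceptions caught above are standard-library classes raised by the underlying syntax parsers, not by SPDX logic. The snippet below triggers them directly with malformed input, without spdx-tools, to illustrate what kind of input would reach each `except` branch. `raised_by` and the `parse_bad_*` functions are hypothetical helpers. The YAML case is left out because it needs PyYAML.

```python
import json
import xml.sax
from xml.parsers import expat


def raised_by(action):
    """Run action() and return the class of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc)
    return None


def parse_bad_json():
    json.loads('{"spdxVersion": ')  # truncated JSON text


def parse_bad_xml():
    # Mismatched closing tag fed straight to the expat parser.
    expat.ParserCreate().Parse("<Document><name></Document>", True)


def parse_bad_rdf_xml():
    # Same kind of error, but reported through the SAX layer.
    xml.sax.parseString(b"<rdf><unclosed></rdf>", xml.sax.ContentHandler())


for label, action in [
    ("JSON", parse_bad_json),
    ("XML", parse_bad_xml),
    ("RDF/XML", parse_bad_rdf_xml),
]:
    exc_type = raised_by(action)
    print(f"{label}: {exc_type.__name__ if exc_type else 'no exception'}")
```

Syntax errors of this kind are distinct from `SPDXParsingError`, which the API above documents for failures in SPDX parsing itself.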

### Custom Encoding

```python
from spdx_tools.spdx.parser.parse_anything import parse_file

# Parse file with specific encoding
document = parse_file("document.spdx", encoding="latin-1")
```
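
The reason the `encoding` argument matters can be seen with plain file I/O. The sketch below writes a Latin-1 file and reads it back with two codecs. It assumes, without confirmation from this page, that the parsers hand `encoding` to an ordinary text-mode file open. The sample content and variable names are illustrative only.

```python
import os
import tempfile

text = "PackageName: café\n"

# Write the sample in Latin-1, where "é" is the single byte 0xE9.
with tempfile.NamedTemporaryFile(
    "w", encoding="latin-1", suffix=".spdx", delete=False
) as handle:
    handle.write(text)
    path = handle.name

try:
    # Decoding as UTF-8 fails: 0xE9 starts a multi-byte sequence that never completes.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # Decoding with the codec the file was written in round-trips the text.
    with open(path, encoding="latin-1") as f:
        latin1_text = f.read()
finally:
    os.remove(path)

print(f"UTF-8 read succeeded: {utf8_ok}")
print(f"Latin-1 round trip matches: {latin1_text == text}")
```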

## Types

```python { .api }
from enum import Enum
from typing import List

class FileFormat(Enum):
    """Supported SPDX file formats for parsing."""
    JSON = "json"
    YAML = "yaml"
    XML = "xml"
    TAG_VALUE = "tag_value"
    RDF_XML = "rdf_xml"

class SPDXParsingError(Exception):
    """Exception for SPDX parsing failures."""
    def get_messages(self) -> List[str]: ...
```