# Document Parsing

Parsing for SPDX documents in Tag/Value, JSON, YAML, XML, and RDF/XML formats, with automatic format detection from the file extension, detailed error messages on failure, and a configurable file encoding.

## Capabilities

### Universal File Parsing

Parse an SPDX document from a file; the format is detected automatically from the file extension.

```python { .api }
def parse_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX file in any supported format.

    Automatically detects format from file extension:
    - .spdx, .tag -> Tag/Value format
    - .json -> JSON format
    - .yaml, .yml -> YAML format
    - .xml -> XML format
    - .rdf, .rdf.xml -> RDF/XML format

    Args:
        file_name: Path to SPDX file
        encoding: File encoding (default: utf-8, recommended)

    Returns:
        Document: Parsed SPDX document object

    Raises:
        SPDXParsingError: If parsing fails with detailed error messages
        FileNotFoundError: If file doesn't exist
    """
```

### JSON Parsing

Parse SPDX documents from JSON files with `json_parser.parse_from_file`.

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from JSON file.

    Args:
        file_name: Path to JSON file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If JSON parsing fails
        JSONDecodeError: If JSON is invalid
    """
```
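To make the `JSONDecodeError` case concrete, here is a minimal standard-library sketch of what that exception reports when the input is not syntactically valid JSON. The helper `describe_json_error` is a hypothetical name used only for illustration and is not part of spdx-tools. Input that is valid JSON but not a valid SPDX document would presumably surface as `SPDXParsingError` instead.

```python
import json
from typing import Optional


def describe_json_error(text: str) -> Optional[str]:
    """Return a short description of the JSON syntax error in text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError records where the first syntax problem was found.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


valid = describe_json_error('{"spdxVersion": "SPDX-2.3"}')
broken = describe_json_error('{"spdxVersion": ')
print(valid)
print(broken)
```

Since `JSONDecodeError` is a subclass of `ValueError`, code that already catches `ValueError` will also catch it.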

### XML Parsing

Parse SPDX documents from XML files with `xml_parser.parse_from_file`.

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from XML file.

    Args:
        file_name: Path to XML file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If XML parsing fails
        ExpatError: If XML is malformed
    """
```

### YAML Parsing

Parse SPDX documents from YAML files with `yaml_parser.parse_from_file`.

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from YAML file.

    Args:
        file_name: Path to YAML file
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If YAML parsing fails
        ScannerError: If YAML is invalid
    """
```

### Tag-Value Parsing

Parse SPDX documents from Tag-Value files (the original SPDX format).

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from Tag-Value file.

    Args:
        file_name: Path to Tag-Value file (.spdx or .tag extension)
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If Tag-Value parsing fails
    """
```

### RDF Parsing

Parse SPDX documents from RDF/XML files.

```python { .api }
def parse_from_file(file_name: str, encoding: str = "utf-8") -> Document:
    """
    Parse SPDX document from RDF/XML file.

    Args:
        file_name: Path to RDF file (.rdf or .rdf.xml extension)
        encoding: File encoding

    Returns:
        Document: Parsed SPDX document

    Raises:
        SPDXParsingError: If RDF parsing fails
        SAXParseException: If RDF/XML is malformed
    """
```

### Format Detection

Determine the SPDX file format from the file name extension.

```python { .api }
def file_name_to_format(file_name: str) -> FileFormat:
    """
    Detect SPDX file format from filename extension.

    Supported extensions:
    - .rdf, .rdf.xml -> RDF_XML
    - .tag, .spdx -> TAG_VALUE
    - .json -> JSON
    - .xml -> XML
    - .yaml, .yml -> YAML

    Args:
        file_name: File path or name

    Returns:
        FileFormat: Detected format enum value

    Raises:
        SPDXParsingError: If file extension is not supported
    """
```
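The extension mapping above can be sketched with a small suffix table. This is an illustrative stand-in under stated assumptions, not the library's implementation: `SketchFormat` and `detect_format` are hypothetical names, and the sketch raises `ValueError` where the documented function raises `SPDXParsingError`. One detail worth noting is that `.rdf.xml` also ends in `.xml`, so the longer suffix has to be checked first.

```python
from enum import Enum, auto


class SketchFormat(Enum):
    """Hypothetical stand-in for FileFormat, used only in this sketch."""
    JSON = auto()
    YAML = auto()
    XML = auto()
    TAG_VALUE = auto()
    RDF_XML = auto()


# Order matters: ".rdf.xml" must be tested before the shorter ".xml".
_SUFFIXES = (
    (".rdf.xml", SketchFormat.RDF_XML),
    (".rdf", SketchFormat.RDF_XML),
    (".tag", SketchFormat.TAG_VALUE),
    (".spdx", SketchFormat.TAG_VALUE),
    (".json", SketchFormat.JSON),
    (".xml", SketchFormat.XML),
    (".yaml", SketchFormat.YAML),
    (".yml", SketchFormat.YAML),
)


def detect_format(file_name: str) -> SketchFormat:
    """Map a file name to a format by its suffix; raise ValueError if unknown."""
    for suffix, file_format in _SUFFIXES:
        if file_name.endswith(suffix):
            return file_format
    raise ValueError(f"Unsupported SPDX file type: {file_name}")


print(detect_format("sbom.rdf.xml"))
print(detect_format("sbom.xml"))
```

The comparison in this sketch is case-sensitive, so a name such as `SBOM.JSON` would be rejected; lowercasing the name first is one way to relax that.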

### Error Handling

Parsing failures raise `SPDXParsingError`, which carries a list of detailed error messages.

```python { .api }
class SPDXParsingError(Exception):
    """
    Exception raised when SPDX parsing fails.

    Contains detailed error messages about parsing failures.
    """

    def get_messages(self) -> List[str]:
        """
        Get list of detailed parsing error messages.

        Returns:
            List of error message strings
        """
```
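An exception that exposes a list of messages lets a parser report every problem it found in one pass rather than stopping at the first. The following sketch shows that pattern with a hypothetical `MessageListError` class and a hypothetical `check_fields` validator; neither is part of spdx-tools, and the field names are chosen only for illustration.

```python
from typing import List


class MessageListError(Exception):
    """Hypothetical exception carrying a list of messages, mirroring get_messages()."""

    def __init__(self, messages: List[str]):
        super().__init__(messages)
        self.messages = list(messages)

    def get_messages(self) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self.messages)


def check_fields(data: dict) -> None:
    """Collect every missing field before raising, instead of failing on the first."""
    problems = [
        f"missing field: {key}"
        for key in ("name", "spdxVersion")
        if key not in data
    ]
    if problems:
        raise MessageListError(problems)


try:
    check_fields({})
except MessageListError as err:
    collected = err.get_messages()
    for message in collected:
        print(f" - {message}")
```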

## Usage Examples

### Basic File Parsing

```python
from spdx_tools.spdx.parser.parse_anything import parse_file

# Parse any supported format
try:
    document = parse_file("example.spdx")
    print(f"Parsed document: {document.creation_info.name}")
    print(f"SPDX version: {document.creation_info.spdx_version}")
    print(f"Packages: {len(document.packages)}")
    print(f"Files: {len(document.files)}")
except Exception as e:
    print(f"Parsing failed: {e}")
```

### Format-Specific Parsing

```python
from spdx_tools.spdx.parser.json import json_parser
from spdx_tools.spdx.parser.xml import xml_parser
from spdx_tools.spdx.parser.yaml import yaml_parser

# Parse specific formats
json_doc = json_parser.parse_from_file("document.json")
xml_doc = xml_parser.parse_from_file("document.xml")
yaml_doc = yaml_parser.parse_from_file("document.yaml")
```
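When the format is known up front, calling a format-specific parser directly skips extension detection. One way to organize that choice is a lookup table from format name to parser function. The sketch below is self-contained and uses only the standard library: `PARSERS`, `parse_text`, and `_not_wired` are hypothetical names, and `json.loads` stands in for a real SPDX parser.

```python
import json
from typing import Any, Callable, Dict


def _not_wired(text: str) -> Any:
    # Placeholder for parsers this sketch does not implement.
    raise NotImplementedError("parser not wired up in this sketch")


# Hypothetical registry: format name -> callable that turns text into data.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "json": json.loads,
    "xml": _not_wired,
    "yaml": _not_wired,
}


def parse_text(text: str, format_name: str) -> Any:
    """Look up the parser registered for format_name and apply it to text."""
    try:
        parser = PARSERS[format_name]
    except KeyError:
        raise ValueError(f"unknown format: {format_name}") from None
    return parser(text)


result = parse_text('{"name": "example"}', "json")
print(result["name"])
```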

### Error Handling

```python
from spdx_tools.spdx.parser.parse_anything import parse_file
from spdx_tools.spdx.parser.error import SPDXParsingError
from json import JSONDecodeError
from xml.parsers.expat import ExpatError
from xml.sax import SAXParseException
from yaml.scanner import ScannerError

try:
    document = parse_file("problematic.spdx")
except SPDXParsingError as e:
    print("SPDX parsing errors:")
    for message in e.get_messages():
        print(f"  - {message}")
except JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
except ExpatError as e:
    print(f"Invalid XML: {e}")
except SAXParseException as e:
    print(f"Invalid RDF/XML: {e}")
except ScannerError as e:
    print(f"Invalid YAML: {e}")
except FileNotFoundError as e:
    print(f"File not found: {e.filename}")
```

### Custom Encoding

```python
from spdx_tools.spdx.parser.parse_anything import parse_file

# Parse file with specific encoding
document = parse_file("document.spdx", encoding="latin-1")
```
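The `encoding` argument matters whenever a file contains non-ASCII characters and was not saved as UTF-8. The standard-library sketch below illustrates the underlying behavior with plain `open()`, independent of spdx-tools: text written as Latin-1 reads back correctly with `encoding="latin-1"` but fails to decode as UTF-8. The sample text and temporary file are made up for the demonstration.

```python
import os
import tempfile

text = "PackageName: Café"

# Write the text as Latin-1 bytes to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".spdx", delete=False) as handle:
    handle.write(text.encode("latin-1"))
    path = handle.name

try:
    # Reading with the matching encoding round-trips the text.
    with open(path, encoding="latin-1") as f:
        decoded = f.read()

    # Reading the same bytes as UTF-8 fails: a lone 0xE9 byte
    # is not a complete UTF-8 sequence.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
finally:
    os.remove(path)

print(decoded, utf8_failed)
```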

## Types

```python { .api }
from enum import Enum
from typing import List

class FileFormat(Enum):
    """Supported SPDX file formats for parsing."""
    JSON = "json"
    YAML = "yaml"
    XML = "xml"
    TAG_VALUE = "tag_value"
    RDF_XML = "rdf_xml"

class SPDXParsingError(Exception):
    """Exception for SPDX parsing failures."""
    def get_messages(self) -> List[str]: ...
```