# XML Validation

External validation with xmllint checks that generated sitemaps comply with the sitemap XML schemas. It adds a layer of validation beyond the built-in JavaScript validation.

## Capabilities

### xmlLint Function

Validates XML content against the official sitemap schema using the external xmllint tool.

```typescript { .api }
/**
 * Verify the passed in XML is valid using xmllint external tool
 * Requires xmllint to be installed on the system
 * @param xml - XML content as string or readable stream
 * @returns Promise that resolves on valid XML, rejects with error details
 * @throws XMLLintUnavailable if xmllint is not installed
 */
function xmlLint(xml: string | Readable): Promise<void>;
```

**Usage Examples:**

```typescript
import { xmlLint } from "sitemap";
import { createReadStream } from "fs";

// Validate XML file
try {
  await xmlLint(createReadStream("sitemap.xml"));
  console.log("Sitemap is valid!");
} catch ([error, stderr]) {
  if (error.name === 'XMLLintUnavailable') {
    console.error("xmllint is not installed");
  } else {
    console.error("Validation failed:", stderr.toString());
  }
}

// Validate XML string
const xmlContent = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2023-01-01T00:00:00.000Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>`;

try {
  await xmlLint(xmlContent);
  console.log("XML is schema-compliant");
} catch (error) {
  console.error("Schema validation failed");
}
```

### Integration with Sitemap Generation

Validate generated sitemaps to ensure they meet the official specifications:

```typescript
import { SitemapStream, xmlLint, streamToPromise } from "sitemap";

async function generateAndValidateSitemap() {
  // Create sitemap
  const sitemap = new SitemapStream({
    hostname: "https://example.com"
  });

  sitemap.write({ url: "/", changefreq: "daily", priority: 1.0 });
  sitemap.write({ url: "/about", changefreq: "monthly", priority: 0.7 });
  sitemap.end();

  // Get XML content
  const xmlBuffer = await streamToPromise(sitemap);

  // Validate against schema
  try {
    await xmlLint(xmlBuffer.toString());
    console.log("Generated sitemap is valid");
    return xmlBuffer;
  } catch ([error, stderr]) {
    // stderr is missing when xmllint itself is unavailable, so fall back to the error message
    console.error("Generated sitemap is invalid:", stderr?.toString() ?? error?.message);
    throw error;
  }
}
```

## Error Handling

### XMLLintUnavailable Error

The promise rejects with this error when xmllint is not installed on the system:

```typescript
import { xmlLint, XMLLintUnavailable } from "sitemap";

try {
  await xmlLint("<invalid>xml</invalid>");
} catch (rejection) {
  // xmlLint rejects with an [error, stderr] array, so unwrap the error first
  const error = Array.isArray(rejection) ? rejection[0] : rejection;
  if (error instanceof XMLLintUnavailable) {
    console.error("Please install xmllint:");
    console.error("Ubuntu/Debian: sudo apt-get install libxml2-utils");
    console.error("macOS: brew install libxml2");
    console.error("Or skip validation by not using the xmlLint function");
  } else {
    console.error("Validation error:", error);
  }
}
```

### Validation Error Handling

```typescript
import { xmlLint } from "sitemap";

async function validateWithFallback(xmlContent: string) {
  try {
    await xmlLint(xmlContent);
    return { isValid: true, errors: [] };
  } catch ([error, stderr]) {
    if (error && error.name === 'XMLLintUnavailable') {
      console.warn("xmllint not available, skipping schema validation");
      return { isValid: null, errors: ["xmllint not available"] };
    } else {
      const errorMessage = stderr ? stderr.toString() : error?.message || "Unknown error";
      return { isValid: false, errors: [errorMessage] };
    }
  }
}
```

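
The stderr text returned in `errors` can be split into individual diagnostics. The sketch below is a minimal, hypothetical helper (`parseXmllintStderr` and `LintIssue` are illustrative names, not part of the sitemap library). It assumes xmllint reports each problem as `<source>:<line>: <message>`; output from other xmllint versions may differ, so lines that do not match are skipped.

```typescript
// Hypothetical shape for one parsed xmllint diagnostic.
interface LintIssue {
  source: string; // file name, or "-" when xmllint read from stdin
  line: number;
  message: string;
}

// Assumes diagnostics look like "<source>:<line>: <message>".
// Context lines and the final "fails to validate" summary do not match
// that pattern and are ignored.
function parseXmllintStderr(stderr: { toString(): string }): LintIssue[] {
  const issues: LintIssue[] = [];
  for (const rawLine of stderr.toString().split(/\r?\n/)) {
    const match = /^(.+?):(\d+): (.+)$/.exec(rawLine);
    if (match) {
      issues.push({
        source: match[1],
        line: Number(match[2]),
        message: match[3],
      });
    }
  }
  return issues;
}
```

The result could replace the single-string `errors` array above when callers need line numbers, for example to annotate a CI report.
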
## CLI Integration

The xmlLint function is also used by the command-line interface:

```bash
# Validate a sitemap file using CLI
npx sitemap --validate sitemap.xml

# Prints "valid" on success, or error details on failure
npx sitemap --validate invalid-sitemap.xml
# Output: Error details from xmllint
```

## Installation Requirements

To use xmlLint validation, you need to install xmllint on your system:

### Ubuntu/Debian
```bash
sudo apt-get install libxml2-utils
```

### macOS
```bash
brew install libxml2
```

### Windows
- Download libxml2 from http://xmlsoft.org/downloads.html
- Or use Windows Subsystem for Linux (WSL)
- Or use Docker with a Linux container

### Docker Example
```dockerfile
FROM node:18
RUN apt-get update && apt-get install -y libxml2-utils
COPY . /app
WORKDIR /app
RUN npm install
```

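
Whichever platform you install on, it can help to confirm at startup that the tool is reachable. The following is a minimal sketch, assuming `xmllint --version` exits with status 0 when the binary is on the PATH; `isXmllintAvailable` is a hypothetical helper, not part of the sitemap library.

```typescript
import { spawnSync } from "child_process";

// Returns true when `xmllint --version` can be started and exits with code 0.
function isXmllintAvailable(): boolean {
  const result = spawnSync("xmllint", ["--version"], { stdio: "ignore" });
  // `result.error` is set when the binary cannot be spawned (for example ENOENT).
  return result.error === undefined && result.status === 0;
}

if (!isXmllintAvailable()) {
  console.warn("xmllint not found on PATH; schema validation will be skipped");
}
```

The check is synchronous, so it is best run once at startup rather than before every validation.
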
## Schema Validation Details

xmlLint validates against the official sitemap schemas:

- **Core sitemap**: `http://www.sitemaps.org/schemas/sitemap/0.9`
- **Image extension**: `http://www.google.com/schemas/sitemap-image/1.1`
- **Video extension**: `http://www.google.com/schemas/sitemap-video/1.1`
- **News extension**: `http://www.google.com/schemas/sitemap-news/0.9`

The validation ensures:
- Proper XML structure and encoding
- Correct namespace declarations
- Valid element nesting
- Required elements and attributes are present
- Data types match schema requirements
- URL limits are respected (50,000 URLs max per sitemap)

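
The 50,000-URL limit from the list above can also be checked cheaply before invoking xmllint, independent of what the schema enforces. This is a rough sketch under the assumption that counting opening `<url>` tags with a regular expression is accurate enough for generated sitemaps; `countUrlEntries` and `exceedsUrlLimit` are hypothetical helpers, not library functions.

```typescript
// Limit taken from the list above.
const MAX_URLS_PER_SITEMAP = 50000;

// Counts opening <url> tags. The character class after "<url" keeps
// <urlset> from being counted.
function countUrlEntries(xml: string): number {
  return (xml.match(/<url[\s>]/g) ?? []).length;
}

// True when the document holds more entries than a single sitemap allows.
function exceedsUrlLimit(xml: string): boolean {
  return countUrlEntries(xml) > MAX_URLS_PER_SITEMAP;
}
```

A regex count does not parse the XML, so it would also count `<url>` text inside comments or CDATA; a streaming XML parser is the safer choice for untrusted input.
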
## Advanced Usage

### Batch Validation

```typescript
import { xmlLint } from "sitemap";
import { createReadStream } from "fs";
import { readdir } from "fs/promises";
import { join } from "path";

async function validateSitemapDirectory(directory: string) {
  const files = await readdir(directory);
  const sitemapFiles = files.filter(f => f.endsWith('.xml'));

  // Each task catches its own failure, so one bad file does not stop the batch
  const results = await Promise.allSettled(
    sitemapFiles.map(async (file) => {
      try {
        await xmlLint(createReadStream(join(directory, file)));
        return { file, valid: true };
      } catch ([error, stderr]) {
        // Prefer xmllint's stderr output; fall back to the error message
        return { file, valid: false, error: stderr?.toString() ?? error?.message };
      }
    })
  );

  results.forEach((result) => {
    if (result.status === 'fulfilled') {
      const { file, valid, error } = result.value;
      console.log(`${file}: ${valid ? 'VALID' : 'INVALID'}`);
      if (!valid) {
        console.error(`  Error: ${error}`);
      }
    }
  });
}
```

### Custom Schema Validation

While the built-in function uses the standard sitemap schema, you can use xmllint directly for custom validation:

```typescript
import { execFile } from "child_process";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

async function validateAgainstCustomSchema(xmlFile: string, schemaFile: string) {
  try {
    await execFileAsync('xmllint', [
      '--schema', schemaFile,
      '--noout',
      xmlFile
    ]);
    return true;
  } catch (error) {
    console.error("Custom schema validation failed:", error);
    return false;
  }
}
```

## Best Practices

1. **Optional Validation**: Always handle XMLLintUnavailable gracefully
2. **CI/CD Integration**: Include xmllint in build containers for automated validation
3. **Development vs Production**: Use validation in development and testing; consider skipping it in production for performance
4. **Error Reporting**: Capture and log validation errors for debugging
5. **Schema Updates**: Keep xmllint updated to support the latest sitemap specifications

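
The development-versus-production practice can be expressed as a small switch. This is only a sketch: `shouldValidate` is a hypothetical helper, and `SITEMAP_VALIDATE` is an assumed environment variable name that the sitemap library itself does not read.

```typescript
// Hypothetical helper: decide whether schema validation should run.
// An explicit SITEMAP_VALIDATE value (assumed name) wins; otherwise
// validation runs everywhere except production.
function shouldValidate(env: Record<string, string | undefined>): boolean {
  const override = env.SITEMAP_VALIDATE;
  if (override !== undefined) {
    return override === "1" || override === "true";
  }
  return env.NODE_ENV !== "production";
}
```

A caller could pass `process.env` to this function and invoke `xmlLint` only when it returns `true`.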