tessl/npm-sitemap

Sitemap-generating library and CLI tool for creating XML sitemaps that comply with the sitemaps.org protocol


XML Validation

The library can check generated sitemaps against the sitemap XML schemas by calling the external xmllint tool. This adds a layer of schema validation on top of the built-in JavaScript validation.

Capabilities

xmlLint Function

Validates XML content against the official sitemap schema using the external xmllint tool.

/**
 * Verify the passed in XML is valid using xmllint external tool
 * Requires xmllint to be installed on the system
 * @param xml - XML content as string or readable stream
 * @returns Promise that resolves on valid XML and rejects with an [error, stderr] array otherwise
 * @throws XMLLintUnavailable (as the first element of the rejection array) if xmllint is not installed
 */
function xmlLint(xml: string | Readable): Promise<void>;

Usage Examples:

import { xmlLint } from "sitemap";
import { createReadStream } from "fs";

// Validate XML file
try {
  await xmlLint(createReadStream("sitemap.xml"));
  console.log("Sitemap is valid!");
} catch ([error, stderr]) {
  if (error.name === 'XMLLintUnavailable') {
    console.error("xmllint is not installed");
  } else {
    console.error("Validation failed:", stderr.toString());
  }
}

// Validate XML string
const xmlContent = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2023-01-01T00:00:00.000Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>`;

try {
  await xmlLint(xmlContent);
  console.log("XML is schema-compliant");
} catch (error) {
  console.error("Schema validation failed");
}

Integration with Sitemap Generation

Validate generated sitemaps to ensure they meet official specifications:

import { SitemapStream, xmlLint, streamToPromise } from "sitemap";

async function generateAndValidateSitemap() {
  // Create sitemap
  const sitemap = new SitemapStream({
    hostname: "https://example.com"
  });
  
  sitemap.write({ url: "/", changefreq: "daily", priority: 1.0 });
  sitemap.write({ url: "/about", changefreq: "monthly", priority: 0.7 });
  sitemap.end();
  
  // Get XML content
  const xmlBuffer = await streamToPromise(sitemap);
  
  // Validate against schema
  try {
    await xmlLint(xmlBuffer.toString());
    console.log("Generated sitemap is valid");
    return xmlBuffer;
  } catch ([error, stderr]) {
    console.error("Generated sitemap is invalid:", stderr.toString());
    throw error;
  }
}

Error Handling

XMLLintUnavailable Error

This error is thrown when xmllint is not installed on the system:

import { xmlLint, XMLLintUnavailable } from "sitemap";

try {
  await xmlLint("<invalid>xml</invalid>");
} catch ([error]) {
  // xmlLint rejects with an [error, stderr] array, so unpack the error before testing its type
  if (error instanceof XMLLintUnavailable) {
    console.error("Please install xmllint:");
    console.error("Ubuntu/Debian: sudo apt-get install libxml2-utils");
    console.error("macOS: brew install libxml2");
    console.error("Or skip validation by not using xmlLint function");
  } else {
    console.error("Validation error:", error);
  }
}

Validation Error Handling

import { xmlLint } from "sitemap";

async function validateWithFallback(xmlContent: string) {
  try {
    await xmlLint(xmlContent);
    return { isValid: true, errors: [] };
  } catch ([error, stderr]) {
    if (error && error.name === 'XMLLintUnavailable') {
      console.warn("xmllint not available, skipping schema validation");
      return { isValid: null, errors: ["xmllint not available"] };
    } else {
      const errorMessage = stderr ? stderr.toString() : error?.message || "Unknown error";
      return { isValid: false, errors: [errorMessage] };
    }
  }
}
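
The handlers above all unpack the same `[error, stderr]` rejection, so it may be convenient to normalize it in one place. The following is a minimal sketch, not part of the library: `describeLintFailure` and the `LintFailure` shape are hypothetical names, and the only assumption carried over from the examples above is that the unavailable case is identified by `error.name === "XMLLintUnavailable"`.

```typescript
// Hypothetical result shape; not exported by the sitemap package.
interface LintFailure {
  kind: "unavailable" | "invalid" | "unknown";
  message: string;
}

// Turn whatever was caught from xmlLint into a structured description.
// Accepts the [error, stderr] array used in the examples above, or a bare value.
function describeLintFailure(reason: unknown): LintFailure {
  const [error, stderr] = Array.isArray(reason) ? reason : [reason, undefined];

  if (error instanceof Error && error.name === "XMLLintUnavailable") {
    return { kind: "unavailable", message: "xmllint is not installed" };
  }

  // Prefer xmllint's own output when it is present (string or Buffer).
  const detail = stderr != null ? String(stderr).trim() : "";
  if (detail) {
    return { kind: "invalid", message: detail };
  }
  if (error instanceof Error) {
    return { kind: "invalid", message: error.message };
  }
  return { kind: "unknown", message: String(error) };
}
```

A caller could then write `catch (reason) { const failure = describeLintFailure(reason); ... }` and branch on `failure.kind` instead of repeating the destructuring in every handler.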

CLI Integration

The xmlLint function is also used by the command-line interface:

# Validate a sitemap file using CLI
npx sitemap --validate sitemap.xml

# Prints "valid" on success; on failure, prints the error details from xmllint
npx sitemap --validate invalid-sitemap.xml

Installation Requirements

To use xmlLint validation, you need to install xmllint on your system:

Ubuntu/Debian

sudo apt-get install libxml2-utils

macOS

brew install libxml2

Windows

  • Download libxml2 from http://xmlsoft.org/downloads.html
  • Or use Windows Subsystem for Linux (WSL)
  • Or use Docker with a Linux container

Docker Example

FROM node:18
RUN apt-get update && apt-get install -y libxml2-utils
COPY . /app
WORKDIR /app
RUN npm install
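
After any of the installs above, it can be useful to confirm that xmllint is actually reachable from Node before relying on validation. The sketch below is one way to do that with Node's `child_process` module; `isXmllintAvailable` is a hypothetical helper, not part of the sitemap package, and it only checks that `xmllint --version` can be started and exits cleanly.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when the given command can be spawned and exits with status 0.
// A missing binary surfaces as `result.error` (ENOENT) rather than an exception.
function isXmllintAvailable(command: string = "xmllint"): boolean {
  const result = spawnSync(command, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

if (!isXmllintAvailable()) {
  console.warn("xmllint not found on PATH; schema validation will be skipped");
}
```

Running this check once at startup (or in a CI step) gives a clearer message than waiting for the first validation call to fail.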

Schema Validation Details

xmlLint validates against the official sitemap schemas:

  • Core sitemap: http://www.sitemaps.org/schemas/sitemap/0.9
  • Image extension: http://www.google.com/schemas/sitemap-image/1.1
  • Video extension: http://www.google.com/schemas/sitemap-video/1.1
  • News extension: http://www.google.com/schemas/sitemap-news/0.9

The validation ensures:

  • Proper XML structure and encoding
  • Correct namespace declarations
  • Valid element nesting
  • Required elements and attributes are present
  • Data types match schema requirements
  • URL limits are respected (50,000 URLs max per sitemap)
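
For the URL limit in particular, a cheap pre-check can flag an oversized sitemap before xmllint is invoked. This is only a rough sketch under stated assumptions: `countUrlEntries` and `exceedsUrlLimit` are hypothetical helpers, and the regex simply counts opening `<url>` tags, so it would also count tags that appear inside comments or CDATA. It does not replace schema validation.

```typescript
// Limit stated in the list above.
const MAX_URLS_PER_SITEMAP = 50_000;

// Count opening <url> tags. The character class keeps <urlset> from matching.
function countUrlEntries(xml: string): number {
  const matches = xml.match(/<url[\s>]/g);
  return matches ? matches.length : 0;
}

// True when the document holds more <url> entries than the limit allows.
function exceedsUrlLimit(xml: string, limit: number = MAX_URLS_PER_SITEMAP): boolean {
  return countUrlEntries(xml) > limit;
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`;

console.log(countUrlEntries(sample));
```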

Advanced Usage

Batch Validation

import { xmlLint } from "sitemap";
import { readdir, createReadStream } from "fs";
import { promisify } from "util";

const readdirAsync = promisify(readdir);

async function validateSitemapDirectory(directory: string) {
  const files = await readdirAsync(directory);
  const sitemapFiles = files.filter(f => f.endsWith('.xml'));
  
  const results = await Promise.allSettled(
    sitemapFiles.map(async (file) => {
      try {
        await xmlLint(createReadStream(`${directory}/${file}`));
        return { file, valid: true };
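      } catch ([error, stderr]) {
        // xmlLint rejects with an [error, stderr] array; prefer xmllint's own output
        const detail = stderr ? stderr.toString() : error?.message;
        return { file, valid: false, error: detail };
      }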
    })
  );
  
  results.forEach((result) => {
    if (result.status === 'fulfilled') {
      const { file, valid, error } = result.value;
      console.log(`${file}: ${valid ? 'VALID' : 'INVALID'}`);
      if (!valid) {
        console.error(`  Error: ${error}`);
      }
    }
  });
}
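
`validateSitemapDirectory` starts every validation at once, and each `xmlLint` call relies on the external xmllint tool, so a large directory can launch many processes in parallel. One possible refinement is to cap how many run at a time. The helper below is a generic sketch; `mapWithConcurrency` is a hypothetical name and is not provided by the sitemap package.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order and use the same shape as Promise.allSettled.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

In the batch example, the `Promise.allSettled(sitemapFiles.map(...))` call could be swapped for something like `mapWithConcurrency(sitemapFiles, 4, (file) => xmlLint(createReadStream(`${directory}/${file}`)))`, where the limit of 4 is an arbitrary illustrative value.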

Custom Schema Validation

While the built-in function uses the standard sitemap schema, you can use xmllint directly for custom validation:

import { execFile } from "child_process";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

async function validateAgainstCustomSchema(xmlFile: string, schemaFile: string) {
  try {
    await execFileAsync('xmllint', [
      '--schema', schemaFile,
      '--noout',
      xmlFile
    ]);
    return true;
  } catch (error) {
    console.error("Custom schema validation failed:", error);
    return false;
  }
}

Best Practices

  1. Optional Validation: Always handle XMLLintUnavailable gracefully
  2. CI/CD Integration: Include xmllint in build containers for automated validation
  3. Development vs Production: Use validation in development and testing, consider skipping in production for performance
  4. Error Reporting: Capture and log validation errors for debugging
  5. Schema Updates: Keep xmllint updated to support latest sitemap specifications
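
The third practice can be expressed as a small switch that decides whether to call `xmlLint` at all. The sketch below is illustrative only: `shouldRunXmlLint` is a hypothetical helper, and `SITEMAP_VALIDATE` is an assumed environment variable name chosen for this example, not something the sitemap package reads.

```typescript
// Decide whether schema validation should run for the current environment.
// An explicit SITEMAP_VALIDATE value (hypothetical variable) wins;
// otherwise validation runs everywhere except production.
function shouldRunXmlLint(env: Record<string, string | undefined>): boolean {
  const override = env.SITEMAP_VALIDATE;
  if (override !== undefined) {
    return override === "1" || override.toLowerCase() === "true";
  }
  return env.NODE_ENV !== "production";
}
```

A build script could call `shouldRunXmlLint(process.env)` and skip the `xmlLint` step when it returns `false`.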
