tessl/npm-yaml

JavaScript parser and stringifier for YAML documents with complete YAML 1.1 and 1.2 support


Parser Infrastructure

Low-level parsing components including lexer, parser, and composer for advanced YAML processing and custom tooling development. These components provide direct access to the YAML parsing pipeline for specialized use cases.

Capabilities

Lexer

The Lexer splits YAML source text into a stream of lexical tokens. Each token is a raw string slice of the source; a handful of control characters (for example `\x02`, which marks the start of document contents) are emitted as markers for structure that has no source text of its own.

class Lexer {
  /**
   * Tokenize YAML source into a stream of raw token strings
   * @param src - YAML source string to tokenize
   * @param incomplete - If true, more input may follow; an incomplete
   *   final token is held back rather than emitted
   * @returns Generator yielding token strings
   */
  lex(src: string, incomplete?: boolean): Generator<string>;
}

Usage Examples:

import { Lexer } from "yaml";

const lexer = new Lexer();
const source = 'name: John Doe\nage: 30\n';

// Tokenize source; each token is a raw string slice of the input
const tokens = Array.from(lexer.lex(source));

tokens.forEach((token, index) => {
  console.log(`Token ${index}:`, JSON.stringify(token));
});

// Example output:
// Token 0: "\u0002"   (document contents start marker)
// Token 1: "name"
// Token 2: ":"
// Token 3: " "
// Token 4: "John Doe"
// Token 5: "\n"
// ...
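
The marker characters can feel surprising at first; the following toy sketch (illustrative only — `toyLex` and `DOC_START` are not the yaml API) shows the pattern of mixing raw source slices with marker tokens:

```javascript
// Marker for structure that has no source text of its own
const DOC_START = '\x02';

function* toyLex(src) {
  yield DOC_START; // marks the start of document contents
  // Split while keeping separators as their own tokens
  for (const token of src.split(/(:|\n| +)/)) {
    if (token !== '') yield token;
  }
}

console.log(Array.from(toyLex('foo: bar\n')));
// [ '\x02', 'foo', ':', ' ', 'bar', '\n' ]
```

Because every non-marker token is a verbatim slice of the input, concatenating the tokens (minus the markers) reproduces the source exactly.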

Parser

The Parser composes lexical tokens into Concrete Syntax Tree (CST) tokens, preserving all source structure and formatting (see the CST namespace below for the token shapes).

class Parser {
  /**
   * Create a parser, with an optional callback for newlines
   * @param onNewLine - Called with the source offset of each newline encountered
   */
  constructor(onNewLine?: (offset: number) => void);
  
  /**
   * Parse a YAML source string into CST tokens
   * @param src - YAML source string
   * @param incomplete - If true, more input may follow
   * @returns Generator yielding top-level CST tokens
   */
  parse(src: string, incomplete?: boolean): Generator<Token>;
  
  /**
   * Advance the parser state by one lexical token
   * @param lexToken - Token string produced by the Lexer
   * @returns Generator yielding any CST tokens completed by this input
   */
  next(lexToken: string): Generator<Token>;
  
  /**
   * Signal the end of input, flushing any pending tokens
   * @returns Generator yielding the remaining CST tokens
   */
  end(): Generator<Token>;
}

Usage Examples:

import { Lexer, Parser, LineCounter } from "yaml";

// Create parser with line tracking
const lineCounter = new LineCounter();
const parser = new Parser(lineCounter.addNewLine);

const source = `documents:
  - title: "First Document"
    content: "Hello World"
  - title: "Second Document"
    content: "Goodbye World"
`;

// Parse into top-level CST tokens
const cstTokens = Array.from(parser.parse(source));

cstTokens.forEach((token, index) => {
  console.log(`CST token ${index}:`, {
    type: token.type,
    offset: token.offset
  });
});

// Manual parsing: feed lexer tokens in one at a time
const parser2 = new Parser();

for (const lexToken of new Lexer().lex(source)) {
  for (const token of parser2.next(lexToken)) {
    console.log('Parsed token:', token.type);
  }
}

// Finalize parsing, flushing any pending tokens
for (const token of parser2.end()) {
  console.log('Final token:', token.type);
}
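
The `next()`/`end()` pair enables push-style, incremental parsing of chunked or streamed input. A minimal self-contained sketch of the same pattern (illustrative only; `PushLineParser` is not part of yaml) splits chunked input into lines:

```javascript
// Push-style parser: the consumer feeds input a chunk at a time,
// and completed items are emitted as soon as they are available.
class PushLineParser {
  constructor() {
    this.buffer = '';
  }

  // Feed one chunk; yield any lines completed by it
  *next(chunk) {
    this.buffer += chunk;
    let idx;
    while ((idx = this.buffer.indexOf('\n')) !== -1) {
      yield this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 1);
    }
  }

  // Signal end of input; flush any pending partial line
  *end() {
    if (this.buffer !== '') {
      yield this.buffer;
      this.buffer = '';
    }
  }
}

const pushParser = new PushLineParser();
const lines = [];
for (const chunk of ['key: va', 'lue\nnext: ', 'item']) {
  lines.push(...pushParser.next(chunk));
}
lines.push(...pushParser.end());
console.log(lines); // [ 'key: value', 'next: item' ]
```

The buffering happens inside the parser, so chunk boundaries never need to align with token boundaries.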

Composer

The Composer converts CST tokens into Document objects with a full AST representation.

class Composer<Contents = ParsedNode, Strict = true> {
  /**
   * Create composer with options
   * @param options - Parse, document and schema options
   */
  constructor(options?: ParseOptions & DocumentOptions & SchemaOptions);
  
  /**
   * Compose CST tokens into Document objects
   * @param tokens - CST token stream from the Parser
   * @param forceDoc - Force document creation even for empty input
   * @param endOffset - Source offset to use for an empty document
   * @returns Generator yielding Document objects
   */
  compose(
    tokens: Iterable<Token>, 
    forceDoc?: boolean, 
    endOffset?: number
  ): Generator<Document<Contents, Strict>>;
  
  /**
   * Process one CST token
   * @param token - Token to process
   * @returns Generator yielding any documents completed by this token
   */
  next(token: Token): Generator<Document<Contents, Strict>>;
  
  /**
   * Signal the end of the token stream
   * @param forceDoc - Force document creation even for empty input
   * @param endOffset - Source offset to use for an empty document
   * @returns Generator yielding any remaining documents
   */
  end(forceDoc?: boolean, endOffset?: number): Generator<Document<Contents, Strict>>;
  
  /**
   * Get stream-level metadata: comments, directives, errors and
   * warnings that fall outside any document
   * @returns Stream metadata
   */
  streamInfo(): {
    comment: string;
    directives: Directives;
    errors: YAMLError[];
    warnings: YAMLError[];
  };
}

Usage Examples:

import { Parser, Composer } from "yaml";

const source = `# Multi-document YAML
---
document: 1
title: "First Document"
data:
  - item1
  - item2
---
document: 2
title: "Second Document"
data:
  - item3
  - item4
---
# Empty document follows
`;

// Complete parsing pipeline: the Parser produces CST tokens,
// which the Composer turns into documents
const parser = new Parser();
const composer = new Composer({
  version: '1.2',
  keepSourceTokens: true
});

const documents = Array.from(composer.compose(parser.parse(source)));

console.log(`Parsed ${documents.length} documents`);

documents.forEach((doc, index) => {
  console.log(`Document ${index + 1}:`, doc.toJS());
  console.log(`Errors: ${doc.errors.length}, Warnings: ${doc.warnings.length}`);
});

// Handle an empty stream: comments and errors outside any
// document are available from streamInfo()
if (documents.length === 0) {
  console.log('Empty stream info:', composer.streamInfo());
}

Line Counter

Utility for tracking line and column positions in source text for error reporting.

class LineCounter {
  /**
   * Register newline character at offset
   * @param offset - Character position of newline
   */
  addNewLine(offset: number): void;
  
  /**
   * Get line and column position for offset
   * @param pos - Character position
   * @returns Line and column numbers (1-based)
   */
  linePos(pos: number): { line: number; col: number };
}
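
Internally, this kind of lookup can be implemented by recording the starting offset of each line and binary-searching that array. A self-contained sketch of the idea (illustrative only; `MiniLineCounter` is not the yaml class):

```javascript
// Record newline offsets, then binary-search them to map a
// character offset to a 1-indexed { line, col } position.
class MiniLineCounter {
  constructor() {
    this.lineStarts = [0]; // offset at which each line begins
  }

  addNewLine(offset) {
    // The character after the newline starts the next line
    this.lineStarts.push(offset + 1);
  }

  linePos(pos) {
    // Binary search for the last line start <= pos
    let lo = 0, hi = this.lineStarts.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.lineStarts[mid] <= pos) lo = mid;
      else hi = mid - 1;
    }
    return { line: lo + 1, col: pos - this.lineStarts[lo] + 1 };
  }
}

const counter = new MiniLineCounter();
const text = 'ab\ncd\nef';
for (let i = 0; i < text.length; i++) {
  if (text[i] === '\n') counter.addNewLine(i);
}
console.log(counter.linePos(4)); // { line: 2, col: 2 } -> 'd'
```

Binary search keeps each lookup at O(log n) in the number of lines, which matters when mapping many error offsets in a large file.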

Usage Examples:

import { LineCounter, parseDocument } from "yaml";

const source = `line 1: value
line 2: another value
line 3:
  nested: content
  invalid: [unclosed array
line 6: final value
`;

// Pass a LineCounter in the parse options to have newline
// positions recorded during parsing
const lineCounter = new LineCounter();
const doc = parseDocument(source, { lineCounter });

// Parse errors are collected on the document rather than thrown;
// map their character offsets to line/column positions
for (const error of doc.errors) {
  const { line, col } = lineCounter.linePos(error.pos[0]);
  console.log(`${error.code} at line ${line}, column ${col}`);
}

// Manual position tracking with a separate counter
const manual = new LineCounter();
for (let i = 0; i < source.length; i++) {
  if (source[i] === '\n') manual.addNewLine(i);
}

// Query positions (1-indexed line and column)
console.log('Position 50:', manual.linePos(50));
console.log('Position 100:', manual.linePos(100));

CST (Concrete Syntax Tree) Namespace

Complete set of CST interfaces and utilities for working with YAML's concrete syntax representation.

namespace CST {
  /** Basic string token with no inner structure */
  interface SourceToken {
    type: string;
    offset: number;
    indent: number;
    source: string;
  }
  
  /** An item within a block or flow collection */
  interface CollectionItem {
    start: SourceToken[];
    key?: Token | null;
    sep?: SourceToken[];
    value?: Token;
  }
  
  /** Document-level CST token */
  interface Document {
    type: 'document';
    offset: number;
    start: SourceToken[];
    value?: Token;
    end?: SourceToken[];
  }
  
  /** Flow scalar token (plain, quoted, or alias) */
  interface FlowScalar {
    type: 'alias' | 'scalar' | 'single-quoted-scalar' | 'double-quoted-scalar';
    offset: number;
    indent: number;
    source: string;
    end?: SourceToken[];
  }
  
  /** Block scalar token (literal or folded) */
  interface BlockScalar {
    type: 'block-scalar';
    offset: number;
    indent: number;
    props: Token[];
    source: string;
  }
  
  /** Flow collection token ([...] or {...}) */
  interface FlowCollection {
    type: 'flow-collection';
    offset: number;
    indent: number;
    start: SourceToken;
    items: CollectionItem[];
    end: SourceToken[];
  }
  
  /** Block collection tokens */
  interface BlockMap {
    type: 'block-map';
    offset: number;
    indent: number;
    items: CollectionItem[];
  }
  
  interface BlockSequence {
    type: 'block-seq';
    offset: number;
    indent: number;
    items: CollectionItem[];
  }
  
  /**
   * Convert a CST token back to its string representation
   * @param cst - CST token or collection item to stringify
   * @returns String representation
   */
  function stringify(cst: Token | CollectionItem): string;
  
  /**
   * Visit the collection items of a CST document or item
   * @param cst - Root CST document or collection item
   * @param visitor - Called for each item with the item and its path;
   *   may return visit.BREAK, visit.SKIP or visit.REMOVE to control
   *   the traversal
   */
  function visit(
    cst: Document | CollectionItem,
    visitor: (item: CollectionItem, path: readonly ['key' | 'value', number][]) => number | symbol | void
  ): void;
  
  /**
   * Create a new scalar token
   * @param value - Scalar value
   * @param context - Creation context; at minimum { indent } is required
   * @returns Scalar token
   */
  function createScalarToken(
    value: string,
    context: { indent: number; implicitKey?: boolean; inFlow?: boolean; offset?: number; type?: string; end?: SourceToken[] }
  ): BlockScalar | FlowScalar;
}

Usage Examples:

import { Parser, CST } from "yaml";

const source = `name: "John Doe"
age: 30
active: true
`;

// Parse into CST tokens
const parser = new Parser();
const cstTokens = Array.from(parser.parse(source));

// Work with CST tokens
cstTokens.forEach(token => {
  console.log('CST token type:', token.type);
  
  // Convert the token back to its source string
  const reconstructed = CST.stringify(token);
  console.log('Reconstructed:', JSON.stringify(reconstructed));
});

// Visit the collection items of the first document
const [docToken] = cstTokens;
if (docToken.type === 'document') {
  CST.visit(docToken, (item, path) => {
    console.log(`Visiting item at depth ${path.length}`);
  });
}

// Create a custom scalar token; the context requires an indent
const customScalar = CST.createScalarToken('custom value', { indent: 0 });
console.log('Custom scalar:', CST.stringify(customScalar));
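
The key property of a CST is losslessness: every byte of the input lives in some token's `source`, so stringifying an unmodified tree reproduces the input exactly, while a targeted edit leaves all surrounding formatting untouched. A toy sketch of that property (illustrative only; not the yaml API):

```javascript
// Stringify a flat "CST" by concatenating each token's raw source
function stringifyCst(tokens) {
  return tokens.map(t => t.source).join('');
}

const cst = [
  { type: 'scalar', source: 'name' },
  { type: 'sep', source: ': ' },
  { type: 'scalar', source: '"John Doe"' },
  { type: 'newline', source: '\n' },
];

// Round-trip: concatenating token sources reproduces the input
console.log(JSON.stringify(stringifyCst(cst))); // "name: \"John Doe\"\n"

// Targeted edit: replace only the value token's source
cst[2].source = '"Jane Doe"';
console.log(JSON.stringify(stringifyCst(cst))); // "name: \"Jane Doe\"\n"
```

This is why CST-level tooling (formatters, comment-preserving rewriters) works on tokens rather than on the parsed AST.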

Advanced Pipeline Usage

Combine all components for custom YAML processing workflows.

import { Lexer, Parser, Composer, LineCounter, visit, isNode } from "yaml";
import type { Document } from "yaml";

class CustomYAMLProcessor {
  private lexer = new Lexer();
  private lineCounter = new LineCounter();
  private parser = new Parser(this.lineCounter.addNewLine);
  private composer = new Composer({ 
    keepSourceTokens: true,
    prettyErrors: true 
  });
  
  async processYAML(source: string) {
    try {
      // Stage 1: Tokenization
      console.log('Stage 1: Tokenizing...');
      const tokens = Array.from(this.lexer.lex(source));
      console.log(`Generated ${tokens.length} tokens`);
      
      // Stage 2: Parsing to CST
      console.log('Stage 2: Parsing to CST...');
      const cstNodes = Array.from(this.parser.parse(source));
      console.log(`Generated ${cstNodes.length} CST nodes`);
      
      // Stage 3: Composition to Documents
      console.log('Stage 3: Composing documents...');
      const documents = Array.from(this.composer.compose(cstNodes));
      console.log(`Generated ${documents.length} documents`);
      
      // Process each document
      const results = documents.map((doc, index) => ({
        index,
        content: doc.toJS(),
        errors: doc.errors.length,
        warnings: doc.warnings.length,
        hasComments: this.hasComments(doc)
      }));
      
      return {
        success: true,
        documents: results,
        totalErrors: results.reduce((sum, doc) => sum + doc.errors, 0),
        totalWarnings: results.reduce((sum, doc) => sum + doc.warnings, 0)
      };
      
    } catch (error) {
      // YAML syntax errors are collected on each document's `errors`
      // array; this catch only handles unexpected failures
      return {
        success: false,
        error: error.message,
        position: error.pos ? this.lineCounter.linePos(error.pos[0]) : null
      };
    }
  }
  
  private hasComments(doc: Document): boolean {
    let hasComments = false;
    
    visit(doc.contents, (key, node) => {
      if (isNode(node) && (node.comment || node.commentBefore)) {
        hasComments = true;
        return visit.BREAK;
      }
    });
    
    return hasComments;
  }
}

// Usage
const processor = new CustomYAMLProcessor();

const complexYAML = `
# Configuration file
app:
  name: MyApp  # Application name
  version: 1.0.0
  
# Database settings  
database:
  host: localhost
  port: 5432
  
# Feature flags
features:
  - auth      # Authentication
  - logging   # Request logging  
  - metrics   # Performance metrics
`;

processor.processYAML(complexYAML).then(result => {
  console.log('Processing result:', result);
});

Pipeline Architecture

The YAML processing pipeline consists of three main stages:

  1. Lexical Analysis (Lexer): Converts source text into tokens
  2. Syntactic Analysis (Parser): Converts tokens into CST nodes
  3. Semantic Analysis (Composer): Converts CST into Document AST

This separation allows for:

  • Custom tokenization logic
  • Syntax tree manipulation
  • Alternative document representations
  • Advanced error handling and recovery
  • Performance optimization for specific use cases
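
The staged, generator-based design can be sketched with a toy pipeline for a tiny `key: value` language (all names here are illustrative, not the yaml API):

```javascript
function* lex(src) {
  // Stage 1: split the source into raw token strings (one per line)
  for (const line of src.split('\n')) {
    if (line.trim() !== '') yield line;
  }
}

function* parse(tokens) {
  // Stage 2: turn raw lines into CST-like records that keep the
  // original source text alongside the parsed structure
  for (const source of tokens) {
    const idx = source.indexOf(':');
    yield {
      type: 'pair',
      key: source.slice(0, idx).trim(),
      value: source.slice(idx + 1).trim(),
      source,
    };
  }
}

function compose(nodes) {
  // Stage 3: build the final "document" representation
  const doc = {};
  for (const node of nodes) doc[node.key] = node.value;
  return doc;
}

const doc = compose(parse(lex('name: Ada\nrole: engineer\n')));
console.log(doc); // { name: 'Ada', role: 'engineer' }
```

Because stages 1 and 2 are generators, input flows through the pipeline item by item; a custom tool can tap in at any stage, e.g. replacing `compose` with something that only inspects the CST records.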

Install with Tessl CLI

npx tessl i tessl/npm-yaml
