tessl/npm-js-tokens

Tiny JavaScript tokenizer that never fails and is almost spec-compliant

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/js-tokens@9.0.x

To install, run

npx @tessl/cli install tessl/npm-js-tokens@9.0.0


js-tokens

js-tokens is a tiny, regex-powered, lenient JavaScript tokenizer that never fails and is almost spec-compliant. It exports a generator function that turns a string of JavaScript code into token objects, which makes it well suited to syntax highlighting, code formatting, linting, and other tools that must tokenize code that may be incomplete or invalid.

Package Information

  • Package Name: js-tokens
  • Package Type: npm
  • Language: JavaScript (TypeScript definitions included)
  • Installation: npm install js-tokens

Core Imports

const jsTokens = require("js-tokens");

For ES modules:

import jsTokens from "js-tokens";

Basic Usage

const jsTokens = require("js-tokens");

// Basic tokenization
const code = 'JSON.stringify({k:3.14**2}, null /*replacer*/, "\\t")';
const tokens = Array.from(jsTokens(code));

// Extract token values
const tokenValues = tokens.map(token => token.value);
console.log(tokenValues.join("|"));
// Output: JSON|.|stringify|(|{|k|:|3.14|**|2|}|,| |null| |/*replacer*/|,| |"\t"|)

// Loop over tokens
for (const token of jsTokens("hello, !world")) {
  console.log(`${token.type}: ${token.value}`);
}

// JSX tokenization
const jsxCode = '<div>Hello {"world"}!</div>';
const jsxTokens = Array.from(jsTokens(jsxCode, { jsx: true }));

Architecture

js-tokens is built around a single core function with the following key characteristics:

  • Never fails: Always returns tokens, even for invalid JavaScript, and does not throw on malformed input
  • Lenient parsing: Handles incomplete or malformed code gracefully
  • Context-aware: Distinguishes regex literals from division operators based on the preceding tokens
  • Regex-powered: Uses regular expressions for fast tokenization
  • Lossless: Concatenating all token values reconstructs the original input exactly
  • Almost spec-compliant: Nearly fully ECMAScript 2024 compliant, with a few deliberate shortcuts
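
The round-trip property from the list above can be checked with a few lines. This is a minimal sketch: `reconstruct` is a hypothetical helper, not part of js-tokens, and the sample tokens are hand-written in the documented shape rather than captured library output.

```javascript
// Rebuild source text from any iterable of tokens by joining their values.
// `reconstruct` is a hypothetical helper name used only in this sketch.
function reconstruct(tokens) {
  let source = "";
  for (const token of tokens) {
    source += token.value;
  }
  return source;
}

// Hand-written tokens in the documented shape for the input "hello, !world".
const helloTokens = [
  { type: "IdentifierName", value: "hello" },
  { type: "Punctuator", value: "," },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "!" },
  { type: "IdentifierName", value: "world" },
];

console.log(reconstruct(helloTokens) === "hello, !world"); // true
```

With the real library, the same check would read `reconstruct(jsTokens(code)) === code`.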

Capabilities

JavaScript Tokenization

The core function converts a JavaScript code string into token objects, each tagged with its type.

// The more specific { jsx: true } overload is listed first so that
// TypeScript selects it for JSX calls.

/**
 * Tokenizes JavaScript code with JSX support
 * @param input - JavaScript/JSX code string to tokenize
 * @param options - Configuration object with jsx: true
 * @returns Iterable of Token and JSXToken objects
 */
function jsTokens(
  input: string,
  options: { jsx: true }
): Iterable<Token | JSXToken>;

/**
 * Tokenizes JavaScript code into an iterable of token objects
 * @param input - JavaScript code string to tokenize
 * @param options - Optional configuration object
 * @returns Iterable of Token objects for regular JavaScript
 */
function jsTokens(input: string, options?: { jsx?: boolean }): Iterable<Token>;
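
Because the declared return type is an iterable rather than an array, a consumer can stop early without tokenizing the rest of the input, assuming tokens are produced on demand. The sketch below illustrates this with a stand-in generator; `takeTokens` and `fakeTokens` are hypothetical names, and the real call would pass `jsTokens(code)` instead.

```javascript
// Collect at most `limit` tokens from any iterable, stopping early.
function takeTokens(tokens, limit) {
  const taken = [];
  if (limit <= 0) return taken;
  for (const token of tokens) {
    taken.push(token);
    if (taken.length >= limit) break; // leaving the loop also closes a generator
  }
  return taken;
}

// Stand-in for the tokenizer: yields hand-written tokens and counts
// how many were actually produced.
const callTokens = [
  { type: "IdentifierName", value: "JSON" },
  { type: "Punctuator", value: "." },
  { type: "IdentifierName", value: "stringify" },
  { type: "Punctuator", value: "(" },
  { type: "Punctuator", value: ")" },
];
let produced = 0;
function* fakeTokens() {
  for (const token of callTokens) {
    produced += 1;
    yield token;
  }
}

const firstTwo = takeTokens(fakeTokens(), 2);
console.log(firstTwo.map((token) => token.value).join("|")); // JSON|.
console.log(produced); // 2
```

Only two tokens are produced before the loop exits, which matters when scanning just the start of a large file.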

Standard JavaScript Tokens

js-tokens recognizes 16 different token types for standard JavaScript code:

type Token =
  | { type: "StringLiteral"; value: string; closed: boolean }
  | { type: "NoSubstitutionTemplate"; value: string; closed: boolean }
  | { type: "TemplateHead"; value: string }
  | { type: "TemplateMiddle"; value: string }
  | { type: "TemplateTail"; value: string; closed: boolean }
  | { type: "RegularExpressionLiteral"; value: string; closed: boolean }
  | { type: "MultiLineComment"; value: string; closed: boolean }
  | { type: "SingleLineComment"; value: string }
  | { type: "HashbangComment"; value: string }
  | { type: "IdentifierName"; value: string }
  | { type: "PrivateIdentifier"; value: string }
  | { type: "NumericLiteral"; value: string }
  | { type: "Punctuator"; value: string }
  | { type: "WhiteSpace"; value: string }
  | { type: "LineTerminatorSequence"; value: string }
  | { type: "Invalid"; value: string };

Key Token Properties:

  • type: Token classification (one of the 16 standard types)
  • value: The exact source text of the token
  • closed: Boolean present only on tokens that can be left unterminated (StringLiteral, NoSubstitutionTemplate, TemplateTail, RegularExpressionLiteral, MultiLineComment, and, in JSX mode, JSXString); false means the closing delimiter is missing
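
As an illustration of how the type field might drive a syntax highlighter, the sketch below wraps selected token types in HTML spans. The class names, `CLASS_BY_TYPE`, `escapeHtml`, and `highlight` are all assumptions made for this example, and the sample tokens are hand-written in the documented shape.

```javascript
// Hypothetical mapping from token types to CSS class names.
const CLASS_BY_TYPE = new Map([
  ["StringLiteral", "str"],
  ["NoSubstitutionTemplate", "str"],
  ["TemplateHead", "str"],
  ["TemplateMiddle", "str"],
  ["TemplateTail", "str"],
  ["RegularExpressionLiteral", "regex"],
  ["MultiLineComment", "comment"],
  ["SingleLineComment", "comment"],
  ["HashbangComment", "comment"],
  ["NumericLiteral", "num"],
  ["Invalid", "invalid"],
]);

// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap mapped token types in a span; emit all other tokens as escaped text.
function highlight(tokens) {
  let html = "";
  for (const token of tokens) {
    const className = CLASS_BY_TYPE.get(token.type);
    const text = escapeHtml(token.value);
    html += className ? `<span class="${className}">${text}</span>` : text;
  }
  return html;
}

// Hand-written tokens for the input "a < 1 // hi".
const lineTokens = [
  { type: "IdentifierName", value: "a" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "<" },
  { type: "WhiteSpace", value: " " },
  { type: "NumericLiteral", value: "1" },
  { type: "WhiteSpace", value: " " },
  { type: "SingleLineComment", value: "// hi" },
];

console.log(highlight(lineTokens));
// a &lt; <span class="num">1</span> <span class="comment">// hi</span>
```

Since every token carries its exact source text, whitespace and line breaks pass through unchanged and the highlighted output keeps the original layout.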

JSX Tokens

When JSX mode is enabled ({ jsx: true }), js-tokens additionally recognizes 5 JSX-specific token types:

type JSXToken =
  | { type: "JSXString"; value: string; closed: boolean }
  | { type: "JSXText"; value: string }
  | { type: "JSXIdentifier"; value: string }
  | { type: "JSXPunctuator"; value: string }
  | { type: "JSXInvalid"; value: string };

JSX Mode Behavior:

  • Returns mixed Token and JSXToken objects as appropriate
  • JSX runs can also contain WhiteSpace, LineTerminatorSequence, MultiLineComment, and SingleLineComment tokens
  • Switches between outputting runs of Token and runs of JSXToken based on context
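
One way to work with the alternating runs described above is to group a mixed token stream by kind. The sketch below is only an illustration: `groupRuns` is a hypothetical helper, and the sample tokens are a hand-written approximation of what JSX mode might yield for `<div>Hello {"world"}!</div>`; the exact types the library assigns, especially to the braces, are an assumption.

```javascript
const JSX_ONLY = new Set([
  "JSXString",
  "JSXText",
  "JSXIdentifier",
  "JSXPunctuator",
  "JSXInvalid",
]);

// Token types that may appear inside both JavaScript and JSX runs.
const SHARED = new Set([
  "WhiteSpace",
  "LineTerminatorSequence",
  "MultiLineComment",
  "SingleLineComment",
]);

// Split a token stream into consecutive runs of JSX and non-JSX tokens.
// Shared types stay in whichever run is currently open.
function groupRuns(tokens) {
  const runs = [];
  let current = null;
  for (const token of tokens) {
    const jsx = JSX_ONLY.has(token.type);
    const shared = SHARED.has(token.type);
    if (current === null || (!shared && current.jsx !== jsx)) {
      current = { jsx, tokens: [] };
      runs.push(current);
    }
    current.tokens.push(token);
  }
  return runs;
}

// Hand-written approximation, not captured library output.
const jsxSample = [
  { type: "JSXPunctuator", value: "<" },
  { type: "JSXIdentifier", value: "div" },
  { type: "JSXPunctuator", value: ">" },
  { type: "JSXText", value: "Hello " },
  { type: "JSXPunctuator", value: "{" },
  { type: "StringLiteral", value: '"world"', closed: true },
  { type: "JSXPunctuator", value: "}" },
  { type: "JSXText", value: "!" },
  { type: "JSXPunctuator", value: "<" },
  { type: "JSXPunctuator", value: "/" },
  { type: "JSXIdentifier", value: "div" },
  { type: "JSXPunctuator", value: ">" },
];

for (const run of groupRuns(jsxSample)) {
  const text = run.tokens.map((token) => token.value).join("");
  console.log(run.jsx ? "JSX:" : "JS: ", text);
}
```

For this sample the helper yields three runs: a JSX run, a JavaScript run holding the string literal, and a closing JSX run.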

Error Handling

js-tokens is designed not to throw on malformed input; it reports problems through the tokens themselves:

  • Invalid JavaScript: Produces "Invalid" tokens for unrecognized characters
  • Incomplete tokens: Sets closed: false on unterminated strings, templates, regex literals, and multi-line comments
  • JSX errors: Produces "JSXInvalid" tokens when JSX mode encounters invalid characters
  • Extreme inputs: The one exception is very large or pathological input, which may hit regex engine limits

Example with incomplete tokens:

const tokens = Array.from(jsTokens('"unclosed string\n'));
// tokens[0]: { type: "StringLiteral", value: '"unclosed string', closed: false }
// tokens[1]: { type: "LineTerminatorSequence", value: "\n" }

const regexTokens = Array.from(jsTokens('/unclosed regex\n'));
// regexTokens[0]: { type: "RegularExpressionLiteral", value: '/unclosed regex', closed: false }
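
Tokens carry no position information, but because token values concatenate back to the input, offsets can be recovered by summing value lengths. The sketch below uses that to report unclosed and invalid tokens; `findProblems` and the shape of its result are hypothetical, and the sample tokens are hand-written in the documented shape.

```javascript
// Report unclosed and invalid tokens together with their string offsets.
// Offsets are UTF-16 code unit indices, the same units as String.prototype.slice.
function findProblems(tokens) {
  const problems = [];
  let offset = 0;
  for (const token of tokens) {
    if (token.closed === false) {
      problems.push({ offset, kind: "unclosed", type: token.type });
    } else if (token.type === "Invalid" || token.type === "JSXInvalid") {
      problems.push({ offset, kind: "invalid", type: token.type });
    }
    offset += token.value.length;
  }
  return problems;
}

// Hand-written tokens for the input 'x = "unclosed string\n'.
const brokenTokens = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "StringLiteral", value: '"unclosed string', closed: false },
  { type: "LineTerminatorSequence", value: "\n" },
];

console.log(findProblems(brokenTokens));
// One problem: an unclosed StringLiteral at offset 4
```

A linter or editor could map these offsets to line and column numbers to place diagnostics.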

Types

Options Configuration

interface TokenizeOptions {
  /** Enable JSX support (default: false) */
  jsx?: boolean;
}

Token Base Properties

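All tokens include these base properties (BaseToken is a descriptive name used in this document, not necessarily a type exported by the package):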

interface BaseToken {
  /** Token type classification */
  type: string;
  /** Original text content of the token */
  value: string;
}

Closed Property Tokens

Tokens that can be left unterminated also carry a closed property (ClosedToken is likewise a descriptive name, not necessarily an exported type):

interface ClosedToken extends BaseToken {
  /** Whether the token is properly closed/terminated */
  closed: boolean;
}

Tokens with a closed property:

  • StringLiteral
  • NoSubstitutionTemplate
  • TemplateTail
  • RegularExpressionLiteral
  • MultiLineComment
  • JSXString

Token Examples:

// Closed string: { type: "StringLiteral", value: '"hello"', closed: true }
// Unclosed string: { type: "StringLiteral", value: '"hello', closed: false }
// Closed regex: { type: "RegularExpressionLiteral", value: '/abc/g', closed: true }
// Unclosed regex: { type: "RegularExpressionLiteral", value: '/abc', closed: false }