tessl/npm-ret

Tokenizes a string that represents a regular expression.

  • Workspace: tessl
  • Visibility: Public
  • Describes: npmpkg:npm/ret@0.0.x

To install, run

npx @tessl/cli install tessl/npm-ret@0.0.0


Ret

Ret is a TypeScript library that tokenizes regular expression strings into AST-like token trees and can reconstruct a regex string from those trees. It is aimed at tools that analyze, transform, or validate regular expressions.

Package Information

  • Package Name: ret
  • Package Type: npm
  • Language: TypeScript
  • Installation: npm install ret

Core Imports

import ret, { tokenizer, reconstruct, types } from "ret";
// Character sets are not exported directly from the main module
// Import them separately if needed:
// import { words, ints, whitespace } from "ret/dist/sets";

For CommonJS:

const ret = require("ret");
const { types, reconstruct } = ret;
// ret is the tokenizer function
// ret.types and ret.reconstruct are also available

// For character set utilities:
const sets = require("ret/dist/sets");
const { words, ints, whitespace, notWords, notInts, notWhitespace, anyChar } = sets;

Basic Usage

import ret, { reconstruct, types } from "ret";

// Tokenize a regular expression
const tokens = ret(/foo|bar/.source);
// or: const tokens = tokenizer(/foo|bar/.source);

// Tokens structure:
// {
//   "type": types.ROOT,
//   "options": [
//     [ { "type": types.CHAR, "value": 102 },  // 'f'
//       { "type": types.CHAR, "value": 111 },  // 'o'
//       { "type": types.CHAR, "value": 111 } ],// 'o'
//     [ { "type": types.CHAR, "value": 98 },   // 'b'
//       { "type": types.CHAR, "value": 97 },   // 'a'
//       { "type": types.CHAR, "value": 114 } ] // 'r'
//   ]
// }

// Reconstruct regex from tokens
const regexString = reconstruct(tokens); // "foo|bar"

// Working with character sets
import { words, ints } from "ret/dist/sets";

const wordToken = words();     // Equivalent to \w
const digitToken = ints();     // Equivalent to \d

reconstruct(wordToken);  // "\\w"
reconstruct(digitToken); // "\\d"
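
The `value` of each CHAR token in the structure above is a character code, not a string. As a minimal sketch, the following decodes each alternation branch back into text. To stay runnable without installing ret, it declares local stand-ins (`types`, `MiniChar`, `MiniRoot`, `toChar`, `branchTexts`) that only mirror the shapes shown above; these names and the numeric type tags are illustrative assumptions, not part of the library.

```typescript
// Local stand-ins mirroring the documented token shapes (assumption: the
// numeric tags are placeholders; real code would use the `types` export of ret).
const types = {
  ROOT: 0, GROUP: 1, POSITION: 2, SET: 3,
  RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7,
} as const;

interface MiniChar { type: typeof types.CHAR; value: number; }
interface MiniRoot {
  type: typeof types.ROOT;
  stack?: MiniChar[];      // used when the pattern has no top-level alternation
  options?: MiniChar[][];  // one entry per branch of a top-level alternation
}

// Hypothetical helper: build a CHAR token from a character code.
function toChar(code: number): MiniChar {
  return { type: types.CHAR, value: code };
}

// The tree shown above for /foo|bar/, written out by hand.
const fooOrBar: MiniRoot = {
  type: types.ROOT,
  options: [
    [102, 111, 111].map((code) => toChar(code)),
    [98, 97, 114].map((code) => toChar(code)),
  ],
};

// Decode every branch of the root into the literal text it matches.
function branchTexts(root: MiniRoot): string[] {
  const branches = root.options ?? (root.stack ? [root.stack] : []);
  return branches.map((branch) =>
    branch.map((token) => String.fromCharCode(token.value)).join("")
  );
}

console.log(branchTexts(fooOrBar)); // the two branches: "foo" and "bar"
```

The sketch only handles CHAR tokens; a real tree can also contain groups, sets, and repetitions, which need their own cases.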

Architecture

Ret is built around several key components:

  • Tokenizer: Core parser that converts regex strings into structured token trees
  • Type System: Comprehensive TypeScript types for all token variants (characters, groups, sets, repetitions, etc.)
  • Reconstruction: Converts token structures back to valid regex strings
  • Character Sets: Predefined character class utilities (digits, words, whitespace, etc.)
  • Utilities: Helper functions for string processing and character class parsing

Capabilities

Regex Tokenization

Core tokenization: parses a regular expression string into a structured token tree. Covers groups, character classes, quantifiers, and lookarounds.

function tokenizer(regexpStr: string): Root;

See Tokenization for details.

Token Reconstruction

Converts token structures back into valid regular expression strings, enabling regex transformation and analysis workflows.

function reconstruct(token: Tokens): string;

See Reconstruction for details.
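
To make the idea of reconstruction concrete, here is a toy serializer for a small subset of tokens (ROOT, CHAR, REPETITION). It is not the library's `reconstruct` implementation: the `Mini*` types, `quantifier`, and `toSource` are hypothetical names defined locally so the sketch runs standalone, and it does no escaping of special characters.

```typescript
// Placeholder tags mirroring the documented `types` enum (assumed values).
const types = {
  ROOT: 0, GROUP: 1, POSITION: 2, SET: 3,
  RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7,
} as const;

interface MiniChar { type: typeof types.CHAR; value: number; }
interface MiniRepetition {
  type: typeof types.REPETITION;
  min: number;
  max: number;
  value: MiniToken;
}
interface MiniRoot {
  type: typeof types.ROOT;
  stack?: MiniToken[];
  options?: MiniToken[][];
}
type MiniToken = MiniChar | MiniRepetition;

// Pick the shortest quantifier syntax for a min/max pair.
function quantifier(min: number, max: number): string {
  if (min === 0 && max === 1) return "?";
  if (min === 0 && max === Infinity) return "*";
  if (min === 1 && max === Infinity) return "+";
  if (max === Infinity) return `{${min},}`;
  if (min === max) return `{${min}}`;
  return `{${min},${max}}`;
}

// Serialize the supported subset back into regex source text.
function toSource(token: MiniRoot | MiniToken): string {
  switch (token.type) {
    case types.CHAR:
      return String.fromCharCode(token.value);
    case types.REPETITION:
      return toSource(token.value) + quantifier(token.min, token.max);
    case types.ROOT: {
      const branches = token.options ?? [token.stack ?? []];
      return branches
        .map((branch) => branch.map((t) => toSource(t)).join(""))
        .join("|");
    }
  }
}

// Hypothetical helper: CHAR token for the first character of a string.
const ch = (c: string): MiniChar => ({ type: types.CHAR, value: c.charCodeAt(0) });

// Hand-written tree for the pattern ab+|c.
const tree: MiniRoot = {
  type: types.ROOT,
  options: [
    [ch("a"), { type: types.REPETITION, min: 1, max: Infinity, value: ch("b") }],
    [ch("c")],
  ],
};

console.log(toSource(tree)); // ab+|c
```

Switching on the `type` tag lets the compiler check that every token variant in the union has a case, which is the main reason to model tokens as a discriminated union.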

Character Set Utilities

Predefined character class utilities for generating common regex character sets programmatically.

function words(): Set;
function notWords(): Set;
function ints(): Set;
function notInts(): Set;
function whitespace(): Set;
function notWhitespace(): Set;
function anyChar(): Set;

See Character Sets for details.
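
A Set token is a list of characters, ranges, and nested sets plus a `not` flag. The sketch below shows one way to test whether a character code falls inside such a set. It assumes that a digit class can be represented as a single range from code 48 (`0`) to 57 (`9`); that shape, the `Mini*` types, and the `setMatches` helper are illustrative assumptions rather than the documented output of `ints()`.

```typescript
// Placeholder tags mirroring the documented `types` enum (assumed values).
const types = {
  ROOT: 0, GROUP: 1, POSITION: 2, SET: 3,
  RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7,
} as const;

interface MiniChar { type: typeof types.CHAR; value: number; }
interface MiniRange { type: typeof types.RANGE; from: number; to: number; }
interface MiniSet {
  type: typeof types.SET;
  set: (MiniRange | MiniChar | MiniSet)[];
  not: boolean;
}

// Hand-written stand-in for a digit class (assumed shape, see lead-in).
const digits: MiniSet = {
  type: types.SET,
  set: [{ type: types.RANGE, from: 48, to: 57 }],
  not: false,
};

// Same members, negated: matches anything that is not a digit.
const notDigits: MiniSet = { ...digits, not: true };

// True when the character code is accepted by the set, honoring `not`.
function setMatches(target: MiniSet, code: number): boolean {
  const hit = target.set.some((member) => {
    switch (member.type) {
      case types.CHAR:
        return member.value === code;
      case types.RANGE:
        return member.from <= code && code <= member.to;
      case types.SET:
        return setMatches(member, code); // nested set, checked recursively
    }
  });
  return target.not ? !hit : hit;
}

console.log(setMatches(digits, "5".charCodeAt(0)));    // true
console.log(setMatches(digits, "x".charCodeAt(0)));    // false
console.log(setMatches(notDigits, "x".charCodeAt(0))); // true
```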

Core Types

enum types {
  ROOT,
  GROUP,
  POSITION,
  SET,
  RANGE,
  REPETITION,
  REFERENCE,
  CHAR
}

interface Root {
  type: types.ROOT;
  stack?: Token[];
  options?: Token[][];
  flags?: string[];
}

interface Group {
  type: types.GROUP;
  stack?: Token[];
  options?: Token[][];
  remember: boolean;
  followedBy?: boolean;
  notFollowedBy?: boolean;
  lookBehind?: boolean;
  name?: string;
}

interface Set {
  type: types.SET;
  set: SetTokens;
  not: boolean;
}

interface Repetition {
  type: types.REPETITION;
  min: number;
  max: number;
  value: Token;
}

interface Position {
  type: types.POSITION;
  value: '$' | '^' | 'b' | 'B';
}

interface Reference {
  type: types.REFERENCE;
  value: number;
}

interface Char {
  type: types.CHAR;
  value: number;
}

interface Range {
  type: types.RANGE;
  from: number;
  to: number;
}

type Token = Group | Position | Set | Range | Repetition | Reference | Char;
type Tokens = Root | Token;
type SetTokens = (Range | Char | Set)[];
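
Because every token carries a `type` tag, a tree can be walked with an ordinary `switch` and the compiler narrows each case. As a sketch, the function below counts capturing groups (`remember: true`) in a hand-written tree shaped like the types above. The `Mini*` names and `countCapturing` are local, hypothetical stand-ins covering only ROOT, GROUP, REPETITION, and CHAR, and the sample tree is written by hand rather than produced by the tokenizer.

```typescript
// Placeholder tags mirroring the documented `types` enum (assumed values).
const types = {
  ROOT: 0, GROUP: 1, POSITION: 2, SET: 3,
  RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7,
} as const;

interface MiniChar { type: typeof types.CHAR; value: number; }
interface MiniGroup {
  type: typeof types.GROUP;
  stack?: MiniToken[];
  options?: MiniToken[][];
  remember: boolean;
}
interface MiniRepetition {
  type: typeof types.REPETITION;
  min: number;
  max: number;
  value: MiniToken;
}
interface MiniRoot {
  type: typeof types.ROOT;
  stack?: MiniToken[];
  options?: MiniToken[][];
}
type MiniToken = MiniChar | MiniGroup | MiniRepetition;

// Count groups with `remember: true` anywhere below (and including) the token.
function countCapturing(token: MiniRoot | MiniToken): number {
  switch (token.type) {
    case types.CHAR:
      return 0;
    case types.REPETITION:
      return countCapturing(token.value);
    case types.ROOT:
    case types.GROUP: {
      let total = token.type === types.GROUP && token.remember ? 1 : 0;
      // Children live in `options` (alternation) or `stack` (plain sequence).
      const branches = token.options ?? [token.stack ?? []];
      for (const branch of branches) {
        for (const child of branch) {
          total += countCapturing(child);
        }
      }
      return total;
    }
  }
}

// Hypothetical helper: CHAR token for the first character of a string.
const ch = (c: string): MiniChar => ({ type: types.CHAR, value: c.charCodeAt(0) });

// Hand-written tree for a pattern like (a)(?:b(c))+
const sample: MiniRoot = {
  type: types.ROOT,
  stack: [
    { type: types.GROUP, remember: true, stack: [ch("a")] },
    {
      type: types.REPETITION,
      min: 1,
      max: Infinity,
      value: {
        type: types.GROUP,
        remember: false,
        stack: [ch("b"), { type: types.GROUP, remember: true, stack: [ch("c")] }],
      },
    },
  ],
};

console.log(countCapturing(sample)); // 2
```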