tessl/npm-ret

Tokenizes a string that represents a regular expression.


Token Reconstruction

Converts token structures back into valid regular expression strings, enabling regex transformation and analysis workflows. The reconstruction process handles proper escaping and formatting for all token types.

Capabilities

Reconstruct Function

Converts any token (or token tree) back into its regular expression string representation.

/**
 * Reconstructs a regular expression string from a token structure
 * @param token - Any token (Root, Group, Character, etc.) to reconstruct
 * @returns String representation of the regex component
 * @throws Error for invalid token types
 */
function reconstruct(token: Tokens): string;

Usage Examples:

import { tokenizer, reconstruct, types } from "ret";

// Reconstruct entire regex
const tokens = tokenizer("foo|bar");
const regex = reconstruct(tokens); // "foo|bar"

// Reconstruct individual tokens
const charToken = { type: types.CHAR, value: 102 };
reconstruct(charToken); // "f"

// Reconstruct complex structures
const setToken = {
  type: types.SET,
  set: [
    { type: types.CHAR, value: 97 },  // 'a'
    { type: types.CHAR, value: 98 },  // 'b'
    { type: types.CHAR, value: 99 }   // 'c'
  ],
  not: true
};
reconstruct(setToken); // "[^abc]"

// Reconstruct groups
const groupToken = {
  type: types.GROUP,
  remember: false,
  stack: [
    { type: types.CHAR, value: 97 },  // 'a'
    { type: types.CHAR, value: 98 }   // 'b'
  ]
};
reconstruct(groupToken); // "(?:ab)"
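
The examples above spell out character codes by hand (97 for 'a', 98 for 'b'). A small helper can derive them from a string instead. The sketch below is illustrative only: `charTokens` is a hypothetical function, not part of ret, and it uses a stand-in type tag so that it runs without the library.

```typescript
// Hypothetical helper, not part of ret: builds one CHAR-style token per
// UTF-16 code unit of `text`. `charType` is whatever tag marks a character
// token; with ret you would pass `types.CHAR`.
function charTokens<T>(text: string, charType: T): { type: T; value: number }[] {
  const tokens: { type: T; value: number }[] = [];
  for (let i = 0; i < text.length; i++) {
    tokens.push({ type: charType, value: text.charCodeAt(i) });
  }
  return tokens;
}

// "CHAR" is a stand-in tag so the sketch runs on its own.
const abc = charTokens("abc", "CHAR");
console.log(abc.map((t) => t.value)); // [ 97, 98, 99 ]
```

With the library imported, passing `types.CHAR` as the second argument should produce the same character tokens as the hand-written `set` and `stack` arrays above.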

Reconstruction Rules

Character Reconstruction

A CHAR token stores a character code. Reconstruction converts the code to its character and prefixes a backslash when that character is a regex metacharacter:

// Special regex characters are escaped
reconstruct({ type: types.CHAR, value: 42 });  // "\\*" (asterisk)
reconstruct({ type: types.CHAR, value: 46 });  // "\\." (dot)
reconstruct({ type: types.CHAR, value: 91 });  // "\\[" (left bracket)

// Regular characters remain unescaped
reconstruct({ type: types.CHAR, value: 97 });  // "a"
reconstruct({ type: types.CHAR, value: 49 });  // "1"
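
The escaping rule can be approximated in a few lines. This is a minimal sketch, not the library's implementation: `escapeChar` is a hypothetical helper, and the set of metacharacters it escapes is an assumption chosen to agree with the examples in this section.

```typescript
// Hypothetical helper illustrating the escaping rule; not part of ret's API.
// Assumption: these are the characters treated as special outside a set.
const METACHARACTERS = new Set("[\\{}$^.|?*+()".split(""));

function escapeChar(code: number): string {
  const c = String.fromCharCode(code);
  // Prefix a backslash only when the character has regex meaning.
  return METACHARACTERS.has(c) ? "\\" + c : c;
}

console.log(escapeChar(42)); // \*
console.log(escapeChar(97)); // a
```

Characters inside a set generally follow different escaping rules (for example, `-` and `]` matter there), so this sketch covers only standalone characters.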

Position Reconstruction

Position tokens represent anchors and boundaries:

reconstruct({ type: types.POSITION, value: "^" }); // "^"
reconstruct({ type: types.POSITION, value: "$" }); // "$"
reconstruct({ type: types.POSITION, value: "b" }); // "\\b"
reconstruct({ type: types.POSITION, value: "B" }); // "\\B"

Reference Reconstruction

Backreferences are formatted as a backslash followed by the group number:

reconstruct({ type: types.REFERENCE, value: 1 }); // "\\1"
reconstruct({ type: types.REFERENCE, value: 9 }); // "\\9"

Set Reconstruction

Character sets are reconstructed with proper bracket notation:

// Regular character set
const regularSet = {
  type: types.SET,
  set: [{ type: types.CHAR, value: 97 }],
  not: false
};
reconstruct(regularSet); // "[a]"

// Negated character set
const negatedSet = {
  type: types.SET,
  set: [{ type: types.CHAR, value: 97 }],
  not: true
};
reconstruct(negatedSet); // "[^a]"

// Character range
const rangeSet = {
  type: types.SET,
  set: [{ type: types.RANGE, from: 97, to: 122 }],
  not: false
};
reconstruct(rangeSet); // "[a-z]"

Group Reconstruction

Groups are reconstructed with appropriate modifiers:

// Capturing group
const capturingGroup = {
  type: types.GROUP,
  remember: true,
  stack: [{ type: types.CHAR, value: 97 }]
};
reconstruct(capturingGroup); // "(a)"

// Non-capturing group
const nonCapturingGroup = {
  type: types.GROUP,
  remember: false,
  stack: [{ type: types.CHAR, value: 97 }]
};
reconstruct(nonCapturingGroup); // "(?:a)"

// Named group
const namedGroup = {
  type: types.GROUP,
  remember: true,
  name: "mygroup",
  stack: [{ type: types.CHAR, value: 97 }]
};
reconstruct(namedGroup); // "(?<mygroup>a)"

// Positive lookahead
const lookahead = {
  type: types.GROUP,
  remember: false,
  followedBy: true,
  stack: [{ type: types.CHAR, value: 97 }]
};
reconstruct(lookahead); // "(?=a)"

// Negative lookahead
const negativeLookahead = {
  type: types.GROUP,
  remember: false,
  notFollowedBy: true,
  stack: [{ type: types.CHAR, value: 97 }]
};
reconstruct(negativeLookahead); // "(?!a)"
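
The five cases above amount to choosing a prefix from the group's flags. The sketch below shows one way to express that choice. `GroupFlags` and `groupPrefix` are hypothetical names, and the order in which the flags are checked is an assumption that happens to reproduce the outputs listed in this section.

```typescript
// Hypothetical shape covering only the flags used in the examples above.
interface GroupFlags {
  remember: boolean;
  name?: string;
  followedBy?: boolean;
  notFollowedBy?: boolean;
}

// Returns the text that goes between "(" and the group's contents.
function groupPrefix(group: GroupFlags): string {
  if (group.name) return `?<${group.name}>`; // named capturing group
  if (group.remember) return "";             // plain capturing group
  if (group.followedBy) return "?=";         // positive lookahead
  if (group.notFollowedBy) return "?!";      // negative lookahead
  return "?:";                               // non-capturing group
}

console.log(`(${groupPrefix({ remember: false })}a)`); // (?:a)
```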

Repetition Reconstruction

Quantifiers are reconstructed from the token's min and max: the shorthand forms ?, + and * are used where the bounds allow, and brace notation otherwise:

// Optional (0 or 1)
const optional = {
  type: types.REPETITION,
  min: 0,
  max: 1,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(optional); // "a?"

// One or more
const oneOrMore = {
  type: types.REPETITION,
  min: 1,
  max: Infinity,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(oneOrMore); // "a+"

// Zero or more
const zeroOrMore = {
  type: types.REPETITION,
  min: 0,
  max: Infinity,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(zeroOrMore); // "a*"

// Exact count
const exact = {
  type: types.REPETITION,
  min: 3,
  max: 3,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(exact); // "a{3}"

// Range
const range = {
  type: types.REPETITION,
  min: 2,
  max: 5,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(range); // "a{2,5}"

// Minimum with no maximum
const minimum = {
  type: types.REPETITION,
  min: 2,
  max: Infinity,
  value: { type: types.CHAR, value: 97 }
};
reconstruct(minimum); // "a{2,}"
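
The six cases above can be summarized as a single decision over `min` and `max`. The following is a minimal sketch under that reading. `quantifierSuffix` is a hypothetical helper, and the order of the checks is an assumption rather than a statement about the library's source.

```typescript
// Hypothetical helper; mirrors the six cases listed above.
function quantifierSuffix(min: number, max: number): string {
  if (min === 0 && max === 1) return "?";
  if (min === 1 && max === Infinity) return "+";
  if (min === 0 && max === Infinity) return "*";
  if (max === Infinity) return `{${min},}`; // open-ended minimum
  if (min === max) return `{${min}}`;       // exact count
  return `{${min},${max}}`;                 // bounded range
}

console.log("a" + quantifierSuffix(2, 5)); // a{2,5}
```

Because the repetition tokens shown here carry only `min` and `max`, a pattern written as `a{0,1}` would presumably come back as `a?`: the two spellings describe the same bounds.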

Root Reconstruction

Root tokens handle alternation and sequential patterns:

// Sequential pattern
const sequential = {
  type: types.ROOT,
  stack: [
    { type: types.CHAR, value: 97 },  // 'a'
    { type: types.CHAR, value: 98 }   // 'b'
  ]
};
reconstruct(sequential); // "ab"

// Alternation pattern
const alternation = {
  type: types.ROOT,
  options: [
    [{ type: types.CHAR, value: 97 }], // 'a'
    [{ type: types.CHAR, value: 98 }]  // 'b'
  ]
};
reconstruct(alternation); // "a|b"
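
The difference between the two shapes is only in how the pieces are joined: a `stack` is concatenated, while each entry of `options` is concatenated and the branches are then separated by `|`. The sketch below illustrates this with already-reconstructed strings. `RenderedRoot` and `joinRoot` are hypothetical names, not part of ret.

```typescript
// Hypothetical shape: each entry is a token that has already been
// reconstructed to a string, so the sketch stays independent of ret.
interface RenderedRoot {
  stack?: string[];
  options?: string[][];
}

function joinRoot(root: RenderedRoot): string {
  if (root.options) {
    // One branch per alternative, separated by "|".
    return root.options.map((branch) => branch.join("")).join("|");
  }
  return root.stack ? root.stack.join("") : "";
}

console.log(joinRoot({ stack: ["a", "b"] }));       // ab
console.log(joinRoot({ options: [["a"], ["b"]] })); // a|b
```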

Common Use Cases

Regex Transformation

import { tokenizer, reconstruct } from "ret";

// Parse, modify, and reconstruct
const tokens = tokenizer("a+");
const repetition = tokens.stack[0] as any;
repetition.min = 2; // Raise the minimum from 1 to 2
repetition.max = 4; // Cap the maximum at 4 (was Infinity)
const modified = reconstruct(tokens); // "a{2,4}"

Regex Analysis

import { tokenizer, reconstruct, types } from "ret";

function extractGroups(regexStr: string): string[] {
  const tokens = tokenizer(regexStr);
  const groups: string[] = [];
  
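  function walkTokens(token: any) {
    if (token.type === types.GROUP && token.remember) {
      groups.push(reconstruct(token));
    }
    if (token.stack) {
      token.stack.forEach(walkTokens);
    }
    if (token.options) {
      token.options.forEach((option: any) => option.forEach(walkTokens));
    }
    // A quantified token such as "(a)+" wraps its operand in `value`,
    // so descend into it to find groups under a quantifier
    if (token.type === types.REPETITION) {
      walkTokens(token.value);
    }
  }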
  
  walkTokens(tokens);
  return groups;
}

const groups = extractGroups("(foo)|(bar)"); // ["(foo)", "(bar)"]
