Legacy Email Parser

Build a parser for legacy email archives that can identify and handle ISO-2022 encoded text content.

Problem

You are working with archived email messages from legacy systems that use ISO-2022 family encodings (ISO-2022-JP, ISO-2022-KR, ISO-2022-CN). These encodings use escape sequences to switch between character sets within a single document.
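
As a rough illustration of what those escape sequences look like at the byte level, the sketch below counts the variant-specific designator sequences listed in RFC 1468 (ISO-2022-JP), RFC 1557 (ISO-2022-KR), and RFC 1922 (ISO-2022-CN). The names `DESIGNATORS` and `countDesignators` are hypothetical helpers, not part of the required API, and the task itself expects detection to go through the chardet dependency rather than a hand-rolled scanner.

```javascript
const ESC = 0x1b;

// Bytes that follow ESC for each variant-specific designator.
// ESC ( B (return to ASCII) is left out because it does not identify a variant.
const DESIGNATORS = [
  { variant: 'ISO-2022-JP', bytes: [0x24, 0x42] },       // ESC $ B   JIS X 0208-1983
  { variant: 'ISO-2022-JP', bytes: [0x24, 0x40] },       // ESC $ @   JIS X 0208-1978
  { variant: 'ISO-2022-JP', bytes: [0x28, 0x4a] },       // ESC ( J   JIS X 0201 Roman
  { variant: 'ISO-2022-KR', bytes: [0x24, 0x29, 0x43] }, // ESC $ ) C KS C 5601
  { variant: 'ISO-2022-CN', bytes: [0x24, 0x29, 0x41] }, // ESC $ ) A GB 2312
  { variant: 'ISO-2022-CN', bytes: [0x24, 0x29, 0x47] }, // ESC $ ) G CNS 11643 plane 1
  { variant: 'ISO-2022-CN', bytes: [0x24, 0x2a, 0x48] }, // ESC $ * H CNS 11643 plane 2
];

// Hypothetical helper: counts how many designators of each variant appear.
function countDesignators(bytes) {
  const counts = { 'ISO-2022-JP': 0, 'ISO-2022-KR': 0, 'ISO-2022-CN': 0 };
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] !== ESC) continue;
    for (const { variant, bytes: seq } of DESIGNATORS) {
      // Reads past the end yield undefined, which never equals a byte value.
      if (seq.every((b, k) => bytes[i + 1 + k] === b)) {
        counts[variant]++;
        break;
      }
    }
  }
  return counts;
}
```

A buffer that never contains the ESC byte (0x1b) produces all-zero counts, which is the usual situation for ASCII or UTF-8 text.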

Your task is to build a utility that can:

  1. Detect whether email content uses ISO-2022 encoding
  2. Differentiate between ISO-2022-JP, ISO-2022-KR, and ISO-2022-CN variants
  3. Report a confidence score for each detected encoding

Requirements

Create a module that exports the following functions:

  1. detectISO2022(buffer) - Returns the detected ISO-2022 encoding name if found, or null otherwise
  2. analyzeISO2022Confidence(buffer) - Returns an array of objects, one per ISO-2022 variant detected, each with the encoding name and its confidence score

Function Specifications

detectISO2022(buffer)

  • Input: A Buffer or Uint8Array containing the raw email bytes
  • Output: A string with the encoding name (e.g., 'ISO-2022-JP'), or null if no ISO-2022 encoding is detected
  • Should return only ISO-2022 family encodings, not other encodings
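
One possible way to apply the "ISO-2022 only" rule is sketched below. It assumes `matches` is an array of `{ name, confidence }` objects such as the one `chardet.analyse(buffer)` returns, and that the names are reported as 'ISO-2022-JP', 'ISO-2022-KR', and 'ISO-2022-CN'. `ISO_2022_NAMES` and `topISO2022Name` are hypothetical names introduced for illustration.

```javascript
// Assumption: these strings match the encoding names reported by the detector.
const ISO_2022_NAMES = new Set(['ISO-2022-JP', 'ISO-2022-KR', 'ISO-2022-CN']);

// Hypothetical helper: returns the highest-confidence ISO-2022 name, or null.
// Non-ISO-2022 matches are ignored even when they score higher.
function topISO2022Name(matches) {
  let best = null;
  for (const m of matches) {
    if (!ISO_2022_NAMES.has(m.name)) continue;
    if (best === null || m.confidence > best.confidence) best = m;
  }
  return best ? best.name : null;
}
```

With this helper, detectISO2022(buffer) could reduce to something like `topISO2022Name(chardet.analyse(buffer))`. Whether a lower-ranked ISO-2022 match should still win over a higher-ranked non-ISO-2022 match is a design choice the specification leaves open.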

analyzeISO2022Confidence(buffer)

  • Input: A Buffer or Uint8Array containing the raw email bytes
  • Output: An array of objects, where each object has:
    • name: The encoding name
    • confidence: A number from 0 to 100
    • lang: Optional language code
  • Should return only ISO-2022 variants in the results
  • Results should be sorted by confidence in descending order
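
The filtering and ordering rules above can be sketched as a small pure function. As before, `matches` is assumed to be an array of `{ name, confidence, lang? }` objects like the one `chardet.analyse(buffer)` returns; `rankISO2022` is a hypothetical helper name, and matching on the 'ISO-2022-' prefix is an assumption about how the detector names these encodings.

```javascript
const ISO_2022_PREFIX = 'ISO-2022-';

// Hypothetical helper: keeps only ISO-2022 matches, copies the three
// documented fields, and sorts by confidence, highest first.
function rankISO2022(matches) {
  return matches
    .filter((m) => m.name.startsWith(ISO_2022_PREFIX))
    .map(({ name, confidence, lang }) =>
      lang === undefined ? { name, confidence } : { name, confidence, lang })
    .sort((a, b) => b.confidence - a.confidence);
}
```

Sorting explicitly, instead of relying on the order the detector returns, keeps the descending-order guarantee independent of the library's behavior.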

Test Cases

  • When given a Buffer containing ISO-2022-JP encoded text with escape sequences, detectISO2022 returns 'ISO-2022-JP' @test
  • When given a Buffer containing UTF-8 encoded text, detectISO2022 returns null @test
  • When given a Buffer containing ISO-2022-KR encoded text, analyzeISO2022Confidence returns results including ISO-2022-KR with a confidence score @test
  • When given a Buffer containing mixed ISO-2022 variants, analyzeISO2022Confidence returns multiple ISO-2022 results sorted by confidence @test
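
Fixtures for these cases could be built by hand from raw bytes, as in the sketch below. The variable names are hypothetical, and such short samples may score lower confidence than a full email body would, so tests are probably safer asserting on the detected name than on an exact score.

```javascript
const ESC = 0x1b, SO = 0x0e, SI = 0x0f;

// "こんにちは" in ISO-2022-JP: ESC $ B, five JIS X 0208 byte pairs, ESC ( B.
const isoJpSample = Buffer.from([
  ESC, 0x24, 0x42,
  0x24, 0x33, 0x24, 0x73, 0x24, 0x4b, 0x24, 0x41, 0x24, 0x4f,
  ESC, 0x28, 0x42,
]);

// "한글" in ISO-2022-KR: header ESC $ ) C, then SO, two KS C 5601 pairs, SI.
const isoKrSample = Buffer.from([
  ESC, 0x24, 0x29, 0x43,
  SO, 0x47, 0x51, 0x31, 0x5b, SI,
]);

// UTF-8 text contains no ESC byte, so it should not look like ISO-2022.
const utf8Sample = Buffer.from('Plain UTF-8 text: こんにちは', 'utf8');
```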

Implementation

@generates

API

/**
 * Detects if the input buffer contains ISO-2022 encoded text
 * @param {Buffer|Uint8Array} buffer - The raw bytes to analyze
 * @returns {string|null} The ISO-2022 encoding name or null
 */
function detectISO2022(buffer) {
  // IMPLEMENTATION HERE
}

/**
 * Analyzes confidence scores for ISO-2022 encoding variants
 * @param {Buffer|Uint8Array} buffer - The raw bytes to analyze
 * @returns {Array<{name: string, confidence: number, lang?: string}>} ISO-2022 variants with confidence scores
 */
function analyzeISO2022Confidence(buffer) {
  // IMPLEMENTATION HERE
}

module.exports = {
  detectISO2022,
  analyzeISO2022Confidence
};

Dependencies { .dependencies }

chardet { .dependency }

Provides character encoding detection with ISO-2022 escape sequence recognition support.