Multilingual Text Analyzer

Build a command-line tool that analyzes a text file and reports its character encoding and, where it can be determined, the specific language of the text. This is particularly useful when processing documents from various international sources.

Functionality

The tool should accept a file path as a command-line argument and output a detailed analysis report that includes:

  1. Primary encoding: The most likely character encoding of the file
  2. Alternative encodings: Other possible encodings with their confidence scores (if confidence > 50)
  3. Language identification: The specific language detected within the encoding (e.g., French in ISO-8859-1, Russian in KOI8-R)
  4. Confidence metrics: Numerical confidence scores for all detections
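
The four report items above can be derived from a ranked list of detector matches. The following is a minimal sketch, not a definitive implementation: the `Match` shape (encoding name, 0-100 confidence, optional language code) and the helper names `toEntry` and `toAnalysisResult` are assumptions made for illustration, modeled on what a detector such as chardet typically reports.

```typescript
// Hypothetical match shape (an assumption, not part of this spec): one
// candidate encoding with a 0-100 confidence and an optional language code.
interface Match {
  name: string;
  confidence: number;
  lang?: string;
}

interface Alternative {
  encoding: string;
  confidence: number;
  language?: string;
}

interface AnalysisResult {
  primaryEncoding: string | null;
  language?: string;
  confidence: number;
  alternatives: Alternative[];
}

// Alternatives are reported only when their confidence is strictly above this.
const ALTERNATIVE_THRESHOLD = 50;

// Convert a match into an alternative entry, omitting `language` when unknown.
function toEntry(m: Match): Alternative {
  const entry: Alternative = { encoding: m.name, confidence: m.confidence };
  if (m.lang !== undefined) entry.language = m.lang;
  return entry;
}

function toAnalysisResult(matches: Match[]): AnalysisResult {
  // Sort a copy so the caller's array is left untouched.
  const ranked = [...matches].sort((a, b) => b.confidence - a.confidence);
  const [best, ...rest] = ranked;

  // No candidates at all: report "no encoding" rather than throwing.
  if (best === undefined) {
    return { primaryEncoding: null, confidence: 0, alternatives: [] };
  }

  const result: AnalysisResult = {
    primaryEncoding: best.name,
    confidence: best.confidence,
    alternatives: rest
      .filter((m) => m.confidence > ALTERNATIVE_THRESHOLD)
      .map(toEntry),
  };
  if (best.lang !== undefined) result.language = best.lang;
  return result;
}
```

The primary match is excluded from `alternatives`, since the list is described as "other possible encodings", and a match at exactly 50 is dropped because the threshold is strict.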

Requirements

  • Process files provided via command-line argument (first argument after script name)
  • Display results in a clear, formatted output showing:
    • Primary encoding name
    • Detected language (if available)
    • Alternative encodings with confidence scores above 50
  • Handle cases where language information is not available
  • Exit gracefully with appropriate error messages for invalid inputs
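
The argument handling and error reporting requirements could be sketched as two small pure functions. This is only an illustration under stated assumptions: `args` is taken to hold the arguments after the script name (for example `process.argv.slice(2)` under Node.js), the names `parseArgs` and `describeError` are hypothetical, and the usage string and message wording are placeholders. The error codes `ENOENT`, `EISDIR`, and `EACCES` are the ones Node.js file-system errors carry in their `code` property.

```typescript
// Outcome of validating the argument list: a file path or an error message.
type ParsedArgs =
  | { ok: true; filepath: string }
  | { ok: false; error: string };

// Assumption: `args` contains only the arguments after the script name.
function parseArgs(args: string[]): ParsedArgs {
  const filepath = args[0];
  if (filepath === undefined || filepath.trim() === "") {
    // Placeholder usage text; the real command name is not given in the spec.
    return { ok: false, error: "Usage: <script> <file>" };
  }
  return { ok: true, filepath };
}

// Turn a failure from reading or analyzing the file into a readable message.
function describeError(err: unknown, filepath: string): string {
  const code =
    typeof err === "object" && err !== null && "code" in err
      ? String((err as { code: unknown }).code)
      : "";
  switch (code) {
    case "ENOENT":
      return `File not found: ${filepath}`;
    case "EISDIR":
      return `Not a regular file: ${filepath}`;
    case "EACCES":
      return `Permission denied: ${filepath}`;
    default: {
      const detail = err instanceof Error ? err.message : String(err);
      return `Could not analyze ${filepath}: ${detail}`;
    }
  }
}
```

A caller would print the message to standard error and set a non-zero exit code instead of letting the exception escape.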

Test Cases

  • French ISO-8859-1 text: Given a file containing French text encoded in ISO-8859-1, the tool should detect ISO-8859-1 as the primary encoding and identify 'fr' (French) as the language @test

  • Cyrillic text analysis: Given a file containing Russian text in KOI8-R encoding, the tool should detect KOI8-R and identify 'ru' (Russian) as the language @test

  • Multi-encoding results: Given a file that could match multiple encodings, the tool should list all matches with confidence scores above 50, including language information when available @test

  • UTF-8 without language: Given a file in UTF-8 encoding, the tool should correctly identify UTF-8 and handle cases where no specific language is detected @test
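
The French test case needs a fixture whose bytes really are ISO-8859-1 rather than UTF-8. One way to produce such a fixture is sketched below; it relies on the fact that every ISO-8859-1 byte value equals the Unicode code point of the character it encodes. The helper name `encodeLatin1` and the sample sentence are illustrative assumptions. A KOI8-R fixture for the Cyrillic case would need a separate encoding table, which is not shown here.

```typescript
// Encode a string as ISO-8859-1 bytes. Each character must have a code point
// in the range 0x00-0xFF, which maps directly to a single byte.
function encodeLatin1(text: string): Uint8Array {
  // Normalize first so accented letters are single precomposed characters.
  const normalized = text.normalize("NFC");
  const bytes = new Uint8Array(normalized.length);
  for (let i = 0; i < normalized.length; i++) {
    const code = normalized.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(
        `Character at index ${i} cannot be represented in ISO-8859-1`
      );
    }
    bytes[i] = code;
  }
  return bytes;
}

// Illustrative sample text; longer samples generally give a detector more
// evidence to work with than a single short sentence.
const frenchFixture = encodeLatin1(
  "Le garçon a mangé une crêpe à l'hôtel, près de la forêt."
);
```

The resulting `Uint8Array` can be written to disk as-is (for example with Node's `fs.writeFileSync`) and passed to the tool as the input file.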

Implementation

@generates

API

/**
 * Result of analyzing a text file: the primary encoding, its language
 * (when detected), and alternative encoding matches
 */
export interface AnalysisResult {
  primaryEncoding: string | null;
  language?: string;
  confidence: number;
  alternatives: Array<{
    encoding: string;
    confidence: number;
    language?: string;
  }>;
}

/**
 * Analyzes a text file and returns encoding and language information
 * @param filepath - Path to the file to analyze
 * @returns Promise resolving to an object containing encoding, language, and alternative matches
 */
export function analyzeFile(filepath: string): Promise<AnalysisResult>;

/**
 * Formats and displays the analysis results
 * @param result - The analysis result to display
 */
export function displayResults(result: AnalysisResult): void;

/**
 * Main CLI entry point
 */
export function main(args: string[]): Promise<void>;
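
As an illustration of what `displayResults` might print, the sketch below builds the report as a string so that the layout can be checked without capturing console output. The helper name `formatReport` and the exact wording of each line are assumptions; the spec only requires that the primary encoding, the language (if available), and the alternatives appear clearly.

```typescript
interface AnalysisResult {
  primaryEncoding: string | null;
  language?: string;
  confidence: number;
  alternatives: Array<{
    encoding: string;
    confidence: number;
    language?: string;
  }>;
}

// Build a human-readable report. A displayResults implementation could
// simply print the returned string.
function formatReport(result: AnalysisResult): string {
  if (result.primaryEncoding === null) {
    return "No encoding could be detected.";
  }

  const lines = [
    `Primary encoding: ${result.primaryEncoding} (confidence ${result.confidence})`,
    // Language is optional, so fall back to an explicit placeholder.
    `Language: ${result.language ?? "not available"}`,
  ];

  if (result.alternatives.length > 0) {
    lines.push("Alternative encodings:");
    for (const alt of result.alternatives) {
      const lang = alt.language !== undefined ? `, language ${alt.language}` : "";
      lines.push(`  - ${alt.encoding} (confidence ${alt.confidence}${lang})`);
    }
  }

  return lines.join("\n");
}
```

Keeping the formatting separate from the printing means the "language not available" and "no alternatives" cases can each be verified with a plain string comparison.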

Dependencies { .dependencies }

chardet { .dependency }

Provides character encoding and language detection capabilities.