Build a command-line tool that analyzes text files and reports detailed language and encoding information. The tool should identify both the character encoding and the specific language of a text file, which is particularly useful when processing documents from various international sources.
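The tool should accept a file path as a command-line argument and output a detailed analysis report that includes the primary encoding, its confidence score, the detected language (when available), and any alternative encoding matches. The expected behavior is captured by the following test scenarios: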
French ISO-8859-1 text: Given a file containing French text encoded in ISO-8859-1, the tool should detect ISO-8859-1 as the primary encoding and identify 'fr' (French) as the language @test
Cyrillic text analysis: Given a file containing Russian text in KOI8-R encoding, the tool should detect KOI8-R and identify 'ru' (Russian) as the language @test
Multi-encoding results: Given a file that could match multiple encodings, the tool should list all matches with confidence scores above 50, including language information when available @test
UTF-8 without language: Given a file in UTF-8 encoding, the tool should correctly identify UTF-8 and handle cases where no specific language is detected @test
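The multi-encoding scenario implies a selection step: sort the candidate matches by confidence, take the best one as the primary encoding, and keep the remaining candidates only if their confidence is above 50. The sketch below assumes the detection library returns candidates shaped like the hypothetical `CandidateMatch` type (an encoding `name`, a `confidence` on a 0 to 100 scale, and an optional `lang`); the real library's output may use different field names. `buildResult` and `MIN_CONFIDENCE` are illustrative names, not part of the specified API.

```typescript
// Hypothetical shape of one candidate returned by the detection library.
// This is an assumption; the real library's field names may differ.
interface CandidateMatch {
  name: string;       // encoding name, e.g. "ISO-8859-1"
  confidence: number; // assumed to be on a 0-100 scale
  lang?: string;      // language code such as "fr", when the detector reports one
}

// Same shape as the AnalysisResult interface in the spec.
interface AnalysisResult {
  primaryEncoding: string | null;
  language?: string;
  confidence: number;
  alternatives: Array<{ encoding: string; confidence: number; language?: string }>;
}

// Alternatives must score strictly above this value (from the multi-encoding scenario).
const MIN_CONFIDENCE = 50;

function buildResult(matches: CandidateMatch[]): AnalysisResult {
  // Sort a copy, highest confidence first, so the caller's array is not mutated.
  const sorted = [...matches].sort((a, b) => b.confidence - a.confidence);
  const [best, ...rest] = sorted;

  if (best === undefined) {
    // No candidates at all: report no encoding with zero confidence.
    return { primaryEncoding: null, confidence: 0, alternatives: [] };
  }

  const alternatives = rest
    .filter((m) => m.confidence > MIN_CONFIDENCE)
    .map((m) => ({
      encoding: m.name,
      confidence: m.confidence,
      // Only attach a language when the detector supplied one.
      ...(m.lang !== undefined ? { language: m.lang } : {}),
    }));

  const result: AnalysisResult = {
    primaryEncoding: best.name,
    confidence: best.confidence,
    alternatives,
  };
  // Leave `language` unset for inputs such as plain UTF-8 where none is detected.
  if (best.lang !== undefined) result.language = best.lang;
  return result;
}
```

In this sketch the primary match is reported once and excluded from `alternatives`, and the threshold is strict, so a score of exactly 50 is dropped, following the "above 50" wording. A file with no candidates yields `primaryEncoding: null`, which the interface already allows.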
```typescript
/**
 * Result of analyzing a single text file.
 */
export interface AnalysisResult {
  /** Most likely encoding, or null if none could be determined. */
  primaryEncoding: string | null;
  /** Language code of the primary match (e.g. 'fr', 'ru'), when detected. */
  language?: string;
  /** Confidence score of the primary match. */
  confidence: number;
  /** Other candidate encodings, each with its confidence and optional language. */
  alternatives: Array<{
    encoding: string;
    confidence: number;
    language?: string;
  }>;
}

/**
 * Analyzes a text file and returns encoding and language information.
 * @param filepath - Path to the file to analyze
 * @returns Object containing encoding, language, and alternative matches
 */
export declare function analyzeFile(filepath: string): Promise<AnalysisResult>;

/**
 * Formats and displays the analysis results.
 * @param result - The analysis result to display
 */
export declare function displayResults(result: AnalysisResult): void;

/**
 * Main CLI entry point.
 */
export declare function main(args: string[]): Promise<void>;
```

Provides character encoding and language detection capabilities.
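One possible way to implement `displayResults` is to build the report as a string first, so the formatting can be tested without capturing console output. `formatReport` is a hypothetical helper, and the report layout shown here is an assumption; the spec does not fix an output format.

```typescript
// Same shape as the AnalysisResult interface in the spec.
interface AnalysisResult {
  primaryEncoding: string | null;
  language?: string;
  confidence: number;
  alternatives: Array<{ encoding: string; confidence: number; language?: string }>;
}

// Hypothetical helper: renders the report as plain text lines.
function formatReport(result: AnalysisResult): string {
  if (result.primaryEncoding === null) {
    return "Encoding: unknown (no match found)";
  }

  const lines: string[] = [
    `Encoding: ${result.primaryEncoding} (confidence ${result.confidence})`,
    // Language may be absent, e.g. for UTF-8 input with no detected language.
    `Language: ${result.language ?? "not detected"}`,
  ];

  if (result.alternatives.length > 0) {
    lines.push("Alternatives:");
    for (const alt of result.alternatives) {
      const lang = alt.language !== undefined ? `, language ${alt.language}` : "";
      lines.push(`  - ${alt.encoding} (confidence ${alt.confidence}${lang})`);
    }
  }

  return lines.join("\n");
}

// displayResults only prints; all formatting logic lives in formatReport.
function displayResults(result: AnalysisResult): void {
  console.log(formatReport(result));
}
```

Keeping `displayResults` as a thin wrapper means the scenarios above can assert on the returned string directly.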