A command-line tool that analyzes text files with binary headers or metadata to detect their character encoding, skipping non-text content at the beginning of files.
Your tool should accept a file path and optional configuration parameters, then detect and report the character encoding of the text content. The tool must handle files that contain binary headers, metadata, or other non-text content before the actual text data begins.
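Skipping matters because the bytes in front of the text can derail detection. As an illustration only, the following sketch uses a deliberately simplified, hypothetical heuristic (`detectEncodingHeuristic` is not part of the spec or of any detection library). It accepts a UTF-8 byte-order mark, treats NUL bytes as a sign of binary data, checks strict UTF-8 validity with the `fatal` option of `TextDecoder`, and otherwise falls back to ISO-8859-1. A real detector would use statistical models to tell single-byte encodings apart.

```javascript
// Hypothetical, simplified heuristic; not the detection library the tool relies on.
function detectEncodingHeuristic(buffer) {
  if (buffer.length === 0) return null; // nothing to analyze

  // A UTF-8 byte-order mark (EF BB BF) is a strong signal.
  if (buffer.length >= 3 && buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return 'UTF-8';
  }

  // NUL bytes rarely occur in single-byte text, so treat them as binary data.
  if (buffer.includes(0x00)) return null;

  try {
    // fatal: true makes decode() throw on any malformed or truncated UTF-8 sequence.
    new TextDecoder('utf-8', { fatal: true }).decode(buffer);
    return 'UTF-8';
  } catch {
    // Every byte sequence is valid ISO-8859-1, so it serves as the fallback.
    return 'ISO-8859-1';
  }
}
```

With a zero-filled 256-byte header, the whole buffer is classified as binary (`null`), while `buffer.subarray(256)` is recognized as UTF-8. One limitation of the strict check: if a sample limit cuts a multi-byte character in half, the truncated sequence is invalid UTF-8 and the heuristic falls back to ISO-8859-1.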
The tool should support the following command-line arguments:
--file <path>: Path to the file to analyze (required)
--skip <bytes>: Number of bytes to skip from the beginning of the file before analyzing (default: 0)
--sample <bytes>: Maximum number of bytes to read for analysis after skipping (optional)
The tool should output a JSON object to stdout with the following structure:
{
"file": "/path/to/file",
"offset": 0,
"sampleSize": null,
"encoding": "UTF-8"
}
Where:
file: The path to the analyzed file
offset: The number of bytes skipped
sampleSize: The number of bytes sampled (or null if the entire file was read)
encoding: The detected character encoding (or null if detection failed)
encoding to null in the output
Create a file with UTF-8 text content and detect its encoding without skipping any bytes. The detected encoding should be UTF-8. @test
Create a file with a 256-byte binary header followed by UTF-8 text. Use the skip parameter to skip the header and detect the encoding of the text portion. The detected encoding should be UTF-8. @test
Create a file with a 100-byte metadata section followed by ISO-8859-1 encoded text. Skip the metadata and detect the encoding of the text portion. The detected encoding should be ISO-8859-1. @test
Attempt to analyze a non-existent file. The tool should output an error message and exit with code 1. @test
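The scenarios above need fixtures with a known amount of non-text data in front of the text. A minimal sketch, assuming Node's built-in `fs`, `os`, and `path` modules; `writeFixture` and the file names are hypothetical, and `'latin1'` is the name Node's `Buffer` uses for ISO-8859-1.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes `headerSize` bytes of binary data followed by
// `text` encoded as `encoding` ('utf8', or 'latin1' for ISO-8859-1).
function writeFixture(name, headerSize, text, encoding) {
  // Bytes 0x00..0xFF repeated, so the header is clearly not text.
  const header = Buffer.alloc(headerSize);
  for (let i = 0; i < headerSize; i++) header[i] = i % 256;

  const body = Buffer.from(text, encoding);
  const filePath = path.join(os.tmpdir(), name);
  fs.writeFileSync(filePath, Buffer.concat([header, body]));
  return filePath;
}

// Fixtures for the three detection scenarios (no header, 256-byte header, 100-byte metadata).
const plainUtf8 = writeFixture('plain-utf8.txt', 0, 'Hello, wörld! ✓', 'utf8');
const headerUtf8 = writeFixture('header-utf8.bin', 256, 'Grüße aus Köln', 'utf8');
const metaLatin1 = writeFixture('meta-latin1.bin', 100, 'Déjà vu, garçon!', 'latin1');
```

Very short samples give statistical detectors little to work with, so using longer text in the fixtures is likely to make the ISO-8859-1 scenario more robust.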
/**
 * Analyzes a file to detect its character encoding.
 *
 * @param {string} filePath - Path to the file to analyze
 * @param {Object} options - Analysis options
 * @param {number} [options.offset=0] - Number of bytes to skip from the beginning
 * @param {number} [options.sampleSize] - Maximum number of bytes to read
 * @returns {Promise<Object>} Analysis result with file path, offset, sampleSize, and encoding
 */
async function analyzeFile(filePath, options = {}) {
  // IMPLEMENTATION HERE
}
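One possible body for the `analyzeFile` stub, as a sketch under stated assumptions: it reads only the requested byte window through an `fs.promises` file handle, and the `detect` helper is a hypothetical strict-UTF-8/ISO-8859-1 stand-in for whatever detection library the tool actually uses. It also assumes that `sampleSize` in the result echoes the requested limit, which is one reading of the output description.

```javascript
const fs = require('fs');

// Hypothetical stand-in for a real detection library: strict UTF-8
// validation, otherwise ISO-8859-1 (every byte sequence is valid ISO-8859-1).
function detect(buffer) {
  if (buffer.length === 0) return null; // nothing to analyze
  try {
    // fatal: true makes decode() throw on malformed or truncated UTF-8.
    new TextDecoder('utf-8', { fatal: true }).decode(buffer);
    return 'UTF-8';
  } catch {
    return 'ISO-8859-1';
  }
}

// Sketch of analyzeFile: reads at most options.sampleSize bytes starting at
// options.offset (or everything after the offset) and classifies only those bytes.
async function analyzeFile(filePath, options = {}) {
  const offset = options.offset ?? 0;
  const sampleSize = options.sampleSize ?? null;

  // Rejects with an ENOENT error for a missing file; the caller is expected
  // to report it and exit with code 1.
  const handle = await fs.promises.open(filePath, 'r');
  try {
    const { size } = await handle.stat();
    const available = Math.max(size - offset, 0);
    const length = sampleSize === null ? available : Math.min(sampleSize, available);

    const buffer = Buffer.alloc(length);
    let bytesRead = 0;
    if (length > 0) {
      // Read `length` bytes starting at absolute file position `offset`.
      ({ bytesRead } = await handle.read(buffer, 0, length, offset));
    }

    return {
      file: filePath,
      offset,
      sampleSize, // echoes the requested limit; null when none was given
      encoding: detect(buffer.subarray(0, bytesRead)),
    };
  } finally {
    await handle.close();
  }
}
```

Reading through a file handle at an explicit position avoids loading the whole file when only a small window after the header is needed.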
/**
 * Parses command-line arguments.
 *
 * @param {string[]} args - Command-line arguments
 * @returns {Object} Parsed arguments with file, skip, and sample properties
 * @throws {Error} If required arguments are missing
 */
function parseArguments(args) {
  // IMPLEMENTATION HERE
}
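A sketch of `parseArguments` for the three flags described earlier, expecting the arguments without the `node` and script path (for example `process.argv.slice(2)`). Returning `sample: null` when `--sample` is absent and rejecting unknown flags are assumptions, not requirements from the spec.

```javascript
// Sketch of parseArguments for: --file <path> [--skip <bytes>] [--sample <bytes>]
// Returns { file, skip, sample }; skip defaults to 0 and sample to null (assumed).
function parseArguments(args) {
  const result = { file: null, skip: 0, sample: null };

  // Arguments are consumed as flag/value pairs.
  for (let i = 0; i < args.length; i += 2) {
    const flag = args[i];
    if (!['--file', '--skip', '--sample'].includes(flag)) {
      throw new Error(`Unknown argument: ${flag}`);
    }

    const value = args[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`Missing value for ${flag}`);
    }

    if (flag === '--file') {
      result.file = value;
    } else {
      // --skip and --sample take byte counts: non-negative integers only.
      if (!/^\d+$/.test(value)) {
        throw new Error(`${flag} expects a non-negative integer, got "${value}"`);
      }
      result[flag === '--skip' ? 'skip' : 'sample'] = Number(value);
    }
  }

  if (result.file === null) {
    throw new Error('Missing required argument: --file <path>');
  }
  return result;
}
```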
module.exports = {
  analyzeFile,
  parseArguments
};
Provides character encoding detection support.
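The exported functions still need a small entry point that prints the JSON result and maps failures to exit code 1. In this sketch, `runCli` is a hypothetical name; the two functions are injected so the wiring can be tested on its own, and writing the error message to stderr is an assumption (the spec only says the tool should output an error message).

```javascript
// Sketch of the CLI entry point. Returns the intended exit code instead of
// calling process.exit, so it can be exercised with stub dependencies.
async function runCli(argv, { analyzeFile, parseArguments }, io = console) {
  try {
    const args = parseArguments(argv);
    const result = await analyzeFile(args.file, {
      offset: args.skip,
      // Pass undefined rather than null so analyzeFile reads to the end of the file.
      sampleSize: args.sample ?? undefined,
    });
    // The JSON result goes to stdout.
    io.log(JSON.stringify(result, null, 2));
    return 0;
  } catch (err) {
    // Bad arguments and unreadable files both end here: message on stderr, exit code 1.
    io.error(`Error: ${err.message}`);
    return 1;
  }
}
```

Assuming the module is saved as `index.js` (a hypothetical filename), it could be invoked with `runCli(process.argv.slice(2), require('./index')).then((code) => { process.exitCode = code; });`.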