A command-line tool that analyzes text files with binary headers or metadata to detect their character encoding, skipping non-text content at the beginning of files.

Requirements

Core Functionality

Your tool should accept a file path and optional configuration parameters, then detect and report the character encoding of the text content. The tool must handle files that contain binary headers, metadata, or other non-text content before the actual text data begins.

Command-line Interface

The tool should support the following command-line arguments:

--file <path>: Path to the file to analyze (required)
--skip <bytes>: Number of bytes to skip from the beginning of the file before analyzing (default: 0)
--sample <bytes>: Maximum number of bytes to read for analysis after skipping (optional)

Output Format

The tool should output a JSON object to stdout with the following structure:

{
  "file": "/path/to/file",
  "offset": 0,
  "sampleSize": null,
  "encoding": "UTF-8"
}

Where:

file: The path to the analyzed file
offset: The number of bytes skipped
sampleSize: The number of bytes sampled (or null if entire file was read)
encoding: The detected character encoding (or null if detection failed)

Error Handling

If the file does not exist, output an error message to stderr and exit with code 1
If required arguments are missing, output usage information to stderr and exit with code 1
If encoding detection fails, set encoding to null in the output

Test Cases

Create a file with UTF-8 text content and detect its encoding without skipping any bytes. The detected encoding should be UTF-8. @test
Create a file with a 256-byte binary header followed by UTF-8 text. Use the skip parameter to skip the header and detect the encoding of the text portion. The detected encoding should be UTF-8. @test
Create a file with a 100-byte metadata section followed by ISO-8859-1 encoded text. Skip the metadata and detect the encoding of the text portion. The detected encoding should be ISO-8859-1. @test
Attempt to analyze a non-existent file. The tool should output an error message and exit with code 1. @test

Implementation

@generates

API

/**
 * Analyzes a file to detect its character encoding.
 *
 * @param {string} filePath - Path to the file to analyze
 * @param {Object} options - Analysis options
 * @param {number} [options.offset=0] - Number of bytes to skip from the beginning
 * @param {number} [options.sampleSize] - Maximum number of bytes to read
 * @returns {Promise<Object>} Analysis result with file path, offset, sampleSize, and encoding
 */
async function analyzeFile(filePath, options = {}) {
  // IMPLEMENTATION HERE
}

/**
 * Parses command-line arguments.
 *
 * @param {string[]} args - Command-line arguments
 * @returns {Object} Parsed arguments with file, skip, and sample properties
 * @throws {Error} If required arguments are missing
 */
function parseArguments(args) {
  // IMPLEMENTATION HERE
}

module.exports = {
  analyzeFile,
  parseArguments
};

Dependencies { .dependencies }

chardet { .dependency }

Provides character encoding detection support.

Version

Files

task.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}evals/scenario-1/

File Encoding Analyzer