tessl/pypi-clevercsv

A Python package for handling messy CSV files with enhanced dialect detection capabilities

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/clevercsv@0.8.x

To install, run

npx @tessl/cli install tessl/pypi-clevercsv@0.8.0

CleverCSV

CleverCSV is a Python library that serves as a drop-in replacement for the built-in csv module, with improved dialect detection for messy and inconsistent CSV files. It detects the dialect by analyzing row and type patterns in the data, so files that trip up standard CSV parsers can be parsed reliably.

Package Information

  • Package Name: clevercsv
  • Language: Python
  • Installation: pip install clevercsv (core) or pip install clevercsv[full] (with CLI tools)

Core Imports

import clevercsv

Drop-in replacement usage:

import clevercsv as csv

Basic Usage

import clevercsv

# Automatic dialect detection and reading
rows = clevercsv.read_table('./data.csv')

# Read as pandas DataFrame (requires pandas)
df = clevercsv.read_dataframe('./data.csv')

# Read as dictionaries (first row as headers)
records = clevercsv.read_dicts('./data.csv')

# Traditional csv-style usage with automatic detection
with open('./data.csv', newline='') as csvfile:
    dialect = clevercsv.Sniffer().sniff(csvfile.read())
    csvfile.seek(0)
    reader = clevercsv.reader(csvfile, dialect)
    rows = list(reader)

# Manual dialect detection
dialect = clevercsv.detect_dialect('./data.csv')
print(f"Detected: {dialect}")

Architecture

CleverCSV combines a two-stage dialect detector with a set of supporting components:

  • Normal Form Detection: First-pass detection using pattern analysis of row lengths and data types
  • Consistency Measure: Fallback detection method using data consistency scoring
  • C Extensions: Optimized parsing engine for performance-critical operations
  • Wrapper Functions: High-level convenience functions for common CSV operations
  • Command Line Interface: Complete CLI toolkit for CSV standardization and analysis

CleverCSV reports 97% accuracy for dialect detection, with a 21% improvement over Python's standard library on non-standard CSV files.
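
To make the idea of scoring candidate dialects concrete, here is a toy sketch that uses only the standard library. It is not CleverCSV's algorithm: the names toy_pattern_score and toy_detect_delimiter and the scoring formula are assumptions made for illustration, and the sketch looks only at the delimiter and at how regular the row lengths are.

```python
from collections import Counter


def toy_pattern_score(sample: str, delimiter: str) -> float:
    """Toy score (illustrative assumption, not CleverCSV's measure).

    Rewards delimiters that split most rows into the same number of
    fields, and favours splits that produce more than one field.
    """
    rows = [line for line in sample.splitlines() if line]
    if not rows:
        return 0.0
    # Count how often each row width (number of fields) occurs.
    widths = Counter(len(row.split(delimiter)) for row in rows)
    width, freq = widths.most_common(1)[0]
    if width < 2:
        # The delimiter never splits the typical row, so it is not useful.
        return 0.0
    return (freq / len(rows)) * ((width - 1) / width)


def toy_detect_delimiter(sample: str, candidates=(",", ";", "\t", "|")) -> str:
    """Return the candidate delimiter with the highest toy score."""
    return max(candidates, key=lambda d: toy_pattern_score(sample, d))


# Semicolon-delimited data with decimal commas, a classic messy case.
sample = "a;b;c\n1,5;2;3\n4;5,5;6\n"
best = toy_detect_delimiter(sample)
```

On this sample the comma splits rows inconsistently while the semicolon yields three fields on every row, so the semicolon scores highest. A real detector also has to consider quote and escape characters and the types of the resulting cells.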

Capabilities

High-Level Data Reading

Convenient wrapper functions that automatically detect dialects and encodings, providing the easiest way to work with CSV files without manual configuration.

def read_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[List[str]]: ...
def read_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[Dict[str, str]]: ...
def read_dataframe(filename, *args, num_chars=None, **kwargs): ...
def stream_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[List[str]]: ...
def stream_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[Dict[str, str]]: ...
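
A minimal sketch of calling read_table with automatic dialect detection. The helper load_rows and the demo file are hypothetical, and the standard-library fallback is an assumption added only so the sketch still runs where CleverCSV is not installed; it is not part of CleverCSV.

```python
import csv
import os
import tempfile

try:
    import clevercsv
except ImportError:  # CleverCSV is treated as optional in this sketch
    clevercsv = None


def load_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as a list of lists of strings."""
    if clevercsv is not None:
        # With dialect left as None, read_table detects the dialect itself.
        return clevercsv.read_table(path, encoding=encoding)
    # Fallback (assumption of this sketch): stdlib sniffing over a few
    # common delimiters, then a plain csv.reader.
    with open(path, newline="", encoding=encoding) as fh:
        dialect = csv.Sniffer().sniff(fh.read(), delimiters=",;\t|")
        fh.seek(0)
        return list(csv.reader(fh, dialect))


# Demo on a small semicolon-delimited file written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as fh:
        fh.write("name;score\nalice;1\nbob;2\n")
    rows = load_rows(demo_path)
```

For large files, stream_table and stream_dicts take the same arguments and yield rows one at a time instead of building the whole list in memory.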

Dialect Detection and Management

Advanced dialect detection capabilities using pattern analysis and consistency measures, with support for custom detection parameters and manual dialect specification.

class Detector:
    def detect(self, sample, delimiters=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...
    def sniff(self, sample, delimiters=None, verbose=False) -> Optional[SimpleDialect]: ...
    def has_header(self, sample, max_rows_to_check=20) -> bool: ...

def detect_dialect(filename, num_chars=None, encoding=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...

Core CSV Reading and Writing

Low-level CSV reader and writer classes that provide drop-in compatibility with Python's csv module while supporting CleverCSV's enhanced dialect handling.

class reader:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def __iter__(self) -> Iterator[List[str]]: ...
    def __next__(self) -> List[str]: ...

class writer:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def writerow(self, row) -> Any: ...
    def writerows(self, rows) -> Any: ...
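
A short round-trip sketch of the drop-in usage: the same code runs against either module because reader and writer share the csv signatures. The fallback import is an assumption added so the sketch runs without CleverCSV installed.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement
except ImportError:
    import csv  # stdlib fallback so the sketch runs without CleverCSV

# Write a few rows to an in-memory buffer using the default 'excel' dialect.
buffer = io.StringIO(newline="")
out = csv.writer(buffer, dialect="excel")
out.writerow(["id", "comment"])
out.writerows([["1", "plain"], ["2", "needs, quoting"]])

# Read the rows back with the matching dialect.
buffer.seek(0)
rows = list(csv.reader(buffer, dialect="excel"))
```

The field containing a comma is quoted on the way out and restored to a single field on the way back in.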

Dictionary-Based CSV Operations

Dictionary-based reading and writing, providing a more convenient interface for structured CSV data. When fieldnames is not given, DictReader takes the field names from the first row.

class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds): ...
    def __iter__(self) -> Iterator[Dict[str, str]]: ...
    def __next__(self) -> Dict[str, str]: ...

class DictWriter:
    def __init__(self, f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds): ...
    def writeheader(self) -> Any: ...
    def writerow(self, rowdict) -> Any: ...
    def writerows(self, rowdicts) -> None: ...
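
A minimal sketch of writing and re-reading records with DictWriter and DictReader. The sample records are made up, and the fallback import is an assumption so the sketch also runs with only the standard library.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement
except ImportError:
    import csv  # stdlib fallback so the sketch runs without CleverCSV

people = [
    {"name": "Ada", "city": "London"},
    {"name": "Linus", "city": "Helsinki"},
]

# Write a header row followed by one row per record.
buffer = io.StringIO(newline="")
out = csv.DictWriter(buffer, fieldnames=["name", "city"])
out.writeheader()
out.writerows(people)

# With no fieldnames argument, DictReader uses the first row as keys.
buffer.seek(0)
records = [dict(row) for row in csv.DictReader(buffer)]
```

With the default extrasaction='raise', passing a record with a key that is not in fieldnames raises ValueError, which helps catch misspelled column names early.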

Dialects and Configuration

Dialect classes and configuration utilities for managing CSV parsing parameters, including predefined dialects and custom dialect creation.

class SimpleDialect:
    def __init__(self, delimiter, quotechar, escapechar, strict=False): ...
    def validate(self) -> None: ...
    def to_csv_dialect(self): ...
    def to_dict(self) -> Dict[str, Union[str, bool, None]]: ...

# Predefined dialects
excel: csv.Dialect
excel_tab: csv.Dialect
unix_dialect: csv.Dialect

Data Writing

High-level function for writing tabular data to CSV files with automatic formatting and RFC-4180 compliance by default.

def write_table(table, filename, dialect='excel', transpose=False, encoding=None) -> None: ...

Types

# Detection results
Optional[SimpleDialect]

# File paths
Union[str, PathLike]

# CSV data structures
List[List[str]]  # Table data
List[Dict[str, str]]  # Dictionary records
Iterator[List[str]]  # Streaming table data
Iterator[Dict[str, str]]  # Streaming dictionary records

# Dialect specifications
Union[str, SimpleDialect, csv.Dialect]

# Detection methods
Literal['auto', 'normal', 'consistency']

Constants

# Quoting constants (from csv module)
QUOTE_ALL: int
QUOTE_MINIMAL: int
QUOTE_NONE: int
QUOTE_NONNUMERIC: int
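
A small sketch of how the quoting constants change writer output. The helper render is hypothetical, and the fallback import is an assumption so the sketch runs without CleverCSV installed.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement
except ImportError:
    import csv  # stdlib fallback so the sketch runs without CleverCSV


def render(row, quoting):
    """Write a single row with the given quoting mode and return the text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue()


# QUOTE_MINIMAL quotes only fields that need it (here, the embedded comma).
minimal = render(["a", "b,c"], csv.QUOTE_MINIMAL)

# QUOTE_ALL quotes every field, including numbers converted to text.
everything = render(["a", 1], csv.QUOTE_ALL)
```

Here minimal is 'a,"b,c"' and everything is '"a","1"', each followed by the dialect's line terminator.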

Exceptions

class Error(Exception):
    """General CleverCSV error"""

class NoDetectionResult(Exception):
    """Raised when dialect detection fails"""