A Python package for handling messy CSV files with enhanced dialect detection capabilities
npx @tessl/cli install tessl/pypi-clevercsv@0.8.00
# CleverCSV

CleverCSV is a Python library that provides a drop-in replacement for the built-in `csv` module, with enhanced dialect detection for messy and inconsistent CSV files. It analyzes row patterns and cell data types to detect the dialect automatically, which allows reliable parsing of files that would otherwise trip up standard CSV parsers.

## Package Information

- **Package Name**: clevercsv
- **Language**: Python
- **Installation**: `pip install clevercsv` (core) or `pip install clevercsv[full]` (with CLI tools)

## Core Imports

```python
import clevercsv
```

Drop-in replacement usage:

```python
import clevercsv as csv
```

## Basic Usage

```python
import clevercsv

# Automatic dialect detection and reading
rows = clevercsv.read_table('./data.csv')

# Read as pandas DataFrame (requires pandas)
df = clevercsv.read_dataframe('./data.csv')

# Read as dictionaries (first row as headers)
records = clevercsv.read_dicts('./data.csv')

# Traditional csv-style usage with automatic detection
with open('./data.csv', newline='') as csvfile:
    dialect = clevercsv.Sniffer().sniff(csvfile.read())
    csvfile.seek(0)
    reader = clevercsv.reader(csvfile, dialect)
    rows = list(reader)

# Manual dialect detection
dialect = clevercsv.detect_dialect('./data.csv')
print(f"Detected: {dialect}")
```

## Architecture

CleverCSV combines a multi-stage dialect detection system with a set of supporting components:

- **Normal Form Detection**: First-pass detection using pattern analysis of row lengths and data types
- **Consistency Measure**: Fallback detection method using data consistency scoring
- **C Extensions**: Optimized parsing engine for performance-critical operations
- **Wrapper Functions**: High-level convenience functions for common CSV operations
- **Command Line Interface**: Complete CLI toolkit for CSV standardization and analysis

According to the project's reported results, CleverCSV achieves 97% accuracy for dialect detection, with a 21% improvement on non-standard CSV files compared to Python's standard library.

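To make the idea of scoring candidate dialects concrete, here is a deliberately simplified sketch. It is not CleverCSV's actual algorithm: the candidate list and the names `looks_typed`, `score_delimiter`, and `guess_delimiter` are hypothetical, and the library's real pattern and type scores are more elaborate. The sketch only illustrates how row-shape regularity and cell types can be combined to rank delimiters.

```python
import csv
import io

# Assumed candidate set, chosen for illustration only.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def looks_typed(cell):
    """Crude stand-in for type detection: numbers and plain words count as clean."""
    text = cell.strip()
    if text == "":
        return True
    try:
        float(text)
        return True
    except ValueError:
        return text.replace(" ", "").isalnum()


def score_delimiter(sample, delimiter):
    """Score one candidate: row-shape regularity times the fraction of clean cells."""
    rows = [row for row in csv.reader(io.StringIO(sample), delimiter=delimiter) if row]
    if not rows:
        return 0.0
    lengths = [len(row) for row in rows]
    most_common = max(set(lengths), key=lengths.count)
    if most_common < 2:
        return 0.0  # a single column means the delimiter never split anything
    pattern_score = lengths.count(most_common) / len(rows)
    cells = [cell for row in rows for cell in row]
    type_score = sum(looks_typed(cell) for cell in cells) / len(cells)
    return pattern_score * type_score


def guess_delimiter(sample):
    """Return the candidate delimiter with the highest combined score."""
    return max(CANDIDATE_DELIMITERS, key=lambda d: score_delimiter(sample, d))


# Decimal commas make this sample ambiguous for naive comma counting.
sample = "name;score;note\nAda;9,5;ok\nAlan;8,0;late\n"
best = guess_delimiter(sample)
```

For this sample the comma is rejected because it splits the rows unevenly and leaves no clean cells, so the semicolon scores highest.
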
## Capabilities

### High-Level Data Reading

Convenient wrapper functions that automatically detect dialects and encodings, providing the easiest way to work with CSV files without manual configuration.

```python { .api }
def read_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[List[str]]: ...
def read_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[Dict[str, str]]: ...
def read_dataframe(filename, *args, num_chars=None, **kwargs): ...
def stream_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[List[str]]: ...
def stream_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[Dict[str, str]]: ...
```

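One way to adopt `read_table` gradually is to treat CleverCSV as an optional dependency. The sketch below assumes a hypothetical helper named `load_rows` and an illustrative file `people.csv`; the standard-library branch is only a rough fallback and does not match CleverCSV's detection quality.

```python
import csv
import os
import tempfile

try:
    import clevercsv  # preferred: automatic dialect and encoding detection
except ImportError:
    clevercsv = None  # keeps the sketch runnable without CleverCSV installed


def load_rows(path):
    """Hypothetical helper: read a whole CSV file into a list of rows."""
    if clevercsv is not None:
        return clevercsv.read_table(path)
    # Standard-library fallback: sniff the delimiter, then parse.
    with open(path, newline="") as handle:
        dialect = csv.Sniffer().sniff(handle.read(), delimiters=",;\t|")
        handle.seek(0)
        return list(csv.reader(handle, dialect))


# Write a small semicolon-separated file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "people.csv")
    with open(path, "w", newline="") as handle:
        handle.write("name;age;city\nAda;36;London\nAlan;41;Wilmslow\n")
    rows = load_rows(path)
```
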
[Data Reading](./data-reading.md)

### Dialect Detection and Management

Advanced dialect detection capabilities using pattern analysis and consistency measures, with support for custom detection parameters and manual dialect specification.

```python { .api }
class Detector:
    def detect(self, sample, delimiters=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...
    def sniff(self, sample, delimiters=None, verbose=False) -> Optional[SimpleDialect]: ...
    def has_header(self, sample, max_rows_to_check=20) -> bool: ...

def detect_dialect(filename, num_chars=None, encoding=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...
```

[Dialect Detection](./dialect-detection.md)

### Core CSV Reading and Writing

Low-level CSV reader and writer classes that provide drop-in compatibility with Python's csv module while supporting CleverCSV's enhanced dialect handling.

```python { .api }
class reader:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def __iter__(self) -> Iterator[List[str]]: ...
    def __next__(self) -> List[str]: ...

class writer:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def writerow(self, row) -> Any: ...
    def writerows(self, rows) -> Any: ...
```

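A minimal round-trip sketch of the drop-in usage, assuming an in-memory buffer in place of a real file. The `try`/`except` import is there only so the example still runs with the standard library when CleverCSV is not installed; the table contents are illustrative.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement for the csv module
except ImportError:
    import csv  # fallback so the sketch runs without CleverCSV installed

table = [["id", "comment"], ["1", "plain"], ["2", "has, comma"]]

# Write with the default 'excel' dialect; the embedded comma forces quoting.
buffer = io.StringIO(newline="")
csv.writer(buffer, dialect="excel").writerows(table)
text = buffer.getvalue()

# Read the same text back with the matching dialect.
rows = list(csv.reader(io.StringIO(text, newline=""), dialect="excel"))
```
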
[Core Reading and Writing](./core-reading-writing.md)

### Dictionary-Based CSV Operations

Dictionary-based reading and writing that treats the first row as headers, providing a more convenient interface for structured CSV data.

```python { .api }
class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds): ...
    def __iter__(self) -> Iterator[Dict[str, str]]: ...
    def __next__(self) -> Dict[str, str]: ...

class DictWriter:
    def __init__(self, f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds): ...
    def writeheader(self) -> Any: ...
    def writerow(self, rowdict) -> Any: ...
    def writerows(self, rowdicts) -> None: ...
```

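The following sketch writes records with `DictWriter` and reads them back with `DictReader`, using an in-memory buffer. The record contents are made up for illustration, and the fallback import only keeps the example runnable where CleverCSV is not installed.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement for the csv module
except ImportError:
    import csv  # fallback so the sketch runs without CleverCSV installed

records = [
    {"name": "Ada", "age": "36"},
    {"name": "Alan", "age": "41"},
]

# Write a header row followed by one row per record.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)

# Rewind and read back; the first row supplies the dictionary keys.
buffer.seek(0)
loaded = [dict(row) for row in csv.DictReader(buffer)]
```
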
[Dictionary Operations](./dictionary-operations.md)

### Dialects and Configuration

Dialect classes and configuration utilities for managing CSV parsing parameters, including predefined dialects and custom dialect creation.

```python { .api }
class SimpleDialect:
    def __init__(self, delimiter, quotechar, escapechar, strict=False): ...
    def validate(self) -> None: ...
    def to_csv_dialect(self): ...
    def to_dict(self) -> Dict[str, Union[str, bool, None]]: ...

# Predefined dialects
excel: csv.Dialect
excel_tab: csv.Dialect
unix_dialect: csv.Dialect
```

[Dialects and Configuration](./dialects-configuration.md)

### Data Writing

High-level function for writing tabular data to CSV files with automatic formatting and RFC-4180 compliance by default.

```python { .api }
def write_table(table, filename, dialect='excel', transpose=False, encoding=None) -> None: ...
```

[Data Writing](./data-writing.md)

## Types

```python { .api }
# Detection results
Optional[SimpleDialect]

# File paths
Union[str, PathLike]

# CSV data structures
List[List[str]]  # Table data
List[Dict[str, str]]  # Dictionary records
Iterator[List[str]]  # Streaming table data
Iterator[Dict[str, str]]  # Streaming dictionary records

# Dialect specifications
Union[str, SimpleDialect, csv.Dialect]

# Detection methods
Literal['auto', 'normal', 'consistency']
```

## Constants

```python { .api }
# Quoting constants (from csv module)
QUOTE_ALL: int
QUOTE_MINIMAL: int
QUOTE_NONE: int
QUOTE_NONNUMERIC: int
```

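As a brief illustration of how the quoting constants change writer output, the sketch below renders one row under three settings. The `render` helper and the row values are hypothetical; the constants are passed through the `quoting` format parameter, as with the standard `csv` module, and the fallback import only keeps the example runnable without CleverCSV.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement for the csv module
except ImportError:
    import csv  # fallback so the sketch runs without CleverCSV installed

row = ["id", 7, "x y"]


def render(quoting):
    """Hypothetical helper: write one row with the given quoting mode."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, quoting=quoting).writerow(row)
    return buffer.getvalue()


minimal = render(csv.QUOTE_MINIMAL)        # quote only when required
quote_all = render(csv.QUOTE_ALL)          # quote every field
nonnumeric = render(csv.QUOTE_NONNUMERIC)  # quote every non-numeric field
```

Here `minimal` contains no quotes at all, while `nonnumeric` leaves only the integer `7` unquoted.
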
## Exceptions

```python { .api }
class Error(Exception):
    """General CleverCSV error"""

class NoDetectionResult(Exception):
    """Raised when dialect detection fails"""
```
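
A small sketch of catching `Error`, assuming the writer reports failures through this exception in the same way the standard `csv` module does. The failing input is contrived: with `QUOTE_NONE` and no escape character, a field containing the delimiter cannot be written. The fallback import only keeps the example runnable without CleverCSV.

```python
import io

try:
    import clevercsv as csv  # drop-in replacement for the csv module
except ImportError:
    import csv  # fallback so the sketch runs without CleverCSV installed

buffer = io.StringIO(newline="")

# QUOTE_NONE without an escapechar cannot represent a field containing a comma.
writer = csv.writer(buffer, quoting=csv.QUOTE_NONE)

failed = False
message = ""
try:
    writer.writerow(["a,b", "c"])
except csv.Error as exc:
    failed = True
    message = str(exc)
```
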