
# CleverCSV


CleverCSV is a Python library that serves as a drop-in replacement for the built-in `csv` module, with enhanced dialect detection for messy and inconsistent CSV files. It analyzes row and type patterns in the data to detect the dialect automatically, so files that trip up standard CSV parsers can still be parsed reliably.


## Package Information


- **Package Name**: clevercsv
- **Language**: Python
- **Installation**: `pip install clevercsv` (core) or `pip install clevercsv[full]` (with CLI tools)


## Core Imports

```python
import clevercsv
```


Drop-in replacement usage:

```python
import clevercsv as csv
```
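Because the package mirrors the standard `csv` interface, code written against `csv.reader` should run unchanged after the aliased import. A minimal sketch, assuming an in-memory sample (the `raw` string is made up for illustration) and a fallback to the standard library when CleverCSV is not installed:

```python
import io

try:
    import clevercsv as csv  # drop-in alias
except ImportError:  # fall back so the sketch still runs without CleverCSV
    import csv

# Hypothetical sample with a quoted field that contains the delimiter
raw = 'name,role\n"Lovelace, Ada",engineer\n'

# Same call as with the standard library; the default dialect is 'excel'
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)
```

Since only the import line differs, switching an existing script over is a one-line change.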


## Basic Usage

```python
import clevercsv

# Automatic dialect detection and reading
rows = clevercsv.read_table('./data.csv')

# Read as pandas DataFrame (requires pandas)
df = clevercsv.read_dataframe('./data.csv')

# Read as dictionaries (first row as headers)
records = clevercsv.read_dicts('./data.csv')

# Traditional csv-style usage with automatic detection
with open('./data.csv', newline='') as csvfile:
    dialect = clevercsv.Sniffer().sniff(csvfile.read())
    csvfile.seek(0)
    reader = clevercsv.reader(csvfile, dialect)
    rows = list(reader)

# Manual dialect detection
dialect = clevercsv.detect_dialect('./data.csv')
print(f"Detected: {dialect}")
```


## Architecture


CleverCSV employs a multi-stage dialect detection system:

- **Normal Form Detection**: First-pass detection using pattern analysis of row lengths and data types
- **Consistency Measure**: Fallback detection method using data consistency scoring
- **C Extensions**: Optimized parsing engine for performance-critical operations
- **Wrapper Functions**: High-level convenience functions for common CSV operations
- **Command Line Interface**: Complete CLI toolkit for CSV standardization and analysis


This design enables CleverCSV to achieve 97% accuracy for dialect detection with a 21% improvement on non-standard CSV files compared to Python's standard library.
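To make the idea of consistency scoring concrete, here is a deliberately simplified toy that scores candidate delimiters by how uniform the resulting row lengths are. It is not CleverCSV's algorithm (which, per the list above, also considers data types). The function names `row_length_consistency` and `guess_delimiter` and the sample data are invented for illustration.

```python
from collections import Counter


def row_length_consistency(sample: str, delimiter: str) -> float:
    """Toy score: share of rows that have the most common field count.

    Returns 0.0 when the most common count is 1, i.e. the candidate
    delimiter does not actually split the typical row.
    """
    rows = [line for line in sample.splitlines() if line]
    if not rows:
        return 0.0
    counts = Counter(len(row.split(delimiter)) for row in rows)
    field_count, frequency = counts.most_common(1)[0]
    if field_count < 2:
        return 0.0
    return frequency / len(rows)


def guess_delimiter(sample: str, candidates=(",", ";", "\t", "|")):
    """Pick the candidate with the highest toy score, or None if none split."""
    scores = {d: row_length_consistency(sample, d) for d in candidates}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Semicolon-delimited data with decimal commas: a naive comma split would
# yield rows of uneven length, while the semicolon split is uniform.
sample = "a;b;c\n1,5;2;3\n4;5,5;6\n"
print(guess_delimiter(sample))
```

A real detector has to handle quoting, escapes, and type information as well, which is where the staged approach described above comes in.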


## Capabilities


### High-Level Data Reading


Convenient wrapper functions that automatically detect dialects and encodings, providing the easiest way to work with CSV files without manual configuration.

```python { .api }
def read_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[List[str]]: ...
def read_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> List[Dict[str, str]]: ...
def read_dataframe(filename, *args, num_chars=None, **kwargs): ...
def stream_table(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[List[str]]: ...
def stream_dicts(filename, dialect=None, encoding=None, num_chars=None, verbose=False) -> Iterator[Dict[str, str]]: ...
```


[Data Reading](./data-reading.md)
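As a rough sketch of how `read_table` might be used in practice, the snippet below wraps it in a hypothetical helper, `load_rows`, that falls back to the standard library's `csv.Sniffer` when CleverCSV is not installed. The helper name, the temporary file, and its contents are assumptions made for illustration.

```python
import csv
import os
import tempfile

try:
    import clevercsv
except ImportError:  # keep the sketch runnable without CleverCSV
    clevercsv = None


def load_rows(path):
    """Hypothetical helper: return every row of a CSV file as a list of strings."""
    if clevercsv is not None:
        # Dialect (and encoding) are detected automatically
        return clevercsv.read_table(path)
    # Standard-library fallback: sniff the dialect, then parse
    with open(path, newline="") as fh:
        dialect = csv.Sniffer().sniff(fh.read(4096))
        fh.seek(0)
        return list(csv.reader(fh, dialect))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cities.csv")
    with open(path, "w", newline="") as fh:
        fh.write("city;population\nOslo;700000\nBergen;285000\nTromso;77000\n")
    rows = load_rows(path)

print(rows)
```

For files too large to hold in memory, the `stream_table` signature listed above suggests iterating over rows instead of materializing the whole list.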


### Dialect Detection and Management


Advanced dialect detection capabilities using pattern analysis and consistency measures, with support for custom detection parameters and manual dialect specification.

```python { .api }
class Detector:
    def detect(self, sample, delimiters=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...
    def sniff(self, sample, delimiters=None, verbose=False) -> Optional[SimpleDialect]: ...
    def has_header(self, sample, max_rows_to_check=20) -> bool: ...

def detect_dialect(filename, num_chars=None, encoding=None, verbose=False, method='auto', skip=True) -> Optional[SimpleDialect]: ...
```


[Dialect Detection](./dialect-detection.md)


### Core CSV Reading and Writing


Low-level CSV reader and writer classes that provide drop-in compatibility with Python's csv module while supporting CleverCSV's enhanced dialect handling.

```python { .api }
class reader:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def __iter__(self) -> Iterator[List[str]]: ...
    def __next__(self) -> List[str]: ...

class writer:
    def __init__(self, csvfile, dialect='excel', **fmtparams): ...
    def writerow(self, row) -> Any: ...
    def writerows(self, rows) -> Any: ...
```


[Core Reading and Writing](./core-reading-writing.md)


### Dictionary-Based CSV Operations


Dictionary-based reading and writing that treats the first row as headers, providing a more convenient interface for structured CSV data.

```python { .api }
class DictReader:
    def __init__(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds): ...
    def __iter__(self) -> Iterator[Dict[str, str]]: ...
    def __next__(self) -> Dict[str, str]: ...

class DictWriter:
    def __init__(self, f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds): ...
    def writeheader(self) -> Any: ...
    def writerow(self, rowdict) -> Any: ...
    def writerows(self, rowdicts) -> None: ...
```


[Dictionary Operations](./dictionary-operations.md)
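A small round trip may help show how the two classes fit together. This is a sketch under assumptions: the records are invented, an in-memory buffer stands in for a file, and the import falls back to the standard library (whose `DictReader` and `DictWriter` share these signatures) when CleverCSV is unavailable.

```python
import io

try:
    import clevercsv as csv
except ImportError:  # the standard library exposes the same class names
    import csv

# Write two hypothetical records, header first
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])

# Read them back; the first row supplies the dictionary keys
buf.seek(0)
records = [dict(row) for row in csv.DictReader(buf)]
print(records)
```

Note that values come back as strings (`"1"`, not `1`), so any numeric conversion is left to the caller.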


### Dialects and Configuration


Dialect classes and configuration utilities for managing CSV parsing parameters, including predefined dialects and custom dialect creation.

```python { .api }
class SimpleDialect:
    def __init__(self, delimiter, quotechar, escapechar, strict=False): ...
    def validate(self) -> None: ...
    def to_csv_dialect(self): ...
    def to_dict(self) -> Dict[str, Union[str, bool, None]]: ...

# Predefined dialects
excel: csv.Dialect
excel_tab: csv.Dialect
unix_dialect: csv.Dialect
```


[Dialects and Configuration](./dialects-configuration.md)


### Data Writing


High-level function for writing tabular data to CSV files with automatic formatting and RFC-4180 compliance by default.

```python { .api }
def write_table(table, filename, dialect='excel', transpose=False, encoding=None) -> None: ...
```


[Data Writing](./data-writing.md)


## Types

```python { .api }
# Detection results
Optional[SimpleDialect]

# File paths
Union[str, PathLike]

# CSV data structures
List[List[str]]  # Table data
List[Dict[str, str]]  # Dictionary records
Iterator[List[str]]  # Streaming table data
Iterator[Dict[str, str]]  # Streaming dictionary records

# Dialect specifications
Union[str, SimpleDialect, csv.Dialect]

# Detection methods
Literal['auto', 'normal', 'consistency']
```


## Constants

```python { .api }
# Quoting constants (from csv module)
QUOTE_ALL: int
QUOTE_MINIMAL: int
QUOTE_NONE: int
QUOTE_NONNUMERIC: int
```
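These constants are passed as the `quoting` format parameter, just as with the standard `csv` module. A brief sketch, assuming invented sample rows and the same standard-library fallback for environments without CleverCSV:

```python
import io

try:
    import clevercsv as csv
except ImportError:  # the constants originate in the standard csv module
    import csv

buf = io.StringIO(newline="")

# QUOTE_ALL wraps every field in quotes, including numeric ones
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["id", "note"])
writer.writerow([1, "hello, world"])

output = buf.getvalue()
print(output)
```

With the default `QUOTE_MINIMAL`, only the field containing the delimiter would be quoted.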


## Exceptions

```python { .api }
class Error(Exception):
    """General CleverCSV error"""

class NoDetectionResult(Exception):
    """Raised when dialect detection fails"""
```