Tessl Tile for pypi/clevercsv@0.8.0

# Data Reading

High-level wrapper functions that automatically detect the CSV dialect and file encoding, so most CSV files can be read without manual configuration.

## Capabilities

### Table Reading

Read CSV files as lists of lists (rows), with automatic dialect detection and optional streaming for large files.

```python { .api }
def read_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[List[str]]:
    """
    Read a CSV file as a table (list of lists).

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of rows, each row is a list of string fields

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[List[str]]:
    """
    Read a CSV file as a generator over rows.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as lists of strings

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv


def process_row(row):
    """Placeholder for application-specific row handling."""
    print(row)


# Read entire CSV file
rows = clevercsv.read_table('data.csv')
print(f"Read {len(rows)} rows")

# Read with specific encoding
rows = clevercsv.read_table('data.csv', encoding='utf-8')

# Stream large files to avoid memory issues
for row in clevercsv.stream_table('large_file.csv'):
    process_row(row)

# Use specific dialect
dialect = clevercsv.SimpleDialect(',', '"', '')
rows = clevercsv.read_table('data.csv', dialect=dialect)
```
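
When a streamed file still has to be handled in chunks (for example, for bulk inserts), the row iterator can be grouped into fixed-size batches. The sketch below is illustrative only: `batched_rows` and `stream_in_batches` are hypothetical helper names, not part of CleverCSV, and the first row is assumed to be a header.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_rows(rows: Iterable[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Group any iterable of rows into lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def stream_in_batches(filename: str, size: int = 1000) -> Iterator[List[List[str]]]:
    """Hypothetical wrapper: stream a file with CleverCSV, skipping the header row."""
    import clevercsv  # imported lazily so batched_rows works without it

    rows = clevercsv.stream_table(filename)
    next(rows, None)  # assumption: the first row holds column names
    yield from batched_rows(rows, size)
```

Only one batch is held in memory at a time, so the memory benefit of `stream_table` is preserved as long as `size` stays modest.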

### Dictionary Reading

Read CSV files as lists of dictionaries, treating the first row as column headers. Ideal for structured data where you need named access to fields.

```python { .api }
def read_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[Dict[str, str]]:
    """
    Read a CSV file as a list of dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of dictionaries where keys are column names from first row

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[Dict[str, str]]:
    """
    Read a CSV file as a generator over dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as dictionaries

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv
from clevercsv.exceptions import NoDetectionResult

# Read CSV as dictionaries
records = clevercsv.read_dicts('employees.csv')
for record in records:
    print(f"Name: {record['name']}, Age: {record['age']}")

# Stream dictionaries for large files
high_scorers = []
for record in clevercsv.stream_dicts('large_dataset.csv'):
    # Values are strings, so convert before comparing
    if int(record['score']) > 90:
        high_scorers.append(record)

# Access with error handling
try:
    records = clevercsv.read_dicts('messy_file.csv', verbose=True)
except NoDetectionResult:
    print("Could not detect CSV dialect")
```
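
Because every value in a record is a string, numeric fields need explicit conversion before they can be compared or summed. A minimal sketch of a reusable converter follows; `convert_record` and the column names are hypothetical, and the helper works on any dictionary, including those yielded by `stream_dicts`.

```python
from typing import Any, Callable, Dict, Mapping


def convert_record(
    record: Mapping[str, str],
    converters: Mapping[str, Callable[[str], Any]],
) -> Dict[str, Any]:
    """Return a copy of `record` with selected columns converted.

    Columns without a converter stay as strings. Empty strings become None
    for converted columns (an assumption about how missing values appear).
    """
    converted: Dict[str, Any] = dict(record)
    for column, convert in converters.items():
        if column not in converted:
            continue
        value = converted[column]
        converted[column] = convert(value) if value != "" else None
    return converted


# Example with an in-memory record shaped like a stream_dicts() result
record = {"name": "Ada", "age": "36", "score": "91.5"}
typed = convert_record(record, {"age": int, "score": float})
print(typed)
```

Keeping the conversion in one place avoids scattering `int(...)` and `float(...)` calls through the processing loop.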

### DataFrame Reading

Read CSV files directly into pandas DataFrames with automatic dialect detection. Combines CleverCSV's robust dialect detection with pandas' powerful data manipulation capabilities.

```python { .api }
def read_dataframe(
    filename: Union[str, PathLike],
    *args,
    num_chars: Optional[int] = None,
    **kwargs
) -> 'pandas.DataFrame':
    """
    Read a CSV file to a pandas DataFrame.

    Parameters:
    - filename: Path to the CSV file
    - *args: Additional arguments passed to pandas.read_csv
    - num_chars: Number of characters for dialect detection (entire file if None)
    - **kwargs: Additional keyword arguments passed to pandas.read_csv

    Returns:
    pandas DataFrame containing the CSV data

    Raises:
    NoDetectionResult: When dialect detection fails
    ValueError: If filename is not a regular file

    Notes:
    - Requires pandas to be installed
    - Detected dialect is automatically passed to pandas.read_csv
    - Encoding detection is performed if not specified in kwargs
    """
```

#### Usage Examples

```python
import clevercsv

# Read CSV into DataFrame
df = clevercsv.read_dataframe('sales_data.csv')
print(df.head())

# Pass additional pandas parameters
df = clevercsv.read_dataframe(
    'data.csv',
    parse_dates=['date_column'],
    index_col='id'
)

# Specify encoding and other options
df = clevercsv.read_dataframe(
    'data.csv',
    encoding='latin-1',
    na_values=['N/A', 'NULL'],
    dtype={'price': float}
)

# Use subset for detection on large files
df = clevercsv.read_dataframe('huge_file.csv', num_chars=10000)
```
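
Since `read_dataframe` needs pandas and raises `ValueError` for anything that is not a regular file, both conditions can be checked up front to produce clearer error messages. The wrapper below is a sketch under those assumptions; `load_dataframe` is a hypothetical name, not a CleverCSV function.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Any, Optional


def load_dataframe(filename: str, num_chars: Optional[int] = None, **pandas_kwargs: Any):
    """Validate the inputs, then delegate to clevercsv.read_dataframe."""
    path = Path(filename)
    if not path.is_file():
        raise ValueError(f"Not a regular file: {path}")
    if find_spec("pandas") is None:
        raise ImportError("pandas is required for read_dataframe")

    import clevercsv  # imported lazily, after the cheap checks above

    return clevercsv.read_dataframe(str(path), num_chars=num_chars, **pandas_kwargs)
```

Any extra keyword arguments are forwarded unchanged, so options such as `parse_dates` or `na_values` still reach `pandas.read_csv`.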

## Performance Considerations

- **Detection Speed**: For large files, use the `num_chars` parameter to limit detection to the first N characters
- **Memory Usage**: Use the streaming functions (`stream_table`, `stream_dicts`) for large files
- **Encoding Detection**: Install the `cchardet` package for faster encoding detection
- **Dialect Caching**: Detected dialects can be saved and reused for similar files
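
The dialect caching point can be sketched as a small in-memory cache that runs detection once per group of similar files. This is an illustration, not a CleverCSV feature: `cached_dialect`, the `label` key, and the default of 10000 characters are assumptions, while `clevercsv.detect_dialect` is the library's own detection function.

```python
from typing import Any, Callable, Dict, Optional

_dialect_cache: Dict[str, Any] = {}


def cached_dialect(
    label: str,
    sample_file: str,
    detect: Optional[Callable[..., Any]] = None,
    num_chars: int = 10000,
) -> Any:
    """Detect a dialect once per label and reuse it afterwards."""
    if label in _dialect_cache:
        return _dialect_cache[label]
    if detect is None:
        import clevercsv  # imported lazily; only needed on a cache miss

        detect = clevercsv.detect_dialect
    dialect = detect(sample_file, num_chars=num_chars)
    if dialect is not None:  # do not cache failed detections
        _dialect_cache[label] = dialect
    return dialect
```

The returned dialect can then be passed as `dialect=` to `read_table` or `stream_dicts` for every file that shares the label, which skips detection on those calls.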

## Error Handling

All reading functions may raise `NoDetectionResult` when dialect detection fails. This typically happens with:
- Files that are not valid CSV format
- Files with highly irregular structure
- Empty or very small files
- Binary files mistakenly treated as CSV

```python
import clevercsv
from clevercsv.exceptions import NoDetectionResult

try:
    data = clevercsv.read_table('problematic.csv')
except NoDetectionResult:
    # Fallback to manual dialect specification
    dialect = clevercsv.SimpleDialect(',', '"', '')
    data = clevercsv.read_table('problematic.csv', dialect=dialect)
```
