# Data Reading

High-level wrapper functions that automatically detect the CSV dialect and file encoding, so CSV files can be read without manual configuration. They hide the details of dialect detection behind a small interface for common reading tasks.

## Capabilities

### Table Reading

Read CSV files as lists of rows (each row a list of strings), with automatic dialect detection and optional streaming for large files.

```python { .api }
def read_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[List[str]]:
    """
    Read a CSV file as a table (list of lists).

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of rows, each row is a list of string fields

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[List[str]]:
    """
    Read a CSV file as a generator over rows.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as lists of strings

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv

# Read entire CSV file
rows = clevercsv.read_table('data.csv')
print(f"Read {len(rows)} rows")

# Read with specific encoding
rows = clevercsv.read_table('data.csv', encoding='utf-8')

# Stream large files to avoid memory issues
for row in clevercsv.stream_table('large_file.csv'):
    print(row)  # replace with your own per-row processing

# Use specific dialect (delimiter, quote character, escape character)
dialect = clevercsv.SimpleDialect(',', '"', '')
rows = clevercsv.read_table('data.csv', dialect=dialect)
```

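Because `stream_table` is documented as returning a generator, it can be combined with standard iterator tools to look at only the beginning of a file. The helper below is a minimal sketch of that idea; `preview` is a hypothetical name, not part of CleverCSV, and it works with any iterator of rows.

```python
from itertools import islice


def preview(rows, n=5):
    """Return at most the first `n` rows from any row iterator."""
    # islice stops pulling from the iterator after n items,
    # so the rest of the input is never read.
    return list(islice(rows, n))
```

With CleverCSV installed, a call such as `preview(clevercsv.stream_table('large_file.csv'))` would be expected to return the first five rows without loading the rest of the file into a list.
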
### Dictionary Reading

Read CSV files as lists of dictionaries, using the first row as column headers. This suits structured data where fields are accessed by name.

```python { .api }
def read_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[Dict[str, str]]:
    """
    Read a CSV file as a list of dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of dictionaries where keys are column names from first row

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[Dict[str, str]]:
    """
    Read a CSV file as a generator over dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as dictionaries

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv

# Read CSV as dictionaries
records = clevercsv.read_dicts('employees.csv')
for record in records:
    print(f"Name: {record['name']}, Age: {record['age']}")

# Stream dictionaries for large files
high_scorers = []
for record in clevercsv.stream_dicts('large_dataset.csv'):
    # Values are strings, so convert before comparing
    if int(record['score']) > 90:
        high_scorers.append(record)

# Access with error handling
try:
    records = clevercsv.read_dicts('messy_file.csv', verbose=True)
except clevercsv.NoDetectionResult:
    print("Could not detect CSV dialect")
```

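The signatures above return every value as a string (`Dict[str, str]`), so numeric columns need converting before use. One possible way to do that is sketched below; `convert_record` is a hypothetical helper, not a CleverCSV function, and the column names are borrowed from the example above.

```python
def convert_record(record, converters):
    """Return a copy of `record` with selected string fields converted.

    `converters` maps a column name to a callable such as int or float.
    A value that cannot be converted raises the callable's own error
    (for example ValueError from int).
    """
    converted = dict(record)  # leave the original record untouched
    for column, convert in converters.items():
        converted[column] = convert(record[column])
    return converted
```

Applied to a record such as `{'name': 'Ada', 'age': '36'}` with `{'age': int}`, this returns a new dictionary in which `age` is the integer `36` and `name` is unchanged.
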
### DataFrame Reading

Read CSV files directly into pandas DataFrames with automatic dialect detection. This combines CleverCSV's dialect detection with pandas' data manipulation capabilities.

```python { .api }
def read_dataframe(
    filename: Union[str, PathLike],
    *args,
    num_chars: Optional[int] = None,
    **kwargs
) -> 'pandas.DataFrame':
    """
    Read a CSV file to a pandas DataFrame.

    Parameters:
    - filename: Path to the CSV file
    - *args: Additional arguments passed to pandas.read_csv
    - num_chars: Number of characters for dialect detection (entire file if None)
    - **kwargs: Additional keyword arguments passed to pandas.read_csv

    Returns:
    pandas DataFrame containing the CSV data

    Raises:
    NoDetectionResult: When dialect detection fails
    ValueError: If filename is not a regular file

    Notes:
    - Requires pandas to be installed
    - Detected dialect is automatically passed to pandas.read_csv
    - Encoding detection is performed if not specified in kwargs
    """
```

#### Usage Examples

```python
import clevercsv

# Read CSV into DataFrame
df = clevercsv.read_dataframe('sales_data.csv')
print(df.head())

# Pass additional pandas parameters
df = clevercsv.read_dataframe(
    'data.csv',
    parse_dates=['date_column'],
    index_col='id'
)

# Specify encoding and other options
df = clevercsv.read_dataframe(
    'data.csv',
    encoding='latin-1',
    na_values=['N/A', 'NULL'],
    dtype={'price': float}
)

# Use subset for detection on large files
df = clevercsv.read_dataframe('huge_file.csv', num_chars=10000)
```

## Performance Considerations

- **Detection Speed**: For large files, use the `num_chars` parameter to limit detection to the first N characters
- **Memory Usage**: Use the streaming functions (`stream_table`, `stream_dicts`) for large files so rows are not all held in memory at once
- **Encoding Detection**: Install the `cchardet` package for faster encoding detection
- **Dialect Caching**: Detected dialects can be saved and reused for files with the same format

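The dialect caching point can be sketched as follows. This is one possible approach under stated assumptions, not a documented CleverCSV feature: the helper names and the JSON cache format are made up for illustration, and `clevercsv.detect_dialect` is not described in this section. It is assumed here to return a `SimpleDialect` with `delimiter`, `quotechar`, and `escapechar` attributes, or `None` when detection fails.

```python
import json
from pathlib import Path


def save_dialect_params(cache_path, delimiter, quotechar, escapechar):
    """Write the three dialect characters to a small JSON cache file."""
    params = {
        "delimiter": delimiter,
        "quotechar": quotechar,
        "escapechar": escapechar,
    }
    Path(cache_path).write_text(json.dumps(params), encoding="utf-8")


def load_dialect_params(cache_path):
    """Return (delimiter, quotechar, escapechar), or None if nothing is cached."""
    path = Path(cache_path)
    if not path.exists():
        return None
    params = json.loads(path.read_text(encoding="utf-8"))
    return params["delimiter"], params["quotechar"], params["escapechar"]


def read_table_cached(filename, cache_path):
    """Read `filename`, running dialect detection only when no cache exists."""
    import clevercsv  # imported lazily so the cache helpers work without it

    cached = load_dialect_params(cache_path)
    if cached is None:
        # Assumption: detect_dialect returns a SimpleDialect, or None on failure.
        dialect = clevercsv.detect_dialect(filename)
        if dialect is None:
            raise clevercsv.NoDetectionResult()
        save_dialect_params(
            cache_path, dialect.delimiter, dialect.quotechar, dialect.escapechar
        )
    else:
        dialect = clevercsv.SimpleDialect(*cached)
    return clevercsv.read_table(filename, dialect=dialect)
```

A cached dialect is only safe to reuse for files known to share the same format, for example repeated exports from the same system; otherwise detection should be run again.
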
## Error Handling

All reading functions may raise `NoDetectionResult` when dialect detection fails. This typically happens with:
- Files that are not valid CSV
- Files with highly irregular structure
- Empty or very small files
- Binary files mistakenly treated as CSV

```python
import clevercsv

try:
    data = clevercsv.read_table('problematic.csv')
except clevercsv.NoDetectionResult:
    # Fallback to manual dialect specification
    dialect = clevercsv.SimpleDialect(',', '"', '')
    data = clevercsv.read_table('problematic.csv', dialect=dialect)
```
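
One point to keep in mind with the streaming functions: if `stream_table` and `stream_dicts` are ordinary Python generator functions (the "generator" wording above suggests this, but this section does not confirm it), no work happens until the first row is requested. In that case a `try`/`except` has to wrap the loop, not just the call. The sketch below shows this general Python behaviour with stand-ins; `DetectionFailed` and `stream_rows` are hypothetical and do not call CleverCSV.

```python
class DetectionFailed(Exception):
    """Stand-in for clevercsv.NoDetectionResult in this sketch."""


def stream_rows(filename):
    """Stand-in generator: nothing in the body runs until the first next()."""
    raise DetectionFailed(filename)
    yield  # unreachable, but its presence makes this a generator function


# Creating the generator does not raise: the body has not started yet.
rows = stream_rows("problematic.csv")

try:
    for row in rows:  # the exception surfaces on the first iteration
        print(row)
except DetectionFailed:
    rows_failed = True
else:
    rows_failed = False
```
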