# Data Reading

High-level wrapper functions that detect the CSV dialect and file encoding automatically, so most CSV files can be read without manual configuration. They hide the details of dialect detection behind simple interfaces for common reading tasks: rows as lists, rows as dictionaries, or a pandas DataFrame.

## Capabilities

### Table Reading

Read CSV files as lists of lists (rows), with automatic dialect detection and optional streaming for large files.

```python { .api }
def read_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[List[str]]:
    """
    Read a CSV file as a table (list of lists).

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of rows, each row is a list of string fields

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_table(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[List[str]]:
    """
    Read a CSV file as a generator over rows.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as lists of strings

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv


def process_row(row):
    # Placeholder handler; replace with your own per-row logic
    print(row)


# Read entire CSV file
rows = clevercsv.read_table('data.csv')
print(f"Read {len(rows)} rows")

# Read with specific encoding
rows = clevercsv.read_table('data.csv', encoding='utf-8')

# Stream large files to avoid memory issues
for row in clevercsv.stream_table('large_file.csv'):
    process_row(row)

# Use specific dialect (delimiter, quote character, escape character)
dialect = clevercsv.SimpleDialect(',', '"', '')
rows = clevercsv.read_table('data.csv', dialect=dialect)
```
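
Both functions return every row of the file, header row included, and every field as a string. A minimal sketch of separating the header from the data while still consuming rows lazily follows. `split_header` is a hypothetical helper, not part of CleverCSV, and the in-memory rows stand in for the output of `stream_table`:

```python
from typing import Iterable, Iterator, List, Tuple


def split_header(rows: Iterable[List[str]]) -> Tuple[List[str], Iterator[List[str]]]:
    """Return (header, remaining_rows) without loading all rows into memory.

    Hypothetical helper: works on any iterable of rows, such as the
    iterator returned by stream_table or the list returned by read_table.
    """
    iterator = iter(rows)
    # An empty input has no header; return an empty list instead of raising.
    header = next(iterator, [])
    return header, iterator


# Stand-in data; in practice this could be clevercsv.stream_table('large_file.csv')
sample_rows = [["name", "age"], ["Ada", "36"], ["Linus", "54"]]
header, data_rows = split_header(sample_rows)

# Fields are strings, so numeric columns need an explicit conversion.
age_index = header.index("age")
ages = [int(row[age_index]) for row in data_rows]
```

Because the helper only calls `next` once, the remaining rows are still read one at a time when the input is a generator.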

### Dictionary Reading

Read CSV files as lists of dictionaries, treating the first row as column headers. Ideal for structured data where you need named access to fields.

```python { .api }
def read_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> List[Dict[str, str]]:
    """
    Read a CSV file as a list of dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    List of dictionaries where keys are column names from first row

    Raises:
    NoDetectionResult: When dialect detection fails
    """

def stream_dicts(
    filename: Union[str, PathLike],
    dialect: Optional[Union[str, SimpleDialect, csv.Dialect]] = None,
    encoding: Optional[str] = None,
    num_chars: Optional[int] = None,
    verbose: bool = False
) -> Iterator[Dict[str, str]]:
    """
    Read a CSV file as a generator over dictionaries.

    Parameters:
    - filename: Path to the CSV file
    - dialect: Dialect to use (auto-detected if None)
    - encoding: File encoding (auto-detected if None)
    - num_chars: Number of characters for detection (entire file if None)
    - verbose: Show detection progress

    Returns:
    Generator yielding rows as dictionaries

    Raises:
    NoDetectionResult: When dialect detection fails
    """
```

#### Usage Examples

```python
import clevercsv

# Read CSV as dictionaries
records = clevercsv.read_dicts('employees.csv')
for record in records:
    print(f"Name: {record['name']}, Age: {record['age']}")

# Stream dictionaries for large files
high_scorers = []
for record in clevercsv.stream_dicts('large_dataset.csv'):
    # Values are strings, so convert before comparing
    if int(record['score']) > 90:
        high_scorers.append(record)

# Access with error handling
try:
    records = clevercsv.read_dicts('messy_file.csv', verbose=True)
except clevercsv.NoDetectionResult:
    print("Could not detect CSV dialect")
```
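
Because every value is a string, a numeric comparison such as the `score` filter needs a conversion, and that conversion can fail on blank or malformed fields. A minimal sketch of a tolerant filter follows. `filter_by_min_score` and the sample records are illustrative assumptions, and the list of dictionaries stands in for the output of `stream_dicts`:

```python
from typing import Dict, Iterable, Iterator


def filter_by_min_score(
    records: Iterable[Dict[str, str]], minimum: float, key: str = "score"
) -> Iterator[Dict[str, str]]:
    """Yield records whose `key` field parses as a number greater than `minimum`."""
    for record in records:
        try:
            value = float(record.get(key, ""))
        except (TypeError, ValueError):
            # Missing, blank or non-numeric field: skip the record.
            continue
        if value > minimum:
            yield record


# Stand-in data; in practice this could be clevercsv.stream_dicts('large_dataset.csv')
sample_records = [
    {"name": "A", "score": "95"},
    {"name": "B", "score": ""},
    {"name": "C", "score": "n/a"},
    {"name": "D", "score": "90"},
    {"name": "E"},
]
high_scorers = list(filter_by_min_score(sample_records, 90))
```

Skipping unparseable records is only one possible policy; logging them or raising may be more appropriate when data quality matters.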

### DataFrame Reading

Read CSV files directly into pandas DataFrames with automatic dialect detection. Combines CleverCSV's robust dialect detection with pandas' powerful data manipulation capabilities.

```python { .api }
def read_dataframe(
    filename: Union[str, PathLike],
    *args,
    num_chars: Optional[int] = None,
    **kwargs
) -> 'pandas.DataFrame':
    """
    Read a CSV file to a pandas DataFrame.

    Parameters:
    - filename: Path to the CSV file
    - *args: Additional arguments passed to pandas.read_csv
    - num_chars: Number of characters for dialect detection (entire file if None)
    - **kwargs: Additional keyword arguments passed to pandas.read_csv

    Returns:
    pandas DataFrame containing the CSV data

    Raises:
    NoDetectionResult: When dialect detection fails
    ValueError: If filename is not a regular file

    Notes:
    - Requires pandas to be installed
    - Detected dialect is automatically passed to pandas.read_csv
    - Encoding detection is performed if not specified in kwargs
    """
```

#### Usage Examples

```python
import clevercsv

# Read CSV into DataFrame
df = clevercsv.read_dataframe('sales_data.csv')
print(df.head())

# Pass additional pandas parameters
df = clevercsv.read_dataframe(
    'data.csv',
    parse_dates=['date_column'],
    index_col='id'
)

# Specify encoding and other options
df = clevercsv.read_dataframe(
    'data.csv',
    encoding='latin-1',
    na_values=['N/A', 'NULL'],
    dtype={'price': float}
)

# Use subset for detection on large files
df = clevercsv.read_dataframe('huge_file.csv', num_chars=10000)
```
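
Since `read_dataframe` requires pandas to be installed, it can be useful to check for pandas up front rather than fail at call time. A small sketch using only the standard library follows. The helper names are hypothetical, and falling back to `read_dicts` is just one reasonable choice:

```python
import importlib.util


def pandas_available() -> bool:
    """Return True if pandas can be found, without actually importing it."""
    return importlib.util.find_spec("pandas") is not None


def preferred_reader_name() -> str:
    """Name of the reader function to use in this environment."""
    # Fall back to the dictionary reader when pandas is missing.
    return "read_dataframe" if pandas_available() else "read_dicts"
```

The returned name could then be resolved with `getattr(clevercsv, preferred_reader_name())`, keeping in mind that the two readers return different types.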

## Performance Considerations

- **Detection Speed**: For large files, use the `num_chars` parameter to limit detection to the first N characters
- **Memory Usage**: Use the streaming functions (`stream_table`, `stream_dicts`) for large files so rows are processed one at a time
- **Encoding Detection**: Install the `cchardet` package for faster encoding detection
- **Dialect Caching**: Detected dialects can be saved and reused for similar files
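
The dialect caching point can be sketched with the standard library alone. The example stores the three dialect characters (delimiter, quote character, escape character, in the order used by `SimpleDialect(',', '"', '')` in this document) in a JSON file under a label of your choosing. The function names, the JSON layout and the cache file are assumptions for illustration, not part of CleverCSV:

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Tuple

DialectParams = Tuple[str, str, str]  # (delimiter, quotechar, escapechar)


def save_dialect(cache_file: Path, label: str, params: DialectParams) -> None:
    """Store dialect parameters under `label`, keeping existing entries."""
    cache: Dict[str, List[str]] = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    cache[label] = list(params)
    cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")


def load_dialect(cache_file: Path, label: str) -> Optional[DialectParams]:
    """Return the cached parameters for `label`, or None if nothing is cached."""
    if not cache_file.exists():
        return None
    cache = json.loads(cache_file.read_text(encoding="utf-8"))
    entry = cache.get(label)
    return tuple(entry) if entry is not None else None
```

On a cache hit, the three values can be passed to `clevercsv.SimpleDialect(*params)` and the result supplied as the `dialect` argument, which should avoid running detection again for files from the same source.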

## Error Handling

All reading functions may raise `NoDetectionResult` when dialect detection fails. This typically happens with:
- Files that are not valid CSV format
- Files with highly irregular structure
- Empty or very small files
- Binary files mistakenly treated as CSV

```python
import clevercsv

try:
    data = clevercsv.read_table('problematic.csv')
except clevercsv.NoDetectionResult:
    # Fallback to manual dialect specification
    dialect = clevercsv.SimpleDialect(',', '"', '')
    data = clevercsv.read_table('problematic.csv', dialect=dialect)
```
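
When a single hard-coded fallback is not enough, the same pattern extends to a short list of candidate dialects. The sketch below is deliberately library-agnostic: `reader` is any callable shaped like `read_table(filename, dialect=...)`, and `detection_error` is the exception that signals failed detection (`clevercsv.NoDetectionResult` in this document). `read_with_fallbacks` and the `looks_tabular` heuristic are hypothetical helpers, not part of CleverCSV:

```python
from typing import Any, Callable, List, Sequence, Type


def looks_tabular(rows: List[List[str]]) -> bool:
    """Heuristic: at least one row, more than one column, equal row widths."""
    widths = {len(row) for row in rows}
    return len(widths) == 1 and widths.pop() > 1


def read_with_fallbacks(
    filename: str,
    reader: Callable[..., List[List[str]]],
    candidates: Sequence[Any],
    detection_error: Type[BaseException],
) -> List[List[str]]:
    """Try automatic detection first, then each candidate dialect in order."""
    try:
        return reader(filename)
    except detection_error:
        pass  # detection failed; fall through to the explicit candidates

    for dialect in candidates:
        rows = reader(filename, dialect=dialect)
        # An explicit dialect always "succeeds", so check the result's shape.
        if looks_tabular(rows):
            return rows
    raise ValueError(f"no candidate dialect produced a regular table for {filename!r}")
```

The shape check is needed because a read with an explicit dialect does not go through detection, so a wrong candidate tends to show up as single-column or ragged rows rather than as an exception.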