
# Reporting and Output

Template-based reporting system with customizable HTML and text output, providing detailed comparison statistics, mismatch samples, and publication-ready reports.

## Capabilities

### Template Rendering System

Jinja2-based template system for generating customizable comparison reports with flexible formatting options.

```python { .api }
def render(template_name: str, **context: Any) -> str:
    """
    Render Jinja2 template with provided context.

    Parameters:
    - template_name: Name of template file to render
    - **context: Template variables as keyword arguments

    Returns:
    Rendered template as string
    """
```
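The keyword-argument calling convention can be illustrated without Jinja2. The sketch below swaps in Python's standard-library `string.Template` (with `$name` placeholders rather than Jinja2's `{{ name }}`), so it is a stand-in for the idea and not datacompy's `render`. The function name `render_sketch`, the in-memory `TEMPLATES` mapping, and the template text are all assumptions made for this example.

```python
from string import Template
from typing import Any

# In-memory stand-in for template files; names and contents are hypothetical.
TEMPLATES = {
    "summary.txt": "Comparing $df1_name ($df1_rows rows) with $df2_name ($df2_rows rows)",
}


def render_sketch(template_name: str, **context: Any) -> str:
    """Look up a template by name and fill it from keyword arguments."""
    template = Template(TEMPLATES[template_name])
    # substitute() raises KeyError if a placeholder has no matching keyword.
    return template.substitute(**context)


text = render_sketch("summary.txt", df1_name="df1", df1_rows=4, df2_name="df2", df2_rows=4)
print(text)  # Comparing df1 (4 rows) with df2 (4 rows)
```

Passing the context as keyword arguments keeps call sites readable and lets a missing variable fail loudly instead of producing a half-filled report.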

### HTML Report Generation

Generate and save HTML reports with interactive features and professional formatting.

```python { .api }
def save_html_report(report: str, html_file: str | Path) -> None:
    """
    Save comparison report as HTML file.

    Parameters:
    - report: Report content as string
    - html_file: Path where HTML file should be saved
    """
```
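As a rough illustration of what saving a text report as HTML involves, the sketch below escapes the report text, wraps it in a `<pre>` element, and writes it with `pathlib`. The helper name `write_text_as_html` and the page layout are assumptions made for this example; they are not datacompy's implementation of `save_html_report`.

```python
import html
from pathlib import Path
from typing import Union


def write_text_as_html(report: str, html_file: Union[str, Path]) -> None:
    """Hypothetical helper: wrap a plain-text report in a minimal HTML page."""
    # Escape <, > and & so report content cannot break the markup.
    body = html.escape(report)
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>Comparison Report</title></head>\n'
        f"<body><pre>{body}</pre></body>\n"
        "</html>\n"
    )
    # Path() accepts both str and Path, matching the str | Path signature.
    Path(html_file).write_text(page, encoding="utf-8")
```

Using `<pre>` preserves the column alignment of a text report, which would otherwise collapse when a browser normalizes whitespace.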

### DataFrame String Conversion

Convert DataFrames to formatted string representations for display and logging purposes.

```python { .api }
def df_to_str(df: Any, sample_count: int | None, on_index: bool) -> str:
    """
    Convert DataFrame to formatted string representation.

    Parameters:
    - df: DataFrame to convert (any supported backend)
    - sample_count: Number of rows to include (None for all)
    - on_index: Whether to include index in output

    Returns:
    Formatted string representation of DataFrame
    """
```

### Utility Functions

Helper functions for report generation and data formatting.

```python { .api }
def temp_column_name(*dataframes) -> str:
    """
    Generate unique temporary column name that doesn't conflict with existing columns.

    Parameters:
    - *dataframes: Variable number of DataFrames to check for column conflicts

    Returns:
    Unique temporary column name as string
    """
```
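One plausible way to produce such a name is to try numbered candidates until one is absent from every DataFrame's columns. The sketch below shows that idea on plain lists of column names; the function name `unique_temp_name` and the `_temp_N` naming scheme are assumptions for illustration, not datacompy's actual implementation.

```python
from typing import Iterable


def unique_temp_name(*column_sets: Iterable[str], prefix: str = "_temp_") -> str:
    """Hypothetical helper: return a name not present in any given column collection."""
    # Collect every existing column name once for fast membership checks.
    taken = {name for columns in column_sets for name in columns}
    counter = 0
    # Try _temp_0, _temp_1, ... until a free name is found.
    while f"{prefix}{counter}" in taken:
        counter += 1
    return f"{prefix}{counter}"


print(unique_temp_name(["id", "value"], ["id", "_temp_0"]))  # _temp_1
```

Checking all inputs together matters because a helper column is typically added to both sides of a comparison, so the name must be free in each of them.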

## Template System

### Default Template Variables

The default report template (`report_template.j2`) supports the following variables:

```python { .api }
# Template context variables
df1_name: str            # Name of first DataFrame
df2_name: str            # Name of second DataFrame
df1_shape: tuple         # Shape of first DataFrame (rows, columns)
df2_shape: tuple         # Shape of second DataFrame (rows, columns)
column_summary: dict     # Summary of column differences
row_summary: dict        # Summary of row differences
column_comparison: list  # Detailed column-by-column statistics
mismatch_stats: dict     # Statistics about mismatched values
df1_unique_rows: Any     # Rows unique to first DataFrame
df2_unique_rows: Any     # Rows unique to second DataFrame
column_count: int        # Number of columns to include in detailed output
```
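To make the variable list concrete, the sketch below builds a partial context by hand for two four-row, three-column DataFrames and formats the name and shape entries in plain Python. The values are illustrative assumptions, and only variables whose meaning is clear from the list above are filled in; the internal structure of the dictionary- and list-valued entries is not specified here, so they are left out.

```python
# Illustrative context values (assumed, not produced by datacompy here).
context = {
    "df1_name": "df1",
    "df2_name": "df2",
    "df1_shape": (4, 3),  # (rows, columns)
    "df2_shape": (4, 3),
    "column_count": 10,
}


def shape_line(name: str, shape: tuple) -> str:
    """Format one overview line, mirroring what a template might print."""
    rows, columns = shape
    return f"{name}: {rows} rows, {columns} columns"


overview = [
    shape_line(context["df1_name"], context["df1_shape"]),
    shape_line(context["df2_name"], context["df2_shape"]),
]
print("\n".join(overview))
```

In a Jinja2 template the same lookup would be written as `{{ df1_shape[0] }}` and `{{ df1_shape[1] }}`.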

### Custom Templates

Create custom templates for specialized reporting needs:

```python
# Use custom template
custom_report = comparison.report(
    template_path='/path/to/custom/templates',
    sample_count=20
)

# Available in custom templates
template_vars = {
    'comparison_summary': '...',
    'detailed_stats': [...],
    'sample_mismatches': {...},
    'metadata': {...}
}
```

## Usage Examples

### Basic Report Generation

```python
import pandas as pd
import datacompy

# Create test DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'value': [10.0, 20.0, 30.0, 40.0],
    'status': ['active', 'active', 'inactive', 'active']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'value': [10.1, 20.0, 30.0, 50.0],
    'status': ['active', 'active', 'inactive', 'pending']
})

# Create comparison
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# Generate basic text report
text_report = compare.report()
print(text_report)

# Generate HTML report
html_report = compare.report(html_file='comparison_report.html')
print("HTML report saved to comparison_report.html")
```

### Customized Report Parameters

```python
# Detailed report with more samples and columns
detailed_report = compare.report(
    sample_count=25,  # Show 25 sample mismatches
    column_count=20   # Include up to 20 columns in stats
)

# Minimal report
minimal_report = compare.report(
    sample_count=5,   # Show only 5 sample mismatches
    column_count=5    # Include only 5 columns in stats
)
```

### Custom HTML Styling

```python
# Values a custom template might display
custom_context = {
    'title': 'Quarterly Data Comparison',
    'analyst': 'Data Team',
    'date': '2024-01-15'
}

# Point report() at a custom template that references these variables.
# Note: custom_context is not passed to report() in this call, so the
# template needs fallbacks for them (e.g. {{ title | default('...') }}).
custom_report = compare.report(
    html_file='quarterly_report.html',
    template_path='/path/to/custom/templates'
)
```

### DataFrame Display Utilities

```python
import datacompy

# Convert DataFrame to string for logging
df_string = datacompy.df_to_str(
    df1,
    sample_count=10,  # Show first 10 rows
    on_index=True     # Include index
)
print("DataFrame preview:")
print(df_string)

# Generate temporary column name
temp_col = datacompy.temp_column_name(df1, df2)
print(f"Safe temporary column name: {temp_col}")
```

### Programmatic Report Processing

```python
# Generate report and extract specific information
report = compare.report()

# Parse report sections (example); the marker text depends on the template in use
lines = report.split('\n')
# next(..., None) avoids an IndexError when no line contains the marker
summary_line = next((line for line in lines if 'DataFrames match' in line), None)
print(f"Match status: {summary_line}")

# Access structured comparison data
print(f"Unique rows in df1: {len(compare.df1_unq_rows)}")
print(f"Unique rows in df2: {len(compare.df2_unq_rows)}")
print(f"Column statistics: {compare.column_stats}")
```
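Because the report is plain text, line-based extraction is easy to wrap in a small helper. The sketch below is one way to do that with the standard library only; `find_lines` is a hypothetical helper, and `sample_report` is made-up text standing in for real report output, whose exact wording depends on the template in use.

```python
def find_lines(report: str, keyword: str) -> list[str]:
    """Hypothetical helper: return stripped report lines containing keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line.strip() for line in report.splitlines() if needle in line.lower()]


# Made-up text standing in for the string returned by compare.report().
sample_report = """\
Summary
-------
Rows only in df1: 1
Rows only in df2: 1
Columns compared: 3
"""

row_lines = find_lines(sample_report, "rows only")
print(row_lines)

# A keyword that never appears yields an empty list rather than an error.
missing = find_lines(sample_report, "no such section")
```

Where the structured attributes on the comparison object carry the same information, they are usually a more robust source than parsed text.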

### Batch Report Generation

```python
import json
import os
from datetime import datetime

# Generate multiple reports with timestamps
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Text report
text_file = f"comparison_report_{timestamp}.txt"
with open(text_file, 'w') as f:
    f.write(compare.report())

# HTML report
html_file = f"comparison_report_{timestamp}.html"
compare.report(html_file=html_file)

# Summary report for dashboard
summary = {
    'timestamp': timestamp,
    'matches': bool(compare.matches()),  # plain bool so json can serialize it
    'total_rows_df1': len(compare.df1),
    'total_rows_df2': len(compare.df2),
    'unique_rows_df1': len(compare.df1_unq_rows),
    'unique_rows_df2': len(compare.df2_unq_rows),
    'shared_columns': len(compare.intersect_columns()),
    'unique_columns_df1': len(compare.df1_unq_columns()),
    'unique_columns_df2': len(compare.df2_unq_columns())
}

with open(f"comparison_summary_{timestamp}.json", 'w') as f:
    json.dump(summary, f, indent=2)
```
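When many comparisons run in sequence, it can help to derive all output paths from one timestamp in a single place. The sketch below assumes a hypothetical helper `report_paths` and an output directory name of your choosing; it only builds paths with `pathlib` and does not call datacompy.

```python
from datetime import datetime
from pathlib import Path


def report_paths(output_dir: str, when: datetime) -> dict[str, Path]:
    """Hypothetical helper: build text, HTML, and JSON paths sharing one timestamp."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    base = Path(output_dir)
    return {
        "text": base / f"comparison_report_{stamp}.txt",
        "html": base / f"comparison_report_{stamp}.html",
        "summary": base / f"comparison_summary_{stamp}.json",
    }


paths = report_paths("reports", datetime(2024, 1, 15, 9, 30, 0))
print(paths["html"].name)  # comparison_report_20240115_093000.html
```

A path built this way can be converted with `str(paths["html"])` before being passed as `html_file`, and the output directory should exist before anything is written to it.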

### Template Customization

```python
import os

# Create custom template directory structure
# /custom_templates/
# └── custom_report.j2

custom_template_content = """
<!DOCTYPE html>
<html>
<head>
    <title>{{ title | default('DataComPy Comparison Report') }}</title>
    <style>
        .summary { background-color: #f0f0f0; padding: 10px; }
        .mismatch { background-color: #ffe6e6; }
        .match { background-color: #e6ffe6; }
    </style>
</head>
<body>
    <h1>Comparison: {{ df1_name }} vs {{ df2_name }}</h1>

    <div class="summary">
        <h2>Summary</h2>
        <p>{{ df1_name }}: {{ df1_shape[0] }} rows, {{ df1_shape[1] }} columns</p>
        <p>{{ df2_name }}: {{ df2_shape[0] }} rows, {{ df2_shape[1] }} columns</p>
    </div>

    <!-- Custom sections here -->

</body>
</html>
"""

# Save custom template
os.makedirs('/custom_templates', exist_ok=True)
with open('/custom_templates/custom_report.j2', 'w') as f:
    f.write(custom_template_content)

# Use custom template. If your datacompy version expects template_path to be
# the template file itself, pass '/custom_templates/custom_report.j2' instead.
custom_report = compare.report(
    html_file='custom_comparison.html',
    template_path='/custom_templates'
)
```

### Integration with Jupyter Notebooks

```python
import html

from IPython.display import HTML, display
import datacompy

# Generate comparison (df1 and df2 as defined in the basic example)
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# report() returns a text report, so escape it and wrap it in <pre>
# to keep its alignment when displayed inline in Jupyter
text_report = compare.report()
display(HTML(f"<pre>{html.escape(text_report)}</pre>"))

# Or save and display file
compare.report(html_file='notebook_report.html')
display(HTML(filename='notebook_report.html'))
```

## Report Output Format

The default report includes the following sections:

1. **Executive Summary**: High-level match status and key statistics
2. **DataFrame Overview**: Shape and basic information about each DataFrame
3. **Column Analysis**: Detailed breakdown of column differences and matches
4. **Row Analysis**: Information about unique and shared rows
5. **Mismatch Details**: Sample of mismatched values with statistics
6. **Statistical Summary**: Numerical summary of differences

Each section can be customized through template modification or by using different template files for specific reporting needs.