# Reporting and Output

Template-based reporting system with customizable HTML and text output, providing detailed comparison statistics, mismatch samples, and publication-ready reports.

## Capabilities

### Template Rendering System

Jinja2-based template system for generating customizable comparison reports with flexible formatting options.

```python { .api }
def render(template_name: str, **context: Any) -> str:
    """
    Render Jinja2 template with provided context.

    Parameters:
    - template_name: Name of template file to render
    - **context: Template variables as keyword arguments

    Returns:
    Rendered template as string
    """
```

### HTML Report Generation

Generate and save HTML reports with interactive features and professional formatting.

```python { .api }
def save_html_report(report: str, html_file: str | Path) -> None:
    """
    Save comparison report as HTML file.

    Parameters:
    - report: Report content as string
    - html_file: Path where HTML file should be saved
    """
```
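
As a rough illustration of what saving a report involves, the sketch below writes a report string to disk with `pathlib`. The helper name `write_report` and its behavior (creating missing parent directories, UTF-8 encoding, returning the path) are assumptions made for this example and are not taken from DataComPy's implementation.

```python
import tempfile
from pathlib import Path
from typing import Union


def write_report(report: str, html_file: Union[str, Path]) -> Path:
    """Hypothetical helper: write report text to html_file and return its path."""
    path = Path(html_file)
    # Assumption: create missing parent directories instead of failing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(report, encoding="utf-8")
    return path


# Usage: write into a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_report("<html><body>ok</body></html>", Path(tmp) / "out" / "report.html")
    saved_text = saved.read_text(encoding="utf-8")
```

Accepting both `str` and `Path` and normalizing with `Path(...)` keeps the call site flexible, which mirrors the `str | Path` parameter type in the signature above.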

### DataFrame String Conversion

Convert DataFrames to formatted string representations for display and logging purposes.

```python { .api }
def df_to_str(df: Any, sample_count: int | None, on_index: bool) -> str:
    """
    Convert DataFrame to formatted string representation.

    Parameters:
    - df: DataFrame to convert (any supported backend)
    - sample_count: Number of rows to include (None for all)
    - on_index: Whether to include index in output

    Returns:
    Formatted string representation of DataFrame
    """
```
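
To make the `sample_count` behavior concrete, here is a minimal sketch of row sampling followed by aligned text formatting. It is a standard-library stand-in that works on a list of dict rows rather than a real DataFrame. The function name `rows_to_str` is hypothetical, and the `on_index` option is deliberately left out.

```python
from typing import Optional


def rows_to_str(rows: list, sample_count: Optional[int]) -> str:
    """Hypothetical stand-in: format a list of dict rows as aligned text."""
    if not rows:
        return ""
    # None means "include every row", mirroring the parameter description above.
    sample = rows if sample_count is None else rows[:sample_count]
    columns = list(rows[0].keys())
    # Each column is as wide as its header or its widest sampled value.
    widths = {
        c: max([len(c)] + [len(str(r[c])) for r in sample])
        for c in columns
    }
    lines = [" ".join(c.ljust(widths[c]) for c in columns)]
    for r in sample:
        lines.append(" ".join(str(r[c]).ljust(widths[c]) for c in columns))
    # Strip the padding that ljust leaves at the end of each line.
    return "\n".join(line.rstrip() for line in lines)


rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "inactive"},
    {"id": 3, "status": "active"},
]
preview = rows_to_str(rows, sample_count=2)
print(preview)
```

With a real pandas DataFrame, the same idea is usually expressed by taking the first rows and calling the DataFrame's own string formatter.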

### Utility Functions

Helper functions for report generation and data formatting.

```python { .api }
def temp_column_name(*dataframes) -> str:
    """
    Generate unique temporary column name that doesn't conflict with existing columns.

    Parameters:
    - *dataframes: Variable number of DataFrames to check for column conflicts

    Returns:
    Unique temporary column name as string
    """
```
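
One way to generate a non-conflicting name is to try numbered candidates until one is unused. The sketch below shows that approach on plain column-name lists. The helper `unique_temp_name` and the `_temp` prefix are assumptions for illustration and may differ from what the library does.

```python
def unique_temp_name(*column_sets, base: str = "_temp") -> str:
    """Hypothetical helper: return a name absent from every column collection."""
    taken = set()
    for columns in column_sets:
        taken.update(columns)
    # Try base_0, base_1, ... until a candidate is not already in use.
    i = 0
    while f"{base}_{i}" in taken:
        i += 1
    return f"{base}_{i}"


name = unique_temp_name(["id", "value", "_temp_0"], ["id", "_temp_1"])
print(name)
```

Checking all inputs together matters because the temporary column is typically added to a joined result, where a name taken in either input would collide.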

## Template System

### Default Template Variables

The default report template (`report_template.j2`) supports the following variables:

```python { .api }
# Template context variables
df1_name: str            # Name of first DataFrame
df2_name: str            # Name of second DataFrame
df1_shape: tuple         # Shape of first DataFrame (rows, columns)
df2_shape: tuple         # Shape of second DataFrame (rows, columns)
column_summary: dict     # Summary of column differences
row_summary: dict        # Summary of row differences
column_comparison: list  # Detailed column-by-column statistics
mismatch_stats: dict     # Statistics about mismatched values
df1_unique_rows: Any     # Rows unique to first DataFrame
df2_unique_rows: Any     # Rows unique to second DataFrame
column_count: int        # Number of columns to include in detailed output
```

### Custom Templates

Create custom templates for specialized reporting needs:

```python
# Use custom template
custom_report = compare.report(
    template_path='/path/to/custom/templates',
    sample_count=20
)

# Illustrative shape of the context available in custom templates (values elided)
template_vars = {
    'comparison_summary': '...',
    'detailed_stats': [...],
    'sample_mismatches': {...},
    'metadata': {...}
}
```

## Usage Examples

### Basic Report Generation

```python
import pandas as pd
import datacompy

# Create test DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'value': [10.0, 20.0, 30.0, 40.0],
    'status': ['active', 'active', 'inactive', 'active']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'value': [10.1, 20.0, 30.0, 50.0],
    'status': ['active', 'active', 'inactive', 'pending']
})

# Create comparison
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# Generate basic text report
text_report = compare.report()
print(text_report)

# Generate HTML report
html_report = compare.report(html_file='comparison_report.html')
print("HTML report saved to comparison_report.html")
```

### Customized Report Parameters

```python
# Detailed report with more samples and columns
detailed_report = compare.report(
    sample_count=25,  # Show 25 sample mismatches
    column_count=20   # Include up to 20 columns in stats
)

# Minimal report
minimal_report = compare.report(
    sample_count=5,   # Show only 5 sample mismatches
    column_count=5    # Include only 5 columns in stats
)
```

### Custom HTML Styling

```python
# Values a custom template might reference. They are shown for illustration
# only: this snippet does not pass custom_context to report().
custom_context = {
    'title': 'Quarterly Data Comparison',
    'analyst': 'Data Team',
    'date': '2024-01-15'
}

# Create custom template that includes these variables
custom_report = compare.report(
    html_file='quarterly_report.html',
    template_path='/path/to/custom/templates'
)
```

### DataFrame Display Utilities

```python
import datacompy

# Convert DataFrame to string for logging
df_string = datacompy.df_to_str(
    df1,
    sample_count=10,  # Show first 10 rows
    on_index=True     # Include index
)
print("DataFrame preview:")
print(df_string)

# Generate temporary column name
temp_col = datacompy.temp_column_name(df1, df2)
print(f"Safe temporary column name: {temp_col}")
```

### Programmatic Report Processing

```python
# Generate report and extract specific information
report = compare.report()

# Parse report sections (example). The marker text depends on the report
# wording, so fall back to None instead of failing when it is absent.
lines = report.split('\n')
summary_line = next((line for line in lines if 'DataFrames match' in line), None)
print(f"Match status: {summary_line}")

# Access structured comparison data
print(f"Unique rows in df1: {len(compare.df1_unq_rows)}")
print(f"Unique rows in df2: {len(compare.df2_unq_rows)}")
print(f"Column statistics: {compare.column_stats}")
```

### Batch Report Generation

```python
import json
import os
from datetime import datetime

# Generate multiple reports with timestamps
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Text report
text_file = f"comparison_report_{timestamp}.txt"
with open(text_file, 'w', encoding='utf-8') as f:
    f.write(compare.report())

# HTML report
html_file = f"comparison_report_{timestamp}.html"
compare.report(html_file=html_file)

# Summary report for dashboard
summary = {
    'timestamp': timestamp,
    'matches': bool(compare.matches()),  # plain bool so json can serialize it
    'total_rows_df1': len(compare.df1),
    'total_rows_df2': len(compare.df2),
    'unique_rows_df1': len(compare.df1_unq_rows),
    'unique_rows_df2': len(compare.df2_unq_rows),
    'shared_columns': len(compare.intersect_columns()),
    'unique_columns_df1': len(compare.df1_unq_columns()),
    'unique_columns_df2': len(compare.df2_unq_columns())
}

with open(f"comparison_summary_{timestamp}.json", 'w', encoding='utf-8') as f:
    json.dump(summary, f, indent=2)
```

### Template Customization

```python
import os

# Create custom template directory structure (relative to the working directory)
# custom_templates/
# └── custom_report.j2

custom_template_content = """
<!DOCTYPE html>
<html>
<head>
    <title>{{ title | default('DataComPy Comparison Report') }}</title>
    <style>
        .summary { background-color: #f0f0f0; padding: 10px; }
        .mismatch { background-color: #ffe6e6; }
        .match { background-color: #e6ffe6; }
    </style>
</head>
<body>
    <h1>Comparison: {{ df1_name }} vs {{ df2_name }}</h1>

    <div class="summary">
        <h2>Summary</h2>
        <p>{{ df1_name }}: {{ df1_shape[0] }} rows, {{ df1_shape[1] }} columns</p>
        <p>{{ df2_name }}: {{ df2_shape[0] }} rows, {{ df2_shape[1] }} columns</p>
    </div>

    <!-- Custom sections here -->

</body>
</html>
"""

# Save custom template
os.makedirs('custom_templates', exist_ok=True)
with open('custom_templates/custom_report.j2', 'w', encoding='utf-8') as f:
    f.write(custom_template_content)

# Use custom template.
# Assumption: template_path is a directory containing the template. If your
# DataComPy version expects the path of the .j2 file itself, pass that instead.
custom_report = compare.report(
    html_file='custom_comparison.html',
    template_path='custom_templates'
)
```

### Integration with Jupyter Notebooks

```python
import html

from IPython.display import HTML, display
import datacompy

# Generate comparison (df1 and df2 as defined in Basic Report Generation)
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# Display the report inline in Jupyter. Assuming report() returns the
# plain-text report, wrap it in <pre> so the column alignment survives.
text_report = compare.report()
display(HTML(f"<pre>{html.escape(text_report)}</pre>"))

# Or save and display file
compare.report(html_file='notebook_report.html')
display(HTML(filename='notebook_report.html'))
```

## Report Output Format

The default report includes the following sections:

1. **Executive Summary**: High-level match status and key statistics
2. **DataFrame Overview**: Shape and basic information about each DataFrame
3. **Column Analysis**: Detailed breakdown of column differences and matches
4. **Row Analysis**: Information about unique and shared rows
5. **Mismatch Details**: Sample of mismatched values with statistics
6. **Statistical Summary**: Numerical summary of differences

Each section can be customized through template modification or by using different template files for specific reporting needs.