Tessl Tile for pypi/datacompy@0.18.0

# Reporting and Output

Template-based reporting system with customizable HTML and text output, providing detailed comparison statistics, mismatch samples, and publication-ready reports.

## Capabilities

### Template Rendering System

Jinja2-based template system for generating customizable comparison reports with flexible formatting options.

```python { .api }
def render(template_name: str, **context: Any) -> str:
    """
    Render Jinja2 template with provided context.

    Parameters:
    - template_name: Name of template file to render
    - **context: Template variables as keyword arguments

    Returns:
    Rendered template as string
    """
```

### HTML Report Generation

Save a generated comparison report as an HTML file for sharing or archiving.

```python { .api }
def save_html_report(report: str, html_file: str | Path) -> None:
    """
    Save comparison report as HTML file.

    Parameters:
    - report: Report content as string
    - html_file: Path where HTML file should be saved
    """
```
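
To make the idea of saving a text report as HTML concrete, here is a minimal standard-library sketch. `write_html_report` is a hypothetical helper written for this example, not datacompy's `save_html_report`, and the HTML shell it produces (an escaped report inside a `<pre>` element) is an assumption about one reasonable layout.

```python
import html
from pathlib import Path
from typing import Union


def write_html_report(report: str, html_file: Union[str, Path]) -> Path:
    """Hypothetical helper: wrap a plain-text report in minimal HTML and save it."""
    path = Path(html_file)
    # Create the parent directory so nested output paths work.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Escape the report so characters such as "<" display literally.
    body = html.escape(report)
    document = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>Comparison Report</title></head>\n'
        f"<body><pre>{body}</pre></body>\n"
        "</html>\n"
    )
    path.write_text(document, encoding="utf-8")
    return path
```

The `<pre>` wrapper keeps the fixed-width column alignment of a text report intact when it is viewed in a browser.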

### DataFrame String Conversion

Convert DataFrames to formatted string representations for display and logging purposes.

```python { .api }
def df_to_str(df: Any, sample_count: int | None, on_index: bool) -> str:
    """
    Convert DataFrame to formatted string representation.

    Parameters:
    - df: DataFrame to convert (any supported backend)
    - sample_count: Number of rows to include (None for all)
    - on_index: Whether to include index in output

    Returns:
    Formatted string representation of DataFrame
    """
```

### Utility Functions

Helper functions for report generation and data formatting.

```python { .api }
def temp_column_name(*dataframes) -> str:
    """
    Generate unique temporary column name that doesn't conflict with existing columns.

    Parameters:
    - *dataframes: Variable number of DataFrames to check for column conflicts

    Returns:
    Unique temporary column name as string
    """
```
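
The collision-avoidance idea behind a temporary column name can be sketched without any DataFrame library. The helper below, `unique_temp_name`, is hypothetical: it takes plain lists of column names rather than DataFrames, and the `_temp_N` naming scheme is an assumption chosen for illustration.

```python
from typing import Iterable


def unique_temp_name(*column_lists: Iterable[str], prefix: str = "_temp_") -> str:
    """Hypothetical helper: return the first prefix+N name absent from all inputs."""
    # Collect every column name seen across all inputs.
    taken = set()
    for columns in column_lists:
        taken.update(columns)
    # Count upward until a candidate name is free.
    i = 0
    while f"{prefix}{i}" in taken:
        i += 1
    return f"{prefix}{i}"


print(unique_temp_name(["id", "value"], ["id", "_temp_0"]))  # _temp_1
```

With pandas, the same helper could be called with `df1.columns` and `df2.columns`, since both are iterables of column names.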

## Template System

### Default Template Variables

The default report template (`report_template.j2`) supports the following variables:

```python { .api }
# Template context variables
df1_name: str           # Name of first DataFrame
df2_name: str           # Name of second DataFrame
df1_shape: tuple        # Shape of first DataFrame (rows, columns)
df2_shape: tuple        # Shape of second DataFrame (rows, columns)
column_summary: dict    # Summary of column differences
row_summary: dict       # Summary of row differences
column_comparison: list # Detailed column-by-column statistics
mismatch_stats: dict    # Statistics about mismatched values
df1_unique_rows: Any    # Rows unique to first DataFrame
df2_unique_rows: Any    # Rows unique to second DataFrame
column_count: int       # Number of columns to include in detailed output
```

### Custom Templates

Create custom templates for specialized reporting needs:

```python
# Use custom template (`compare` is an existing datacompy.Compare instance)
custom_report = compare.report(
    template_path='/path/to/custom/templates',
    sample_count=20
)

# Available in custom templates
template_vars = {
    'comparison_summary': '...',
    'detailed_stats': [...],
    'sample_mismatches': {...},
    'metadata': {...}
}
```

## Usage Examples

### Basic Report Generation

```python
import pandas as pd
import datacompy

# Create test DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'value': [10.0, 20.0, 30.0, 40.0],
    'status': ['active', 'active', 'inactive', 'active']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'value': [10.1, 20.0, 30.0, 50.0],
    'status': ['active', 'active', 'inactive', 'pending']
})

# Create comparison
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# Generate basic text report
text_report = compare.report()
print(text_report)

# Generate HTML report
html_report = compare.report(html_file='comparison_report.html')
print("HTML report saved to comparison_report.html")
```

### Customized Report Parameters

```python
# Detailed report with more samples and columns
detailed_report = compare.report(
    sample_count=25,      # Show 25 sample mismatches
    column_count=20       # Include up to 20 columns in stats
)

# Minimal report
minimal_report = compare.report(
    sample_count=5,       # Show only 5 sample mismatches
    column_count=5        # Include only 5 columns in stats
)
```

### Custom HTML Styling

```python
# Values intended for a custom template
custom_context = {
    'title': 'Quarterly Data Comparison',
    'analyst': 'Data Team',
    'date': '2024-01-15'
}

# Note: custom_context is not passed to report() in this snippet, so the
# custom template must define these values itself (for example with
# Jinja2 `default` filters)
custom_report = compare.report(
    html_file='quarterly_report.html',
    template_path='/path/to/custom/templates'
)
```

### DataFrame Display Utilities

```python
import datacompy

# Convert DataFrame to string for logging
df_string = datacompy.df_to_str(
    df1,
    sample_count=10,    # Show first 10 rows
    on_index=True       # Include index
)
print("DataFrame preview:")
print(df_string)

# Generate temporary column name
temp_col = datacompy.temp_column_name(df1, df2)
print(f"Safe temporary column name: {temp_col}")
```

### Programmatic Report Processing

```python
# Generate the text report
report = compare.report()

# Find the first line that mentions the match status (example marker text);
# fall back to None instead of raising IndexError when no line matches
lines = report.split('\n')
summary_line = next((line for line in lines if 'DataFrames match' in line), None)
print(f"Match status: {summary_line}")

# Access structured comparison data
print(f"Unique rows in df1: {len(compare.df1_unq_rows)}")
print(f"Unique rows in df2: {len(compare.df2_unq_rows)}")
print(f"Column statistics: {compare.column_stats}")
```
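
When several values are needed from the report text, scanning for `label: value` lines can be more convenient than searching for one marker at a time. The sketch below assumes the report contains such lines; `parse_key_values` and the sample text are hypothetical, and the layout of a real report may differ.

```python
from typing import Dict


def parse_key_values(report: str) -> Dict[str, str]:
    """Hypothetical helper: collect 'label: value' lines from report text."""
    result: Dict[str, str] = {}
    for line in report.splitlines():
        # Split on the first colon; lines without one are skipped.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            result[key.strip()] = value.strip()
    return result


# Made-up report text used only to exercise the helper.
sample_report = """DataComPy Comparison
Number of rows in common: 3
Number of rows in df1 but not in df2: 1
"""
stats = parse_key_values(sample_report)
print(stats["Number of rows in common"])  # 3
```

Values come back as strings, so convert them with `int()` or `float()` before doing arithmetic. For anything beyond quick checks, the structured attributes on the comparison object are a more robust source than parsed text.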

### Batch Report Generation

```python
import json
import os
from datetime import datetime

# Generate multiple reports with timestamps
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# Text report
text_file = f"comparison_report_{timestamp}.txt"
with open(text_file, 'w') as f:
    f.write(compare.report())

# HTML report
html_file = f"comparison_report_{timestamp}.html"
compare.report(html_file=html_file)

# Summary report for dashboard
summary = {
    'timestamp': timestamp,
    'matches': compare.matches(),
    'total_rows_df1': len(compare.df1),
    'total_rows_df2': len(compare.df2),
    'unique_rows_df1': len(compare.df1_unq_rows),
    'unique_rows_df2': len(compare.df2_unq_rows),
    'shared_columns': len(compare.intersect_columns()),
    'unique_columns_df1': len(compare.df1_unq_columns()),
    'unique_columns_df2': len(compare.df2_unq_columns())
}

# Write the summary as JSON
with open(f"comparison_summary_{timestamp}.json", 'w') as f:
    json.dump(summary, f, indent=2)
```
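
If batch runs write to a shared directory, it can help to derive all output paths from a single timestamp so the text, HTML, and JSON files stay grouped. The helper below is a hypothetical sketch using only the standard library; the `report_paths` name, the `stem` parameter, and the file naming pattern are assumptions modeled on the example above.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional, Union


def report_paths(
    out_dir: Union[str, Path],
    stem: str = "comparison",
    now: Optional[datetime] = None,
) -> Dict[str, Path]:
    """Hypothetical helper: build matching output paths from one timestamp."""
    # Accept an injected time so the naming is reproducible in tests.
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    base = Path(out_dir)
    return {
        "text": base / f"{stem}_report_{timestamp}.txt",
        "html": base / f"{stem}_report_{timestamp}.html",
        "summary": base / f"{stem}_summary_{timestamp}.json",
    }


paths = report_paths("reports", now=datetime(2024, 1, 15, 9, 30, 0))
print(paths["text"].name)  # comparison_report_20240115_093000.txt
```

Computing the timestamp once avoids the files of one run ending up with different suffixes when the clock ticks over between writes.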

### Template Customization

```python
import os

# Create custom template directory structure
# /custom_templates/
#   └── custom_report.j2

custom_template_content = """
<!DOCTYPE html>
<html>
<head>
    <title>{{ title | default('DataComPy Comparison Report') }}</title>
    <style>
        .summary { background-color: #f0f0f0; padding: 10px; }
        .mismatch { background-color: #ffe6e6; }
        .match { background-color: #e6ffe6; }
    </style>
</head>
<body>
    <h1>Comparison: {{ df1_name }} vs {{ df2_name }}</h1>

    <div class="summary">
        <h2>Summary</h2>
        <p>{{ df1_name }}: {{ df1_shape[0] }} rows, {{ df1_shape[1] }} columns</p>
        <p>{{ df2_name }}: {{ df2_shape[0] }} rows, {{ df2_shape[1] }} columns</p>
    </div>

    <!-- Custom sections here -->

</body>
</html>
"""

# Save custom template
os.makedirs('/custom_templates', exist_ok=True)
with open('/custom_templates/custom_report.j2', 'w') as f:
    f.write(custom_template_content)

# Use custom template
custom_report = compare.report(
    html_file='custom_comparison.html',
    template_path='/custom_templates'
)
```

### Integration with Jupyter Notebooks

```python
import html

from IPython.display import HTML, display
import datacompy

# Generate comparison
compare = datacompy.Compare(df1, df2, join_columns=['id'])

# report() returns plain text; wrap it in <pre> so the column alignment
# survives when rendered as HTML in Jupyter
text_report = compare.report()
display(HTML(f"<pre>{html.escape(text_report)}</pre>"))

# Or save and display file
compare.report(html_file='notebook_report.html')
display(HTML(filename='notebook_report.html'))
```

## Report Output Format

The default report includes the following sections:

1. **Executive Summary**: High-level match status and key statistics
2. **DataFrame Overview**: Shape and basic information about each DataFrame
3. **Column Analysis**: Detailed breakdown of column differences and matches
4. **Row Analysis**: Information about unique and shared rows
5. **Mismatch Details**: Sample of mismatched values with statistics
6. **Statistical Summary**: Numerical summary of differences

Each section can be customized through template modification or by using different template files for specific reporting needs.
