A pandas-based library to visualize and compare datasets.
npx @tessl/cli install tessl/pypi-sweetviz@2.3.00
# Sweetviz

Sweetviz is a pandas-based library that generates high-density visualizations for exploratory data analysis (EDA) with minimal code. It focuses on target analysis, dataset comparison, and feature analysis, and it reports mixed-type associations in a single view: numerical correlations, categorical associations, and categorical-numerical relationships.

## Package Information

- **Package Name**: sweetviz
- **Language**: Python
- **Installation**: `pip install sweetviz`

## Core Imports

```python
import sweetviz as sv
```

## Basic Usage

```python
import sweetviz as sv
import pandas as pd

# Load your dataset
df = pd.read_csv('your_dataset.csv')

# Create a report analyzing the entire dataset
my_report = sv.analyze(df)
my_report.show_html()  # Writes SWEETVIZ_REPORT.html and opens it in the browser

# Analyze with a target feature
my_report = sv.analyze(df, target_feat='target_column')
my_report.show_html()

# Compare two datasets (e.g., training vs. test);
# train_df and test_df are DataFrames you have already loaded
train_report = sv.compare([train_df, "Training"], [test_df, "Test"])
train_report.show_html()

# Compare subsets within the same dataset:
# rows where the condition is True are labeled "Male", the rest "Female"
my_report = sv.compare_intra(df, df["gender"] == "male", ["Male", "Female"])
my_report.show_html()
```

## Architecture

Sweetviz operates through a three-step process:

1. **Analysis Functions**: `analyze()`, `compare()`, or `compare_intra()` create `DataframeReport` objects
2. **Report Processing**: The library analyzes feature types, calculates statistics, and generates associations
3. **Output Generation**: Reports are rendered as self-contained HTML files or embedded in notebooks

Key components:
- **Analysis Engine**: Automatic type detection and statistical analysis
- **Association Matrix**: Unified correlation analysis across numerical, categorical, and mixed data types
- **Visualization Generator**: High-density charts and interactive HTML reports
- **Configuration System**: Customizable settings via INI files and `FeatureConfig` objects

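To make the idea of a mixed-type association more concrete, the following is a minimal standard-library sketch of one measure commonly used for categorical-numerical pairs, the correlation ratio. It is an illustration only, not Sweetviz's internal code; the function name `correlation_ratio` and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(categories, values):
    """Return the correlation ratio (eta) between a categorical and a numeric sequence.

    Hypothetical helper for illustration; not part of the sweetviz API.
    0.0 means the category explains none of the numeric variance,
    1.0 means it explains all of it.
    """
    # Group the numeric values by their category label
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)

    # Variance explained by the category means (between-group sum of squares)
    between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    # Total sum of squares around the overall mean
    total = sum((value - overall_mean) ** 2 for value in values)

    # A constant numeric column has no variance to explain
    return sqrt(between / total) if total > 0 else 0.0


# The category fully determines the value
print(correlation_ratio(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0]))  # 1.0
# The category carries no information about the value
print(correlation_ratio(["a", "a", "b", "b"], [1.0, 3.0, 1.0, 3.0]))  # 0.0
```

A measure like this is bounded between 0 and 1, which is what lets categorical-numerical relationships sit in the same matrix as numerical correlations and categorical associations.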
## Capabilities

### Core Analysis Functions

Primary functions for creating exploratory data analysis reports. Each function analyzes one or two dataframes and returns a `DataframeReport` object containing statistics, visualizations, and association matrices.

```python { .api }
def analyze(source, target_feat=None, feat_cfg=None, pairwise_analysis='auto'): ...
def compare(source, compare, target_feat=None, feat_cfg=None, pairwise_analysis='auto'): ...
def compare_intra(source_df, condition_series, names, target_feat=None, feat_cfg=None, pairwise_analysis='auto'): ...
```

[Core Analysis](./core-analysis.md)

### Report Generation and Display

Methods for rendering and outputting analysis reports. `DataframeReport` objects can write a standalone HTML file, embed the report in a notebook, or log it to a Comet experiment.

```python { .api }
class DataframeReport:
    def show_html(self, filepath='SWEETVIZ_REPORT.html', open_browser=True, layout='widescreen', scale=None): ...
    def show_notebook(self, w=None, h=None, scale=None, layout=None, filepath=None, file_layout=None, file_scale=None): ...
    def log_comet(self, experiment): ...
```

[Report Display](./report-display.md)

### Feature Configuration

Configuration system for controlling feature type detection and analysis. A `FeatureConfig` object specifies which features to skip and which to force to a categorical, text, or numeric interpretation; pass it to the analysis functions through the `feat_cfg` parameter.

```python { .api }
class FeatureConfig:
    def __init__(self, skip=None, force_cat=None, force_text=None, force_num=None): ...
    def get_predetermined_type(self, feature_name): ...
    def get_all_mentioned_features(self): ...
```

[Configuration](./configuration.md)

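As a rough illustration of how such overrides could be resolved, here is a standard-library sketch of a stand-in class. Everything in it is an assumption made for this example: the class name `SketchFeatureConfig`, the precedence order (skip, then categorical, then text, then numeric), the fallback to `TYPE_UNKNOWN`, and the acceptance of a single name in place of a list. It mirrors the signatures above but is not the library's implementation.

```python
from enum import Enum


class FeatureType(Enum):
    # Subset of the FeatureType values listed under "Types"
    TYPE_CAT = "CATEGORICAL"
    TYPE_NUM = "NUMERIC"
    TYPE_TEXT = "TEXT"
    TYPE_UNKNOWN = "UNKNOWN"
    TYPE_SKIPPED = "SKIPPED"


def _as_list(param):
    """Normalize None, a single name, a tuple, or a list into a list."""
    if param is None:
        return []
    if isinstance(param, (list, tuple)):
        return list(param)
    return [param]


class SketchFeatureConfig:
    """Hypothetical stand-in for FeatureConfig; behavior is assumed, not confirmed."""

    def __init__(self, skip=None, force_cat=None, force_text=None, force_num=None):
        self.skip = _as_list(skip)
        self.force_cat = _as_list(force_cat)
        self.force_text = _as_list(force_text)
        self.force_num = _as_list(force_num)

    def get_predetermined_type(self, feature_name):
        # Assumed precedence: skip > categorical > text > numeric
        if feature_name in self.skip:
            return FeatureType.TYPE_SKIPPED
        if feature_name in self.force_cat:
            return FeatureType.TYPE_CAT
        if feature_name in self.force_text:
            return FeatureType.TYPE_TEXT
        if feature_name in self.force_num:
            return FeatureType.TYPE_NUM
        # No override: leave the decision to automatic type detection
        return FeatureType.TYPE_UNKNOWN

    def get_all_mentioned_features(self):
        return self.skip + self.force_cat + self.force_text + self.force_num


cfg = SketchFeatureConfig(skip="PassengerId", force_cat=["Pclass"], force_num=("Age",))
print(cfg.get_predetermined_type("Pclass"))
print(cfg.get_all_mentioned_features())
```

With the real library, the equivalent object would be passed as `feat_cfg` to `analyze()`, `compare()`, or `compare_intra()`.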
## Types

```python { .api }
from typing import Union, Tuple, List
import pandas as pd
from enum import Enum

# Core type aliases
DataFrameInput = Union[pd.DataFrame, Tuple[pd.DataFrame, str]]

class FeatureType(Enum):
    TYPE_CAT = "CATEGORICAL"
    TYPE_BOOL = "BOOL"
    TYPE_NUM = "NUMERIC"
    TYPE_TEXT = "TEXT"
    TYPE_UNSUPPORTED = "UNSUPPORTED"
    TYPE_ALL_NAN = "ALL_NAN"
    TYPE_UNKNOWN = "UNKNOWN"
    TYPE_SKIPPED = "SKIPPED"
    def __str__(self): ...

class NumWithPercent:
    def __init__(self, number, total_for_percentage): ...
    def __int__(self): ...
    def __float__(self): ...
    def __repr__(self): ...
```
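
The `NumWithPercent` signature suggests a value that carries both a count and its share of a total. The sketch below shows one way such a type could behave. The class name `SketchNumWithPercent`, the attribute names, the text format produced by `__repr__`, and the handling of a zero total are all assumptions for illustration, not the library's actual behavior.

```python
class SketchNumWithPercent:
    """Hypothetical count-plus-percentage value; not the sweetviz implementation."""

    def __init__(self, number, total_for_percentage):
        self.number = number
        # Assumption of this sketch: report 0% when the total is zero
        if total_for_percentage == 0:
            self.percent = 0.0
        else:
            self.percent = 100.0 * number / total_for_percentage

    def __int__(self):
        return int(self.number)

    def __float__(self):
        return float(self.number)

    def __repr__(self):
        # Assumed display format: "<count> (<percent>%)"
        return f"{self.number} ({self.percent:.1f}%)"


missing = SketchNumWithPercent(25, 200)
print(int(missing))    # 25
print(float(missing))  # 25.0
print(missing)         # 25 (12.5%)
```

Defining `__int__` and `__float__` lets the object be converted with `int()` and `float()` wherever a plain count is needed, while `__repr__` keeps the percentage visible when the value is printed.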