Tessl Tile for pypi/sweetviz@2.3.0

# Core Analysis Functions

Primary functions for creating exploratory data analysis reports. These functions analyze pandas DataFrames and return DataframeReport objects containing comprehensive statistics, visualizations, and association matrices.

## Capabilities

### Single DataFrame Analysis

Analyzes a single DataFrame, generating comprehensive statistics, visualizations, and feature relationships. Optionally focuses analysis around a target feature to highlight correlations and associations.

```python { .api }
def analyze(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Analyze a single DataFrame and generate a report.

    Parameters:
    - source: DataFrame to analyze, or tuple of [DataFrame, "Display Name"]
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing analysis results
    """
```

#### Usage Examples

```python
import sweetviz as sv
import pandas as pd

# Basic analysis
df = pd.read_csv('data.csv')
report = sv.analyze(df)

# Analysis with named dataset
report = sv.analyze([df, "My Dataset"])

# Target-focused analysis
report = sv.analyze(df, target_feat='outcome')

# With feature configuration
config = sv.FeatureConfig(skip=['id'], force_cat=['category'])
report = sv.analyze(df, target_feat='price', feat_cfg=config)

# Control pairwise analysis for large datasets
report = sv.analyze(df, pairwise_analysis='off')  # Skip correlation matrix
```

### Dataset Comparison

Compares two datasets side-by-side, highlighting differences in distributions, statistics, and feature relationships. Ideal for comparing training/test splits or different data versions.

```python { .api }
def compare(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            compare: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Compare two DataFrames and generate a comparison report.

    Parameters:
    - source: Primary DataFrame or [DataFrame, "Display Name"]
    - compare: Comparison DataFrame or [DataFrame, "Display Name"]
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing comparison results
    """
```

#### Usage Examples

```python
import pandas as pd
import sweetviz as sv

# Compare training and test sets
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

report = sv.compare([train_df, "Training"], [test_df, "Test"])

# Compare with target analysis
report = sv.compare([train_df, "Training"], [test_df, "Test"], target_feat='label')

# Compare two versions of the same dataset
old_data = pd.read_csv('old.csv')
new_data = pd.read_csv('new.csv')
report = sv.compare([old_data, "Previous Version"], [new_data, "Current Version"])
```

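When only one DataFrame is available, the training/test comparison described above first needs a split. The following is a minimal sketch using plain pandas; the toy data, the 80/20 ratio, and the fixed `random_state` are illustrative assumptions, not sweetviz requirements.

```python
import pandas as pd

# Toy data standing in for a real dataset (illustrative only)
df = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})

# Draw 80% of the rows at random for training; the remaining rows form the test set
train_df = df.sample(frac=0.8, random_state=0)
test_df = df.drop(train_df.index)

print(len(train_df), len(test_df))
```

The two frames can then be passed to the comparison call shown earlier, for example `sv.compare([train_df, "Training"], [test_df, "Test"])`.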
### Intra-Dataset Comparison

Compares subsets within the same DataFrame based on a boolean condition. Useful for analyzing differences between groups (e.g., male vs female, treatment vs control).

```python { .api }
def compare_intra(source_df: pd.DataFrame,
                  condition_series: pd.Series,
                  names: Tuple[str, str],
                  target_feat: str = None,
                  feat_cfg: FeatureConfig = None,
                  pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Compare subsets within the same DataFrame based on a boolean condition.

    Parameters:
    - source_df: DataFrame to analyze
    - condition_series: Boolean Series for splitting data (same length as source_df)
    - names: Tuple of names for (True subset, False subset)
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing intra-dataset comparison

    Raises:
    ValueError: If condition_series length doesn't match source_df or isn't boolean type
    ValueError: If either subset is empty after splitting
    """
```

#### Usage Examples

```python
import pandas as pd
import sweetviz as sv

# Compare male vs female (the second name labels every row where the condition is False)
df = pd.read_csv('data.csv')
report = sv.compare_intra(df, df["gender"] == "male", ["Male", "Female"])

# Compare with target feature
report = sv.compare_intra(df,
                          df["age"] > 30,
                          ["Over 30", "30 and Under"],
                          target_feat="income")

# Compare treatment groups (assumes "A" and "B" are the only treatments)
report = sv.compare_intra(df,
                          df["treatment"] == "A",
                          ["Treatment A", "Treatment B"],
                          target_feat="outcome")

# Condition derived from a computed threshold
high_income = (df["income"] > df["income"].median())
report = sv.compare_intra(df, high_income, ["High Income", "Low Income"])
```

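The `ValueError` cases listed for `compare_intra` can be checked before the call. The sketch below is one possible pre-flight check, assuming a hypothetical helper named `check_condition` (it is not part of sweetviz) and a small made-up DataFrame.

```python
import pandas as pd


def check_condition(source_df: pd.DataFrame, condition: pd.Series) -> None:
    """Hypothetical helper mirroring the documented ValueError cases."""
    if len(condition) != len(source_df):
        raise ValueError("condition_series must have the same length as source_df")
    if condition.dtype != bool:
        raise ValueError("condition_series must have boolean dtype")
    n_true = int(condition.sum())
    if n_true == 0 or n_true == len(condition):
        raise ValueError("one of the two subsets would be empty")


# Made-up data for illustration
df = pd.DataFrame({"age": [25, 41, 33], "income": [30_000, 52_000, 47_000]})
condition = df["age"] > 30

check_condition(df, condition)  # passes: both subsets contain at least one row
```

Running the check first gives a clearer error message at the call site and avoids building a condition that splits nothing.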
## Parameter Details

### target_feat Parameter

- **Supported Types**: Only boolean and numerical features can be targets
- **Effect**: Highlights correlations and associations with the target feature
- **Categorical Targets**: Not supported; use FeatureConfig to force the feature to numerical if appropriate

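Candidate targets can be pre-screened by dtype before choosing `target_feat`. This is a rough sketch only: `is_candidate_target` is a hypothetical helper, and sweetviz performs its own type detection, which may classify a column differently from its pandas dtype.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def is_candidate_target(df: pd.DataFrame, column: str) -> bool:
    """Hypothetical helper: True if the column exists and its dtype is boolean or numeric."""
    if column not in df.columns:
        return False
    return is_bool_dtype(df[column]) or is_numeric_dtype(df[column])


# Made-up data for illustration
df = pd.DataFrame({
    "price": [9.5, 12.0, 7.25],
    "sold": [True, False, True],
    "category": ["a", "b", "a"],
})

candidates = [col for col in df.columns if is_candidate_target(df, col)]
print(candidates)
```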
### pairwise_analysis Parameter

- **'auto'** (default): Decides automatically based on the number of features, using the `association_auto_threshold` setting
- **'on'**: Forces pairwise analysis regardless of dataset size
- **'off'**: Skips pairwise correlation/association analysis
- **Performance**: Pairwise analysis is O(n²) in the number of features n

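To make the quadratic cost concrete, the sketch below counts feature pairs and picks an explicit mode from the column count. Both helpers are hypothetical, and the cutoff of 40 features is an arbitrary illustrative value, not sweetviz's `association_auto_threshold`.

```python
def pair_count(n_features: int) -> int:
    """Number of distinct feature pairs a pairwise analysis has to consider."""
    return n_features * (n_features - 1) // 2


def pick_pairwise_mode(n_features: int, max_features: int = 40) -> str:
    """Hypothetical helper: 'on' for narrow datasets, 'off' for wide ones.

    max_features is an assumed cutoff chosen for illustration only.
    """
    return "on" if n_features <= max_features else "off"


print(pair_count(10), pick_pairwise_mode(10))
print(pair_count(100), pick_pairwise_mode(100))
```

The chosen mode can be passed through directly, for example `sv.analyze(df, pairwise_analysis=pick_pairwise_mode(df.shape[1]))`.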
### feat_cfg Parameter

See [Configuration](./configuration.md) for detailed FeatureConfig usage.

## Error Handling

All analysis functions may raise:

- **ValueError**: Invalid parameters, unsupported target types, empty datasets
- **TypeError**: Invalid data types for parameters
- **KeyError**: Target feature not found in DataFrame
- **MemoryError**: Dataset too large for available memory

Common errors and solutions:

```python
import pandas as pd
import sweetviz as sv

df = pd.read_csv('data.csv')

# Handle missing target feature
try:
    report = sv.analyze(df, target_feat='nonexistent')
except KeyError:
    print("Target feature not found in DataFrame")

# Handle categorical target
try:
    report = sv.analyze(df, target_feat='category')
except ValueError as e:
    if "CATEGORICAL" in str(e):
        # Force to numerical if appropriate
        config = sv.FeatureConfig(force_num=['category'])
        report = sv.analyze(df, target_feat='category', feat_cfg=config)
    else:
        raise  # Unrelated ValueError: do not swallow it
```

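The exception types listed above can also be handled in one place. The following is a sketch of one possible wrapper, assuming a hypothetical `try_report` helper that takes a zero-argument callable; the failing builder is a stand-in so the example runs without any data file.

```python
from typing import Any, Callable, Optional, Tuple


def try_report(build: Callable[[], Any]) -> Tuple[Optional[Any], Optional[str]]:
    """Hypothetical wrapper: run a report builder and return (report, error message)."""
    try:
        return build(), None
    except KeyError as exc:
        return None, f"target feature not found: {exc}"
    except (ValueError, TypeError) as exc:
        return None, f"invalid input: {exc}"


def failing_build() -> Any:
    # Stand-in for a call such as sv.analyze(df, target_feat='nonexistent')
    raise KeyError("nonexistent")


report, error = try_report(failing_build)
print(report, error)
```

`MemoryError` is deliberately left uncaught here, since recovering from it usually means reducing the dataset rather than retrying the same call.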