# Core Analysis Functions

Primary functions for creating exploratory data analysis reports. These functions analyze pandas DataFrames and return DataframeReport objects containing comprehensive statistics, visualizations, and association matrices.

## Capabilities

### Single DataFrame Analysis

Analyzes a single DataFrame, generating comprehensive statistics, visualizations, and feature relationships. Optionally focuses analysis around a target feature to highlight correlations and associations.

```python { .api }
def analyze(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Analyze a single DataFrame and generate a report.

    Parameters:
    - source: DataFrame to analyze, or tuple of [DataFrame, "Display Name"]
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing analysis results
    """
```

#### Usage Examples

```python
import sweetviz as sv
import pandas as pd

# Basic analysis
df = pd.read_csv('data.csv')
report = sv.analyze(df)

# Analysis with named dataset
report = sv.analyze([df, "My Dataset"])

# Target-focused analysis
report = sv.analyze(df, target_feat='outcome')

# With feature configuration
config = sv.FeatureConfig(skip=['id'], force_cat=['category'])
report = sv.analyze(df, target_feat='price', feat_cfg=config)

# Control pairwise analysis for large datasets
report = sv.analyze(df, pairwise_analysis='off')  # Skip correlation matrix
```

### Dataset Comparison

Compares two datasets side-by-side, highlighting differences in distributions, statistics, and feature relationships. Ideal for comparing training/test splits or different data versions.

```python { .api }
def compare(source: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            compare: Union[pd.DataFrame, Tuple[pd.DataFrame, str]],
            target_feat: str = None,
            feat_cfg: FeatureConfig = None,
            pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Compare two DataFrames and generate a comparison report.

    Parameters:
    - source: Primary DataFrame or [DataFrame, "Display Name"]
    - compare: Comparison DataFrame or [DataFrame, "Display Name"]
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing comparison results
    """
```

#### Usage Examples

```python
import sweetviz as sv
import pandas as pd

# Compare training and test sets
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

report = sv.compare([train_df, "Training"], [test_df, "Test"])

# Compare with target analysis
report = sv.compare([train_df, "Training"], [test_df, "Test"], target_feat='label')

# Compare two versions of the same dataset, each with its own display name
old_data = pd.read_csv('old.csv')
new_data = pd.read_csv('new.csv')
report = sv.compare([old_data, "Previous Version"], [new_data, "Current Version"])
```

### Intra-Dataset Comparison

Compares subsets within the same DataFrame based on a boolean condition. Useful for analyzing differences between groups (e.g., male vs female, treatment vs control).

```python { .api }
def compare_intra(source_df: pd.DataFrame,
                  condition_series: pd.Series,
                  names: Tuple[str, str],
                  target_feat: str = None,
                  feat_cfg: FeatureConfig = None,
                  pairwise_analysis: str = 'auto') -> DataframeReport:
    """
    Compare subsets within the same DataFrame based on a boolean condition.

    Parameters:
    - source_df: DataFrame to analyze
    - condition_series: Boolean Series for splitting data (same length as source_df)
    - names: Tuple of names for (True subset, False subset)
    - target_feat: Name of target feature for focused analysis (boolean/numerical only)
    - feat_cfg: FeatureConfig object for controlling feature processing
    - pairwise_analysis: Controls correlation analysis ('auto', 'on', 'off')

    Returns:
    DataframeReport object containing intra-dataset comparison

    Raises:
    ValueError: If condition_series length doesn't match source_df or isn't boolean type
    ValueError: If either subset is empty after splitting
    """
```

#### Usage Examples

```python
import sweetviz as sv
import pandas as pd

# Compare male vs female (rows where the condition is False form the second subset)
df = pd.read_csv('data.csv')
report = sv.compare_intra(df, df["gender"] == "male", ["Male", "Female"])

# Compare with target feature
report = sv.compare_intra(df,
                          df["age"] > 30,
                          ["Over 30", "30 and Under"],
                          target_feat="income")

# Compare treatment groups
report = sv.compare_intra(df,
                          df["treatment"] == "A",
                          ["Treatment A", "Treatment B"],
                          target_feat="outcome")

# Condition computed from the data itself
high_income = (df["income"] > df["income"].median())
report = sv.compare_intra(df, high_income, ["High Income", "Low Income"])
```
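The `Raises` conditions listed for `compare_intra` can be checked before building a report. The following is a minimal sketch using pandas only. `check_condition` is a hypothetical helper, not part of sweetviz, and its error messages are illustrative rather than the library's own.

```python
import pandas as pd


def check_condition(df: pd.DataFrame, condition: pd.Series) -> None:
    """Raise ValueError if the mask cannot split df into two non-empty subsets.

    Hypothetical pre-flight check mirroring the documented Raises section.
    """
    if len(condition) != len(df):
        raise ValueError("condition must have one entry per row of df")
    if not pd.api.types.is_bool_dtype(condition):
        raise ValueError("condition must be a boolean Series")
    n_true = int(condition.sum())
    if n_true == 0 or n_true == len(df):
        raise ValueError("both subsets must contain at least one row")


# Small in-memory frame used only for illustration
df = pd.DataFrame({"age": [25, 41, 33, 19], "income": [30, 80, 55, 20]})
mask = df["age"] > 30
check_condition(df, mask)  # passes: two rows fall on each side of the split
```

A mask that passes this check could then be handed to `sv.compare_intra(df, mask, ["Over 30", "30 and Under"])` as in the examples above.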

## Parameter Details

### target_feat Parameter

- **Supported Types**: Only boolean and numerical features can be targets
- **Effect**: Highlights correlations and associations with the target feature
- **Categorical Targets**: Not supported; use FeatureConfig (`force_num`) to treat the feature as numerical if that is appropriate for the data
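When a categorical column has a natural yes/no reading, one alternative to forcing it numerical is to derive a boolean column and use that as the target. This is a sketch of the idea with pandas only; the column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: "churned" is categorical, so it cannot be a target as-is
df = pd.DataFrame({
    "plan": ["free", "pro", "free", "pro"],
    "churned": ["yes", "no", "no", "yes"],
})

# Derive a boolean column; boolean features are a supported target type
df["churned_flag"] = df["churned"] == "yes"

print(df["churned_flag"].dtype)  # bool
```

The derived column could then be passed as `target_feat="churned_flag"`. The original `churned` column would probably need to be excluded, for example with `sv.FeatureConfig(skip=['churned'])`, since it duplicates the target.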

### pairwise_analysis Parameter

- **'auto'** (default): Automatically decides based on dataset size (uses association_auto_threshold)
- **'on'**: Forces pairwise analysis regardless of dataset size
- **'off'**: Skips pairwise correlation/association analysis
- **Performance**: Correlation analysis is O(n²) in the number of features, because every pair of features is examined
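To make the quadratic growth concrete, this sketch counts the unordered feature pairs for a few column counts and picks a mode from a caller-supplied cutoff. Both function names and the `max_features` cutoff are hypothetical. The cutoff is not read from sweetviz's `association_auto_threshold` setting.

```python
def feature_pair_count(n_features: int) -> int:
    """Number of unordered feature pairs among n_features columns."""
    return n_features * (n_features - 1) // 2


def choose_pairwise_mode(n_features: int, max_features: int) -> str:
    """Return 'on' at or below the caller's cutoff, otherwise 'off'.

    max_features is a hypothetical cutoff chosen by the caller.
    """
    return "on" if n_features <= max_features else "off"


for n in (10, 50, 200):
    print(n, feature_pair_count(n))
# 10 45
# 50 1225
# 200 19900
```

The returned string could be passed straight through, for example `sv.analyze(df, pairwise_analysis=choose_pairwise_mode(len(df.columns), 40))`, where 40 is an arbitrary illustrative cutoff.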

### feat_cfg Parameter

See [Configuration](./configuration.md) for detailed FeatureConfig usage.

## Error Handling

All analysis functions may raise:

- **ValueError**: Invalid parameters, unsupported target types, empty datasets
- **TypeError**: Invalid data types for parameters
- **KeyError**: Target feature not found in DataFrame
- **MemoryError**: Dataset too large for available memory

Common errors and solutions:

```python
import sweetviz as sv
import pandas as pd

df = pd.read_csv('data.csv')

# Handle missing target feature
try:
    report = sv.analyze(df, target_feat='nonexistent')
except KeyError:
    print("Target feature not found in DataFrame")

# Handle categorical target
try:
    report = sv.analyze(df, target_feat='category')
except ValueError as e:
    if "CATEGORICAL" in str(e):
        # Force to numerical if appropriate
        config = sv.FeatureConfig(force_num=['category'])
        report = sv.analyze(df, target_feat='category', feat_cfg=config)
    else:
        # Unrelated ValueError: re-raise instead of silently swallowing it
        raise
```
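The patterns above can also be folded into a small wrapper that retries without a target when the target is rejected. This is only a sketch: `analyze_with_fallback` is a hypothetical helper, and it takes the analysis function as an argument so that it makes no assumptions about sweetviz beyond the exception types listed in this section.

```python
from typing import Any, Callable, Optional


def analyze_with_fallback(analyze_fn: Callable[..., Any], df: Any,
                          target_feat: Optional[str] = None) -> Any:
    """Call analyze_fn with a target; retry without it if the target is rejected."""
    try:
        return analyze_fn(df, target_feat=target_feat)
    except (KeyError, ValueError) as exc:
        if target_feat is None:
            raise  # no target was requested, so there is nothing to fall back to
        print(f"Target {target_feat!r} rejected ({exc}); running without a target")
        return analyze_fn(df)
```

Usage would look like `report = analyze_with_fallback(sv.analyze, df, target_feat='outcome')`. If the retry fails as well, for instance because the dataset is empty, the second exception propagates to the caller.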