# Configuration

Sweetviz's configuration system controls feature type detection, analysis parameters, and report customization. It gives fine-grained control over which features are analyzed and how they are interpreted.

## Capabilities

### Feature Configuration

`FeatureConfig` controls how individual features are processed during analysis. It can override automatic type detection and exclude features from the analysis.

```python { .api }
class FeatureConfig:
    def __init__(self,
                 skip: Union[str, List[str], Tuple[str]] = None,
                 force_cat: Union[str, List[str], Tuple[str]] = None,
                 force_text: Union[str, List[str], Tuple[str]] = None,
                 force_num: Union[str, List[str], Tuple[str]] = None):
        """
        Configure feature processing behavior.

        Parameters:
        - skip: Features to exclude from analysis
        - force_cat: Features to treat as categorical
        - force_text: Features to treat as text
        - force_num: Features to treat as numerical

        All parameters accept single strings, lists, or tuples of feature names.
        """

    def get_predetermined_type(self, feature_name: str) -> FeatureType:
        """
        Get the predetermined type for a feature.

        Parameters:
        - feature_name: Name of the feature

        Returns:
        FeatureType enum value indicating predetermined type
        """

    def get_all_mentioned_features(self) -> List[str]:
        """
        Get list of all features mentioned in configuration.

        Returns:
        List of all feature names in any configuration category
        """
```

#### Usage Examples

```python
import sweetviz as sv

# Skip specific features
config = sv.FeatureConfig(skip=['id', 'timestamp'])
report = sv.analyze(df, feat_cfg=config)

# Force feature types
config = sv.FeatureConfig(
    skip='user_id',
    force_cat=['status', 'category'],
    force_num=['year', 'rating'],
    force_text='description'
)
report = sv.analyze(df, feat_cfg=config)

# Multiple ways to specify features
config = sv.FeatureConfig(
    skip=['id', 'created_at'],      # List
    force_cat=('status', 'type'),   # Tuple
    force_num='rating'              # Single string
)

# Check configuration
config = sv.FeatureConfig(skip=['id'], force_cat=['status'])
feature_type = config.get_predetermined_type('status')  # Returns FeatureType.TYPE_CAT
all_features = config.get_all_mentioned_features()      # Returns ['id', 'status']
```
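
Because every parameter accepts a single string, a list, or a tuple, it can help to see how such inputs might be reduced to one common form. The sketch below is an illustration only: `normalize_feature_names` is a hypothetical helper written for this page, not part of the Sweetviz API, and Sweetviz's own handling may differ.

```python
from typing import List, Optional, Sequence, Union

FeatureNames = Optional[Union[str, Sequence[str]]]


def normalize_feature_names(value: FeatureNames) -> List[str]:
    """Hypothetical helper: coerce a str, list, tuple, or None into a list of names."""
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is one feature name, not a sequence of characters.
        return [value]
    if isinstance(value, (list, tuple)):
        return list(value)
    raise ValueError(f"Unsupported feature specification: {value!r}")


print(normalize_feature_names("rating"))            # ['rating']
print(normalize_feature_names(("status", "type")))  # ['status', 'type']
print(normalize_feature_names(None))                # []
```

The string check comes first on purpose: a string is itself iterable, so treating it as a generic sequence would split `'rating'` into single characters.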

### Global Configuration

System-wide settings are controlled through INI configuration files. They customize default behavior, appearance, and performance parameters.

```python { .api }
import configparser

config_parser: configparser.ConfigParser
```

#### Usage Examples

```python
import sweetviz as sv

# Load custom configuration overrides.
# This must be called before creating reports.
sv.config_parser.read("my_config.ini")

report = sv.analyze(df)
```

#### Configuration File Structure

Create a custom INI file to override the defaults:

```ini
[General]
default_verbosity = progress_only
use_cjk_font = 1

[Output_Defaults]
html_layout = vertical
html_scale = 0.9
notebook_layout = widescreen
notebook_scale = 0.8
# A literal % must be written as %%
notebook_width = 100%%
notebook_height = 700

[Layout]
show_logo = 0

[comet_ml_defaults]
html_layout = vertical
html_scale = 0.85
```
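
An override file like the one above can also be generated from code, which avoids hand-escaping mistakes. The following is a minimal sketch using only the standard-library `configparser`; the file name `my_config.ini` and the chosen values are examples, and it assumes the file will later be read by a parser with default settings (basic interpolation), which is an assumption about `sv.config_parser` rather than something this page confirms.

```python
import configparser
import tempfile
from pathlib import Path

# Build the overrides in memory. "%%" is the escaped form of a literal "%".
overrides = configparser.ConfigParser()
overrides["General"] = {"default_verbosity": "progress_only"}
overrides["Output_Defaults"] = {"notebook_width": "100%%", "notebook_height": "700"}
overrides["Layout"] = {"show_logo": "0"}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "my_config.ini"  # example file name
    with path.open("w") as handle:
        overrides.write(handle)

    # Read the file back with a fresh default parser to confirm it is valid.
    check = configparser.ConfigParser()
    loaded = check.read(path)

print(len(loaded))                                 # 1 file was read
print(check["Output_Defaults"]["notebook_width"])  # 100%
```

In real use the file would be written to a permanent location and then passed to `sv.config_parser.read(...)` before any report is created.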

## Feature Type Control

### Automatic Type Detection

Sweetviz automatically detects feature types:

- **Numerical**: Integer and float columns
- **Categorical**: String columns and low-cardinality numerics
- **Boolean**: Binary columns (True/False, 1/0, Yes/No)
- **Text**: High-cardinality string columns
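
To make the categories above concrete, here is a toy heuristic that sorts a column into one of the four types by value kind and cardinality. It is not Sweetviz's detection code: `guess_feature_type` and the `MAX_CATEGORIES` cutoff are hypothetical and chosen only for illustration.

```python
from typing import Iterable

# Hypothetical cutoff for illustration only; not a Sweetviz setting.
MAX_CATEGORIES = 10


def guess_feature_type(values: Iterable[object]) -> str:
    """Toy cardinality-based type guess (not Sweetviz's actual logic)."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    if len(distinct) <= 2:
        # Two or fewer distinct values: treat as binary.
        return "boolean"
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if all_numeric:
        return "categorical" if len(distinct) <= MAX_CATEGORIES else "numerical"
    return "categorical" if len(distinct) <= MAX_CATEGORIES else "text"


print(guess_feature_type(["yes", "no", "yes"]))               # boolean
print(guess_feature_type(list(range(100))))                   # numerical
print(guess_feature_type([1, 2, 3, 1, 2, 3]))                 # categorical
print(guess_feature_type([f"comment {i}" for i in range(50)]))  # text
```

Any heuristic of this kind misclassifies some columns (a `year` column looks numerical, a `zip_code` column looks numerical), which is the motivation for the explicit overrides.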

### Type Override Examples

```python
import sweetviz as sv

# Common override scenarios

# Treat year as categorical instead of numerical
config = sv.FeatureConfig(force_cat=['year'])

# Treat encoded categories as numerical
config = sv.FeatureConfig(force_num=['category_encoded'])

# Treat long strings as text features
config = sv.FeatureConfig(force_text=['comments', 'description'])

# Skip features that shouldn't be analyzed
config = sv.FeatureConfig(skip=['id', 'uuid', 'internal_code'])

# Combined configuration
config = sv.FeatureConfig(
    skip=['id', 'created_at', 'updated_at'],
    force_cat=['zip_code', 'product_code'],
    force_num=['rating_1_to_5'],
    force_text=['user_comments']
)
```

## Configuration Categories

### General Settings

```ini
[General]
# Verbosity levels: full, progress_only, off, default
default_verbosity = progress_only

# Enable CJK (Chinese/Japanese/Korean) font support
use_cjk_font = 1
```

### Output Defaults

```ini
[Output_Defaults]
# HTML report defaults
# html_layout accepts: widescreen or vertical
html_layout = widescreen
html_scale = 1.0

# Notebook display defaults
notebook_layout = vertical
notebook_scale = 0.9
# Write %% for a literal %
notebook_width = 100%%
notebook_height = 700
```
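
Two details of Python's `configparser` matter when editing these values: with default settings, a trailing `# comment` on a value line becomes part of the value, and a single `%` is rejected when the value is read. The sketch below demonstrates both with a standalone parser; it assumes `sv.config_parser` uses the default `ConfigParser` settings, and `bad_width` is a made-up key used only to trigger the error.

```python
import configparser

# Default settings: no inline comments, basic interpolation.
parser = configparser.ConfigParser()
parser.read_string("""
[Output_Defaults]
html_layout = widescreen # widescreen or vertical
notebook_width = 100%%
bad_width = 100%
""")

section = parser["Output_Defaults"]

# The trailing comment is kept as part of the value.
print(repr(section["html_layout"]))  # 'widescreen # widescreen or vertical'

# The escaped form reads back as a single percent sign.
print(section["notebook_width"])     # 100%

# An unescaped % fails when the value is retrieved, not when the file is parsed.
try:
    section["bad_width"]
except configparser.InterpolationSyntaxError as error:
    print("unescaped % rejected:", type(error).__name__)
```

Keeping comments on their own lines and writing `%%` avoids both problems.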

### Layout Customization

```ini
[Layout]
# Remove Sweetviz logo
show_logo = 0

# Custom styling options (advanced)
# See sweetviz_defaults.ini for full options
```

### Comet.ml Integration

```ini
[comet_ml_defaults]
# Defaults for Comet.ml logging
html_layout = vertical
html_scale = 0.85
```

## Special Handling

### Index Column Renaming

Features named "index" are automatically renamed to "df_index" to avoid conflicts:

```python
import pandas as pd
import sweetviz as sv

# If the DataFrame has a column named 'index'
df = pd.DataFrame({'index': [1, 2, 3], 'value': [10, 20, 30]})

# Sweetviz automatically renames it to 'df_index'
config = sv.FeatureConfig(skip=['df_index'])  # Use 'df_index', not 'index'
report = sv.analyze(df, feat_cfg=config)
```
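
When feature lists are built from the original column names, the rename has to be applied to those lists as well. A small sketch of one way to do that follows; `as_sweetviz_names` and `RESERVED_RENAMES` are hypothetical names introduced here, and the only rename encoded is the `index` to `df_index` rule stated above.

```python
from typing import Dict, List

# The single rename described in this section.
RESERVED_RENAMES: Dict[str, str] = {"index": "df_index"}


def as_sweetviz_names(names: List[str]) -> List[str]:
    """Hypothetical helper: translate original column names to their renamed form."""
    return [RESERVED_RENAMES.get(name, name) for name in names]


print(as_sweetviz_names(["index", "value"]))  # ['df_index', 'value']
```

The result can then be passed on, for example as `sv.FeatureConfig(skip=as_sweetviz_names(columns_to_skip))`.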

### Target Feature Constraints

```python
# Target features must be boolean or numerical
config = sv.FeatureConfig(force_num=['encoded_target'])
report = sv.analyze(df, target_feat='encoded_target', feat_cfg=config)

# This will raise ValueError: categorical targets are not supported
try:
    report = sv.analyze(df, target_feat='category_column')
except ValueError as e:
    print(f"Unsupported target: {e}")
    print("Use force_num to convert categorical to numerical if appropriate")
```
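
A column such as `encoded_target` has to come from somewhere. One simple option, sketched below, is to map each distinct label to an integer code before analysis. `encode_labels` is a hypothetical helper, not a Sweetviz function, and the alphabetical ordering it uses is an arbitrary choice.

```python
from typing import Dict, Hashable, List, Tuple


def encode_labels(labels: List[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Hypothetical helper: map each distinct label to an integer code."""
    ordered = sorted(set(labels), key=str)  # alphabetical, for reproducible codes
    mapping = {label: code for code, label in enumerate(ordered)}
    return [mapping[label] for label in labels], mapping


codes, mapping = encode_labels(["low", "high", "medium", "low"])
print(codes)    # [1, 0, 2, 1]
print(mapping)  # {'high': 0, 'low': 1, 'medium': 2}
```

Integer codes imply an ordering. If the categories have a natural order (low, medium, high), an explicit mapping that follows that order is usually a better choice than alphabetical codes; if they have no order, a numerical target may not be meaningful at all.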

## Performance Configuration

### Pairwise Analysis Threshold

The `pairwise_analysis` parameter controls whether correlation analysis runs and when Sweetviz prompts for confirmation:

```python
# Large datasets - control pairwise analysis
report = sv.analyze(large_df, pairwise_analysis='off')   # Skip correlations
report = sv.analyze(large_df, pairwise_analysis='on')    # Force correlations
report = sv.analyze(large_df, pairwise_analysis='auto')  # Auto-decide (default)
```
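
The reason this setting matters is simple arithmetic: comparing every feature with every other feature grows quadratically with the number of columns. The snippet below counts unordered feature pairs, under the assumption that each pair is compared once; the actual amount of work Sweetviz does per pair is not specified here.

```python
def pair_count(n_features: int) -> int:
    """Number of unordered feature pairs among n_features columns: n(n-1)/2."""
    return n_features * (n_features - 1) // 2


for n in (10, 50, 200):
    print(n, pair_count(n))
# 10 45
# 50 1225
# 200 19900
```

Going from 50 to 200 columns multiplies the column count by 4 but the pair count by roughly 16, which is why `pairwise_analysis='off'` is worth considering for wide datasets.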

### Memory Optimization

```python
# For large datasets, skip expensive computations.
# list_of_high_cardinality_features is a placeholder for your own list of column names.
config = sv.FeatureConfig(skip=list_of_high_cardinality_features)
report = sv.analyze(df,
                    feat_cfg=config,
                    pairwise_analysis='off')

# Use smaller scale for large reports
report.show_html(scale=0.7)
```

## Error Handling

```python
# Handle configuration errors
try:
    config = sv.FeatureConfig(skip=['nonexistent_column'])
    report = sv.analyze(df, feat_cfg=config)
except Exception as e:
    print(f"Configuration warning: {e}")

# Handle missing INI files: ConfigParser.read() does not raise for a missing
# file; it returns the list of files it was able to read
loaded_files = sv.config_parser.read("nonexistent.ini")
if not loaded_files:
    print("Configuration file not found, using defaults")

# Validate feature names exist
available_features = set(df.columns)
skip_features = ['id', 'timestamp']
valid_skip = [f for f in skip_features if f in available_features]
config = sv.FeatureConfig(skip=valid_skip)
```
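
The name check at the end of the block above covers only `skip`. The same idea extends to all four categories at once, as in this sketch. `split_known_features` is a hypothetical helper, and a plain list of column names stands in for `df.columns` so the example runs without pandas.

```python
from typing import Dict, Iterable, List, Tuple


def split_known_features(
    requested: Dict[str, List[str]], columns: Iterable[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical helper: keep names present in `columns`, report the rest."""
    available = set(columns)
    kept: Dict[str, List[str]] = {}
    missing: List[str] = []
    for category, names in requested.items():
        kept[category] = [name for name in names if name in available]
        missing.extend(name for name in names if name not in available)
    return kept, missing


requested = {
    "skip": ["id", "timestamp"],
    "force_cat": ["status"],
    "force_num": ["rating"],
}
kept, missing = split_known_features(requested, ["id", "status", "price"])
print(kept)     # {'skip': ['id'], 'force_cat': ['status'], 'force_num': []}
print(missing)  # ['timestamp', 'rating']
```

Since the keys match the `FeatureConfig` parameter names, the filtered dictionary can be unpacked directly with `sv.FeatureConfig(**kept)`, and `missing` can be logged so that typos in feature names do not go unnoticed.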

## Configuration Best Practices

### Common Patterns

```python
# Standard data science workflow
config = sv.FeatureConfig(
    skip=['id', 'uuid', 'created_at', 'updated_at'],  # Skip metadata
    force_cat=['zip_code', 'product_id'],             # IDs as categories
    force_num=['rating', 'score'],                    # Ordinal as numeric
    force_text=['comments', 'description']            # Long text fields
)

# Time series data
config = sv.FeatureConfig(
    skip=['timestamp', 'date'],      # Skip time columns
    force_cat=['day_of_week'],       # Cyclical as categorical
    force_num=['month', 'quarter']   # Temporal as numeric
)

# Survey data
config = sv.FeatureConfig(
    force_cat=['satisfaction_level', 'education'],  # Ordinal categories
    force_num=['age_group', 'income_bracket'],      # Ranked as numeric
    force_text=['feedback_text']                    # Open responses
)
```
317
318
### Configuration Management
319
320
```python
321
# Save configuration for reuse
322
def create_standard_config():
323
return sv.FeatureConfig(
324
skip=['id', 'timestamp'],
325
force_cat=['category', 'status'],
326
force_num=['rating']
327
)
328
329
# Use across multiple analyses
330
config = create_standard_config()
331
train_report = sv.analyze(train_df, feat_cfg=config)
332
test_report = sv.analyze(test_df, feat_cfg=config)
333
```
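
If the settings need to live outside the code, for example to share them between projects, one option is to store the `FeatureConfig` keyword arguments as JSON. This is a sketch under that assumption: `save_feature_settings` and `load_feature_settings` are hypothetical helpers, and the file name is an example. Sweetviz itself is not imported, so the block runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# The four FeatureConfig parameter names documented above.
ALLOWED_KEYS = {"skip", "force_cat", "force_text", "force_num"}


def save_feature_settings(settings: Dict[str, List[str]], path: Path) -> None:
    """Hypothetical helper: write FeatureConfig keyword arguments to a JSON file."""
    unknown = set(settings) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown FeatureConfig parameters: {sorted(unknown)}")
    path.write_text(json.dumps(settings, indent=2))


def load_feature_settings(path: Path) -> Dict[str, List[str]]:
    """Hypothetical helper: read the keyword arguments back from JSON."""
    settings = json.loads(path.read_text())
    unknown = set(settings) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown FeatureConfig parameters: {sorted(unknown)}")
    return settings


standard = {"skip": ["id", "timestamp"], "force_cat": ["category", "status"], "force_num": ["rating"]}

with tempfile.TemporaryDirectory() as folder:
    settings_path = Path(folder) / "feature_settings.json"  # example file name
    save_feature_settings(standard, settings_path)
    restored = load_feature_settings(settings_path)

print(restored == standard)  # True
```

The loaded dictionary can then be turned into a configuration with `sv.FeatureConfig(**restored)`.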