# Data Analysis

Tools for understanding dataset characteristics and feature distributions to inform model selection and feature engineering decisions.

## Capabilities

### Class Distribution Analysis

Analyzes class distributions in classification datasets to identify imbalances and understand target variable characteristics.

```python { .api }
class ClassHistogram:
    def __init__(self, feature_names=None, **kwargs):
        """
        Class distribution analyzer.

        Parameters:
            feature_names (list, optional): Names for features
            **kwargs: Additional arguments
        """

    def explain_data(self, X, y, name=None):
        """
        Analyze class distributions in the dataset.

        Parameters:
            X (array-like): Feature data
            y (array-like): Target labels
            name (str, optional): Name for explanation

        Returns:
            Explanation object with class distribution analysis
        """
```
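To make "identify imbalances" concrete, the following is a minimal standard-library sketch of the kind of summary a class distribution analysis is based on: per-class counts, per-class fractions, and a largest-to-smallest ratio. It is not how `ClassHistogram` is implemented; the helper names `class_distribution` and `imbalance_ratio` and the toy labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def class_distribution(y):
    """Return {label: (count, fraction)} for the labels in y (hypothetical helper)."""
    counts = Counter(y)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def imbalance_ratio(y):
    """Largest class count divided by the smallest; 1.0 means perfectly balanced."""
    counts = Counter(y).values()
    return max(counts) / min(counts)


# Hypothetical toy labels, used only for illustration
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]

dist = class_distribution(labels)
ratio = imbalance_ratio(labels)

for label, (n, frac) in dist.items():
    print(f"class {label}: {n} samples ({frac:.0%})")
print(f"imbalance ratio: {ratio:.1f}")
```

A ratio well above 1 suggests that stratified splitting, class weighting, or resampling may be worth considering before model selection.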

### Marginal Distribution Analysis

Analyzes marginal distributions of features to understand data characteristics and identify potential issues.

```python { .api }
class Marginal:
    def __init__(self, feature_names=None, feature_types=None, **kwargs):
        """
        Marginal distribution analyzer.

        Parameters:
            feature_names (list, optional): Names for features
            feature_types (list, optional): Types for features
            **kwargs: Additional arguments
        """

    def explain_data(self, X, y=None, name=None):
        """
        Analyze marginal feature distributions.

        Parameters:
            X (array-like): Feature data
            y (array-like, optional): Target labels
            name (str, optional): Name for explanation

        Returns:
            Explanation object with marginal distribution analysis
        """
```
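As a rough illustration of what a marginal analysis of one feature involves, the sketch below uses NumPy to compute a histogram, basic moments, and, when a numeric target is supplied, the Pearson correlation between feature and target. This is an assumption about the kind of summary that is useful here, not a description of what `Marginal` computes internally; `marginal_summary` and the toy arrays are hypothetical.

```python
import numpy as np


def marginal_summary(x, y=None, bins=10):
    """Summarize one feature column (hypothetical helper, not part of interpret)."""
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=bins)  # equal-width bins over [min, max]
    summary = {
        "counts": counts,
        "edges": edges,
        "mean": float(x.mean()),
        "std": float(x.std()),
    }
    if y is not None:
        # Pearson correlation between the feature and a numeric target
        y = np.asarray(y, dtype=float)
        summary["pearson_r"] = float(np.corrcoef(x, y)[0, 1])
    return summary


# Hypothetical toy data: the target is an exact linear function of the feature
x = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 5.0])
y = 2 * x + 1

result = marginal_summary(x, y, bins=4)
print(result["counts"], result["edges"])
print(f"pearson r = {result['pearson_r']:.3f}")
```

Skewed histograms, isolated bins far from the bulk of the data, or near-zero spread are the sort of "potential issues" such a summary can surface before feature engineering.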

## Usage Examples

```python
from interpret.data import ClassHistogram, Marginal
from interpret import show
from sklearn.datasets import load_wine

# Load dataset
data = load_wine()
X, y = data.data, data.target

# Analyze class distribution
class_hist = ClassHistogram(feature_names=data.feature_names)
class_exp = class_hist.explain_data(X, y, name="Class Distribution")
show(class_exp)

# Analyze feature distributions
marginal = Marginal(feature_names=data.feature_names)
marginal_exp = marginal.explain_data(X, y, name="Feature Distributions")
show(marginal_exp)
```