# SHAP

SHAP (SHapley Additive exPlanations) is a comprehensive machine learning explainability library that provides a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using classic Shapley values from game theory, offering a unified framework that encompasses multiple explanation methods, including LIME and DeepLIFT.

## Package Information

- **Package Name**: shap
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install shap`
- **Version**: 0.48.0

## Core Imports

```python
import shap
```

Common imports for specific functionality:

```python
# Core explanation classes
from shap import Explanation, Cohorts

# Explainers
from shap import TreeExplainer, KernelExplainer, DeepExplainer
from shap import LinearExplainer, GradientExplainer
from shap.explainers import other  # Alternative explainers (LIME, MAPLE, etc.)

# Plotting functions
from shap import plots
# or individual legacy aliases
from shap import force_plot, waterfall_plot, summary_plot

# Datasets and utilities
from shap import datasets, utils
```

## Basic Usage

```python
import shap
from sklearn.ensemble import RandomForestClassifier

# Load a dataset
X, y = shap.datasets.adult()

# Train a model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Create explainer and compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)

# For a binary classifier the values have shape (samples, features, outputs),
# so select one output (here class 1) before plotting
shap_values = shap_values[:, :, 1]

# Visualize explanations
shap.plots.waterfall(shap_values[0])  # Single prediction
shap.plots.beeswarm(shap_values)      # All predictions
shap.plots.bar(shap_values)           # Feature importance
```

## Architecture

SHAP provides a unified explainability framework built around several key components:

- **Explainers**: Algorithm-specific explanation methods optimized for different model types (trees, neural networks, linear models, etc.)
- **Explanation Objects**: Rich containers for SHAP values with metadata, supporting aggregation and analysis operations
- **Maskers**: Data masking strategies for different input types (tabular, text, images) that handle feature dependencies
- **Visualization**: Comprehensive plotting functions for various explanation visualization needs
- **Utilities**: Helper functions for sampling, clustering, and data manipulation

This design enables high-performance explanations across diverse model architectures while maintaining mathematical guarantees and providing intuitive visualizations for understanding model behavior.

## Capabilities

### Model Explainers

High-performance explanation algorithms optimized for specific model types, providing exact or approximate SHAP values with mathematical guarantees for local accuracy and consistency.

```python { .api }
class TreeExplainer:
    def __init__(self, model, data=None, model_output="raw", feature_perturbation="auto", feature_names=None): ...
    def __call__(self, X, y=None, interactions=False, check_additivity=True) -> Explanation: ...

class KernelExplainer:
    def __init__(self, model, data, feature_names=None, link="identity"): ...
    def __call__(self, X, l1_reg="num_features(10)", silent=False) -> Explanation: ...

class DeepExplainer:
    def __init__(self, model, data, session=None, learning_phase_flags=None): ...
    def __call__(self, X) -> Explanation: ...
```

[Model Explainers](./explainers.md)

### Visualization and Plotting

Comprehensive visualization functions for understanding and communicating model explanations, including interactive plots, summary visualizations, and detailed analysis charts.

```python { .api }
def waterfall(shap_values, max_display=10, show=True): ...
def beeswarm(shap_values, max_display=10, order=Explanation.abs.mean(0), show=True): ...
def bar(shap_values, max_display=10, order=Explanation.abs, show=True): ...
def force(base_value, shap_values=None, features=None, matplotlib=False, show=True): ...
def heatmap(shap_values, instance_order=Explanation.hclust(), max_display=10, show=True): ...
```

[Visualization and Plotting](./visualization.md)

### Data Utilities and Helpers

Built-in datasets, masking strategies, utility functions, and helper classes for data preprocessing, sampling, and analysis workflows.

```python { .api }
# Datasets
def adult(display=False, n_points=None) -> tuple[pd.DataFrame, np.ndarray]: ...
def california(n_points=None) -> tuple[pd.DataFrame, np.ndarray]: ...
def imagenet50(resolution=224, n_points=None) -> tuple[np.ndarray, np.ndarray]: ...

# Maskers
class Independent:
    def __init__(self, data, max_samples=100): ...

class Text:
    def __init__(self, tokenizer=None, mask_token=None, output_type="string"): ...

# Utilities
def sample(X, nsamples=100, random_state=0): ...
def approximate_interactions(index, shap_values, X, feature_names=None) -> np.ndarray: ...
```

[Data Utilities and Helpers](./utilities.md)

## Types

Core types and classes used throughout the SHAP library:

```python { .api }
class Explanation:
    """Container for SHAP values with rich metadata and operations."""
    def __init__(self, values, base_values=None, data=None, display_data=None,
                 instance_names=None, feature_names=None, output_names=None,
                 output_indexes=None, lower_bounds=None, upper_bounds=None,
                 error_std=None, main_effects=None, hierarchical_values=None,
                 clustering=None, compute_time=None): ...

    # Core properties
    values: np.ndarray          # SHAP attribution values
    base_values: np.ndarray     # Model baseline values
    data: np.ndarray            # Original input data
    feature_names: list[str]    # Feature names
    output_names: list[str]     # Output names

    # Analysis methods
    def mean(self, axis=None) -> 'Explanation': ...
    def max(self, axis=None) -> 'Explanation': ...
    def sum(self, axis=None, grouping=None) -> 'Explanation': ...
    def sample(self, max_samples, replace=False, random_state=0) -> 'Explanation': ...
    def hclust(self, metric="sqeuclidean", axis=0): ...
    def cohorts(self, cohorts) -> 'Cohorts': ...

class Cohorts:
    """Manages multiple explanation cohorts for comparative analysis."""
    def __init__(self, explanations, cohort_labels=None, cohort_names=None): ...
```