LightGBM is a gradient boosting framework that uses tree-based learning algorithms, designed to be distributed and efficient with faster training speed, higher efficiency, lower memory usage, better accuracy, and support for parallel, distributed, and GPU learning.
—
Built-in plotting functions for model interpretation, feature importance analysis, training progress monitoring, and tree structure visualization. LightGBM's visualization capabilities support both matplotlib and graphviz backends for comprehensive model analysis and presentation.
Visualize feature importance scores to understand which features contribute most to model predictions.
def plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None,
                    title='Feature importance', xlabel='Feature importance',
                    ylabel='Features', importance_type='auto', max_num_features=None,
                    ignore_zero=True, figsize=None, dpi=None, grid=True,
                    precision=3, **kwargs):
    """Plot a model's feature importance scores as a horizontal bar chart.

    Parameters
    ----------
    booster : Booster or LGBMModel
        Trained model to analyze.
    ax : matplotlib.axes.Axes or None
        Axes object to plot on; a new figure is created when None.
    height : float
        Bar height (spacing between bars).
    xlim, ylim : tuple or None
        Axis limits as (min, max).
    title, xlabel, ylabel : str or None
        Plot title and axis labels.
    importance_type : str
        Type of importance: 'auto', 'split', or 'gain'.
    max_num_features : int or None
        Maximum number of features to display.
    ignore_zero : bool
        Whether to ignore features with zero importance.
    figsize : tuple or None
        Figure size (width, height) in inches.
    dpi : int or None
        Figure resolution in dots per inch.
    grid : bool
        Whether to show grid lines.
    precision : int
        Number of decimal places for displayed importance values.
    **kwargs
        Additional matplotlib bar-plot parameters.

    Returns
    -------
    matplotlib.axes.Axes
        The matplotlib axes object with the plot.
    """


# Plot training and validation metrics over iterations to monitor model
# performance and detect overfitting.
def plot_metric(eval_result, metric=None, ax=None, xlim=None, ylim=None,
                title='Metric during training', xlabel='Iterations',
                ylabel='auto', figsize=None, dpi=None, grid=True, **kwargs):
    """Plot one or several metric curves from recorded training history.

    Parameters
    ----------
    eval_result : dict
        Evaluation results collected during training by the
        ``record_evaluation`` callback.
    metric : str or None
        Specific metric to plot; if None, all recorded metrics are plotted.
    ax : matplotlib.axes.Axes or None
        Axes object to plot on; a new figure is created when None.
    xlim, ylim : tuple or None
        Axis limits as (min, max).
    title, xlabel : str or None
        Plot title and x-axis label.
    ylabel : str or 'auto'
        Y-axis label; 'auto' uses the metric name.
    figsize : tuple or None
        Figure size (width, height) in inches.
    dpi : int or None
        Figure resolution in dots per inch.
    grid : bool
        Whether to show grid lines.
    **kwargs
        Additional matplotlib plot parameters.

    Returns
    -------
    matplotlib.axes.Axes
        The matplotlib axes object with the plot.
    """


# Visualize individual decision trees to understand the model's
# decision-making process.
def plot_tree(booster, ax=None, tree_index=0, figsize=None, dpi=None,
              show_info=None, precision=3, orientation='horizontal',
              **kwargs):
    """Render the structure of one tree from a trained model.

    Parameters
    ----------
    booster : Booster or LGBMModel
        Trained model containing the trees to draw.
    ax : matplotlib.axes.Axes or None
        Axes object to draw on; a new figure is created when None.
    tree_index : int
        Index of the tree to visualize.
    figsize : tuple or None
        Figure size (width, height) in inches.
    dpi : int or None
        Figure resolution in dots per inch.
    show_info : list or None
        Node annotations to display, e.g. ['split_gain', 'leaf_count'].
    precision : int
        Number of decimal places for node values.
    orientation : str
        Tree layout: 'horizontal' or 'vertical'.
    **kwargs
        Additional matplotlib plotting parameters.

    Returns
    -------
    matplotlib.axes.Axes
        The matplotlib axes object with the tree plot.
    """
def create_tree_digraph(booster, tree_index=0, show_info=None, precision=3,
                        orientation='horizontal', **kwargs):
    """Create a graphviz digraph representation of the specified tree.

    Parameters
    ----------
    booster : Booster or LGBMModel
        Trained model containing the trees.
    tree_index : int
        Index of the tree to visualize.
    show_info : list or None
        Node annotations to display.
    precision : int
        Number of decimal places for node values.
    orientation : str
        Tree layout direction.
    **kwargs
        Additional graphviz parameters.

    Returns
    -------
    graphviz.Digraph
        Graphviz digraph object for the tree.
    """


# Analyze the distribution of split values for specific features to
# understand feature usage patterns.
def plot_split_value_histogram(booster, feature, ax=None, bins=None,
                               color='auto', title='auto', xlabel='auto',
                               ylabel='Count', figsize=None, dpi=None,
                               grid=True, **kwargs):
    """Plot a histogram of split values used for the specified feature.

    Parameters
    ----------
    booster : Booster or LGBMModel
        Trained model to analyze.
    feature : int or str
        Feature index or name to analyze.
    ax : matplotlib.axes.Axes or None
        Axes object to plot on; a new figure is created when None.
    bins : int or None
        Number of histogram bins (auto-determined when None).
    color : str or 'auto'
        Histogram bar color.
    title : str or 'auto'
        Plot title; 'auto' generates a descriptive title.
    xlabel : str or 'auto'
        X-axis label; 'auto' uses the feature name.
    ylabel : str
        Y-axis label.
    figsize : tuple or None
        Figure size (width, height) in inches.
    dpi : int or None
        Figure resolution in dots per inch.
    grid : bool
        Whether to show grid lines.
    **kwargs
        Additional matplotlib histogram parameters.

    Returns
    -------
    matplotlib.axes.Axes
        The matplotlib axes object with the histogram.
    """


import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load and prepare data
X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = lgb.LGBMClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train, feature_name=list(feature_names))

# Compare split-based and gain-based importance side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot split-based importance (how often each feature is used)
lgb.plot_importance(
    model,
    importance_type='split',
    ax=ax1,
    max_num_features=15,
    title='Feature Importance (Split-based)',
    xlabel='Number of splits',
    height=0.4
)

# Plot gain-based importance (total loss reduction attributed to each feature)
lgb.plot_importance(
    model,
    importance_type='gain',
    ax=ax2,
    max_num_features=15,
    title='Feature Importance (Gain-based)',
    xlabel='Total gain',
    height=0.4
)

plt.tight_layout()
plt.show()

import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Prepare datasets
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

# record_evaluation fills this dict with per-iteration metric values
eval_result = {}

# Train model with evaluation tracking and early stopping
model = lgb.train(
    {
        'objective': 'regression',
        'metric': ['rmse', 'mae'],
        'boosting_type': 'gbdt',
        'num_leaves': 31,
        'learning_rate': 0.05,
        'verbose': -1
    },
    train_data,
    num_boost_round=200,
    valid_sets=[train_data, test_data],
    valid_names=['train', 'test'],
    callbacks=[
        lgb.record_evaluation(eval_result),
        lgb.early_stopping(20)
    ]
)

# Plot training curves for both metrics side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot RMSE
lgb.plot_metric(
    eval_result,
    metric='rmse',
    ax=ax1,
    title='RMSE during Training',
    ylabel='RMSE'
)

# Plot MAE
lgb.plot_metric(
    eval_result,
    metric='mae',
    ax=ax2,
    title='MAE during Training',
    ylabel='MAE'
)

plt.tight_layout()
plt.show()

# Print best scores (best_iteration is 1-based, hence the -1 index)
print(f"Best iteration: {model.best_iteration}")
print(f"Best RMSE: {eval_result['test']['rmse'][model.best_iteration-1]:.4f}")
print(f"Best MAE: {eval_result['test']['mae'][model.best_iteration-1]:.4f}")

import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load a simple dataset for clear tree visualization
X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# Train a deliberately small model so the trees stay interpretable
model = lgb.LGBMClassifier(
    n_estimators=3,
    max_depth=3,
    num_leaves=7,
    random_state=42
)
model.fit(X, y, feature_name=list(feature_names))

# Visualize the first few trees with matplotlib
fig, axes = plt.subplots(1, 3, figsize=(20, 6))
for i in range(3):
    lgb.plot_tree(
        model,
        tree_index=i,
        ax=axes[i],
        figsize=(6, 6),
        show_info=['split_gain', 'leaf_value'],
        precision=2
    )
    axes[i].set_title(f'Tree {i}')
plt.tight_layout()
plt.show()

# Alternative: create a graphviz digraph for higher-quality rendering
try:
    import graphviz
    # Create digraph for the first tree
    graph = lgb.create_tree_digraph(
        model,
        tree_index=0,
        show_info=['split_gain', 'leaf_value', 'leaf_count'],
        precision=2
    )
    # Render to file
    graph.render('tree_0', format='png', cleanup=True)
    print("Tree digraph saved as tree_0.png")
except ImportError:
    print("Graphviz not available. Install with: pip install graphviz")

import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression

# Generate data with known relationships
X, y = make_regression(n_samples=10000, n_features=10, noise=0.1, random_state=42)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]

# Train model
model = lgb.LGBMRegressor(
    n_estimators=100,
    max_depth=6,
    random_state=42
)
model.fit(X, y, feature_name=feature_names)

# Analyze split values for the top 4 features by importance
top_features = np.argsort(model.feature_importances_)[-4:]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.ravel()
for i, feature_idx in enumerate(top_features):
    lgb.plot_split_value_histogram(
        model,
        feature=feature_idx,
        ax=axes[i],
        bins=30,
        color='skyblue',
        alpha=0.7
    )
    # Add the feature's importance score to the subplot title
    importance = model.feature_importances_[feature_idx]
    axes[i].set_title(f'{feature_names[feature_idx]} (Importance: {importance:.0f})')
plt.tight_layout()
plt.show()

# Print split statistics for each of the top features
for feature_idx in top_features:
    hist = model.booster_.get_split_value_histogram(feature_idx)
    print(f"\n{feature_names[feature_idx]}:")
    print(f" Number of splits: {len(hist[1])}")
    print(f" Split range: [{hist[0][0]:.3f}, {hist[0][-1]:.3f}]")
    print(f" Most frequent split: {hist[0][np.argmax(hist[1])]:.3f}")

import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np  # needed below for np.argsort on feature importances
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load data. NOTE: the original example used load_boston, which was removed
# from scikit-learn in 1.2; load_diabetes is a bundled regression dataset
# with named features that works as a drop-in replacement.
X, y = load_diabetes(return_X_y=True)
feature_names = load_diabetes().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model with evaluation tracking
train_data = lgb.Dataset(X_train, label=y_train, feature_name=list(feature_names))
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)
eval_result = {}
model = lgb.train(
    {
        'objective': 'regression',
        'metric': ['rmse', 'mae'],
        'num_leaves': 31,
        'learning_rate': 0.05,
        'verbose': -1
    },
    train_data,
    num_boost_round=150,
    valid_sets=[train_data, test_data],
    valid_names=['train', 'test'],
    callbacks=[lgb.record_evaluation(eval_result)]
)

# Create comprehensive dashboard
fig = plt.figure(figsize=(20, 12))

# 1. Feature importance
ax1 = plt.subplot(2, 3, 1)
lgb.plot_importance(model, ax=ax1, max_num_features=10, importance_type='gain')
ax1.set_title('Feature Importance (Gain)')

# 2. Training curves
ax2 = plt.subplot(2, 3, 2)
lgb.plot_metric(eval_result, metric='rmse', ax=ax2)
ax2.set_title('RMSE During Training')

# 3. Tree structure (first tree)
ax3 = plt.subplot(2, 3, 3)
lgb.plot_tree(model, tree_index=0, ax=ax3, show_info=['split_gain'])
ax3.set_title('First Tree Structure')

# 4. Split histogram for the most important feature
top_feature = np.argsort(model.feature_importance())[-1]
ax4 = plt.subplot(2, 3, 4)
lgb.plot_split_value_histogram(model, feature=top_feature, ax=ax4)
ax4.set_title(f'Split Values: {feature_names[top_feature]}')

# 5. Model predictions vs actual
ax5 = plt.subplot(2, 3, 5)
predictions = model.predict(X_test)
ax5.scatter(y_test, predictions, alpha=0.6)
ax5.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
ax5.set_xlabel('Actual Values')
ax5.set_ylabel('Predicted Values')
ax5.set_title('Predictions vs Actual')

# 6. Residuals plot
ax6 = plt.subplot(2, 3, 6)
residuals = predictions - y_test
ax6.scatter(predictions, residuals, alpha=0.6)
ax6.axhline(y=0, color='r', linestyle='--')
ax6.set_xlabel('Predicted Values')
ax6.set_ylabel('Residuals')
ax6.set_title('Residual Plot')

plt.tight_layout()
plt.show()

# Print model summary
print(f"Model Performance:")
print(f"Best iteration: {model.best_iteration}")
print(f"Test RMSE: {eval_result['test']['rmse'][-1]:.4f}")
print(f"Test MAE: {eval_result['test']['mae'][-1]:.4f}")
print(f"Number of trees: {model.num_trees()}")
print(f"Number of features: {model.num_feature()}")

# All plotting functions accept standard matplotlib parameters for customization:
# Custom styling example: extra keyword arguments are forwarded to the
# underlying matplotlib bar plot
lgb.plot_importance(
    model,
    figsize=(10, 8),
    color='darkblue',
    alpha=0.8,
    edgecolor='black',
    linewidth=1.5,
    grid=True,
    title='Custom Styled Feature Importance',
    xlabel='Importance Score',
    fontsize=12
)

# Apply a matplotlib style sheet before plotting
plt.style.use('seaborn-v0_8')  # Or any other style
lgb.plot_metric(eval_result, metric='rmse')

# Save plot to file
# plot_importance returns the Axes, so the parent Figure can be saved directly
ax = lgb.plot_importance(model, figsize=(10, 6))
ax.figure.savefig('feature_importance.png', dpi=300, bbox_inches='tight')

# Save multiple plots arranged on one figure
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
lgb.plot_importance(model, ax=axes[0,0])
lgb.plot_metric(eval_result, ax=axes[0,1])
# ... add more plots
plt.savefig('model_analysis.pdf', bbox_inches='tight')

# Install with Tessl CLI
npx tessl i tessl/pypi-lightgbm