
tessl/pypi-rl-zoo3

A Training Framework for Stable Baselines3 Reinforcement Learning Agents


docs/plotting.md

Plotting and Visualization

Comprehensive plotting tools for training curves, evaluation results, and performance analysis. Provides functions for creating publication-quality plots from training logs, comparing algorithms, and visualizing learning progress.

Core Imports

from rl_zoo3.plots import plot_train, plot_from_file, all_plots
from rl_zoo3.plots.score_normalization import normalize_score
from rl_zoo3.plots.plot_from_file import restyle_boxplot
import numpy as np

Capabilities

Training Curve Plotting

Plot training progress curves from Tensorboard logs and training data.

def plot_train() -> None:
    """
    Plot training curves from monitor logs.
    
    Command-line interface for plotting training progress including:
    - Episode rewards over time  
    - Episode lengths
    - Success rates (if applicable)
    - Learning curves with rolling window smoothing
    
    Reads from monitor log files and generates matplotlib plots.
    Supports customization of x-axis (steps/episodes/time), y-axis metrics,
    figure size, fonts, and rolling window size.
    """

Usage example:

# Command line usage
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo

# Or programmatically
from rl_zoo3.plots import plot_train
import sys

# Set command line arguments
sys.argv = [
    'plot_train',
    '--log-dir', './logs',
    '--env', 'CartPole-v1',
    '--algo', 'ppo',
    '--smooth', '10'
]

plot_train()
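The rolling-window smoothing mentioned in the docstring above is a simple moving average. As an illustrative sketch (not the library's internal implementation), it can be written with `np.convolve`:

```python
import numpy as np

def moving_average(values: np.ndarray, window: int) -> np.ndarray:
    """Smooth a 1-D array with a simple moving average.

    Returns len(values) - window + 1 points, one per full window.
    """
    weights = np.ones(window) / window
    return np.convolve(values, weights, mode="valid")

# Noisy episode rewards around an upward trend
rewards = np.array([10.0, 30.0, 20.0, 40.0, 30.0, 50.0])
smoothed = moving_average(rewards, window=2)
print(smoothed)  # averages of each consecutive pair: [20. 25. 30. 35. 40.]
```

Larger windows trade responsiveness for stability; the `--smooth` flag in the usage example above controls the same trade-off.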

File-Based Plotting

Create plots from saved log files and evaluation results.

def plot_from_file() -> None:
    """
    Plot results from saved evaluation files.
    
    Command-line interface for creating plots from evaluation results stored in:
    - Numpy archive files (.npz)
    - Pickle files with evaluation data
    - Post-processed experimental results
    
    Supports advanced statistical visualization including:
    - Box plots for performance distributions
    - Learning curves with confidence intervals
    - Algorithm comparison plots
    - Publication-quality figures with customizable styling
    """

Usage example:

# Command line usage
rl_zoo3 plot_from_file --log-dir ./eval_logs --output-dir ./plots

# Programmatic usage
from rl_zoo3.plots import plot_from_file
import sys

sys.argv = [
    'plot_from_file',
    '--log-dir', './eval_logs',
    '--output-dir', './plots',
    '--format', 'png'
]

plot_from_file()
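The `.npz` archives mentioned above commonly follow the layout Stable Baselines3's `EvalCallback` writes (`timesteps` plus a `results` array of shape `[n_evals, n_eval_episodes]`; check your files, this is an assumption). A minimal sketch of loading such an archive and computing a mean curve with a 95% confidence band:

```python
import numpy as np

# Create a synthetic evaluations.npz in the assumed EvalCallback layout
rng = np.random.default_rng(0)
timesteps = np.arange(1000, 6000, 1000)
results = rng.normal(loc=100.0, scale=10.0, size=(len(timesteps), 5))
np.savez("evaluations.npz", timesteps=timesteps, results=results)

data = np.load("evaluations.npz")
mean_reward = data["results"].mean(axis=1)  # mean over evaluation episodes
sem = data["results"].std(axis=1, ddof=1) / np.sqrt(data["results"].shape[1])
lower, upper = mean_reward - 1.96 * sem, mean_reward + 1.96 * sem

for t, m, lo, hi in zip(data["timesteps"], mean_reward, lower, upper):
    print(f"step {t}: {m:.1f} (95% CI {lo:.1f}..{hi:.1f})")
```

The `lower`/`upper` arrays are what a confidence-interval learning curve shades around the mean line.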

Comprehensive Plotting

Generate all available plots for a complete analysis.

def all_plots() -> None:
    """
    Generate comprehensive analysis plots from experimental results.
    
    Command-line interface that creates:
    - Algorithm comparison plots across environments
    - Statistical performance summaries
    - Learning curves with confidence intervals
    - Experiment matrices and correlation analysis
    - Publication-ready figures and tables
    
    Processes experimental results from multiple algorithms and environments
    to create a complete analysis suite for research papers and reports.
    """

Usage example:

# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./plots --env CartPole-v1

# Programmatic usage
from rl_zoo3.plots.all_plots import all_plots
import sys

sys.argv = [
    'all_plots',
    '--log-dir', './logs',
    '--output-dir', './analysis_plots',
    '--env', 'CartPole-v1',
    '--algo', 'ppo'
]

all_plots()
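The "experiment matrices" mentioned above amount to aggregating per-run scores into an algorithm-by-environment table of mean ± std. A minimal sketch with hypothetical data (the nested-dict layout here is an illustration, not rl_zoo3's internal format):

```python
import numpy as np

# Per-run final scores, keyed by algorithm and environment (hypothetical data)
runs = {
    "ppo": {"CartPole-v1": [200, 195, 205], "Acrobot-v1": [-90, -85, -95]},
    "dqn": {"CartPole-v1": [180, 175, 190], "Acrobot-v1": [-110, -100, -105]},
}

algos = sorted(runs)
envs = sorted(next(iter(runs.values())))
mean_matrix = np.array([[np.mean(runs[a][e]) for e in envs] for a in algos])
std_matrix = np.array([[np.std(runs[a][e]) for e in envs] for a in algos])

# Render a small text table, one row per algorithm
print("algo".ljust(6) + "".join(e.ljust(18) for e in envs))
for i, a in enumerate(algos):
    cells = "".join(f"{mean_matrix[i, j]:.1f} ± {std_matrix[i, j]:.1f}".ljust(18)
                    for j in range(len(envs)))
    print(a.ljust(6) + cells)
```

The same matrices feed heatmaps and summary tables in a full analysis suite.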

Score Normalization

Normalize performance scores across different environments for fair comparison.

def normalize_score(score: np.ndarray, env_id: str) -> np.ndarray:
    """
    Normalize scores for cross-environment comparison.
    
    Parameters:
    - score: Array of raw scores/rewards
    - env_id: Environment identifier for normalization reference
    
    Returns:
    np.ndarray: Normalized scores (typically 0-100 scale)
    
    Uses environment-specific reference scores to normalize performance,
    enabling fair comparison across different environments with varying
    reward scales and difficulty levels.
    """
from typing import NamedTuple

class ReferenceScore(NamedTuple):
    """
    Reference score data structure for normalization.
    
    Attributes:
    - env_id: Environment identifier
    - min_score: Minimum reference score (random policy)
    - max_score: Maximum reference score (expert/optimal policy)
    """
    env_id: str
    min_score: float
    max_score: float

Usage example:

import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Raw scores from different environments
cartpole_scores = np.array([180, 200, 195, 210, 175])
pendulum_scores = np.array([-150, -120, -130, -110, -140])

# Normalize for comparison
cartpole_normalized = normalize_score(cartpole_scores, "CartPole-v1")
pendulum_normalized = normalize_score(pendulum_scores, "Pendulum-v1")

print("CartPole normalized:", cartpole_normalized)
print("Pendulum normalized:", pendulum_normalized)

# Now scores are comparable across environments
average_performance = (cartpole_normalized.mean() + pendulum_normalized.mean()) / 2
print(f"Average normalized performance: {average_performance:.2f}")
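Under the hood, this kind of normalization is standard min-max scaling against the reference scores. An illustrative reimplementation (not the library's code), assuming a 0-100 target scale:

```python
from typing import NamedTuple
import numpy as np

class ReferenceScore(NamedTuple):
    env_id: str
    min_score: float   # e.g. random-policy return
    max_score: float   # e.g. expert/optimal return

def min_max_normalize(score: np.ndarray, ref: ReferenceScore) -> np.ndarray:
    """Map raw returns onto a 0-100 scale relative to the reference scores."""
    return 100.0 * (score - ref.min_score) / (ref.max_score - ref.min_score)

# CartPole-v1's return range is [0, 500] by construction
ref = ReferenceScore("CartPole-v1", min_score=0.0, max_score=500.0)
scores = np.array([0.0, 250.0, 500.0])
print(min_max_normalize(scores, ref))  # [  0.  50. 100.]
```

A score below the random baseline maps below 0 and one above the expert reference maps above 100, which is exactly why the min/max pair must be chosen per environment.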

Utility Functions

Helper functions for plot styling and data processing.

def restyle_boxplot(
    artist_dict: dict,
    color: str,
    gray: str = "#222222",
    linewidth: int = 1,
    fliersize: int = 5
) -> None:
    """
    Restyle boxplot appearance for publication quality.
    
    Parameters:
    - artist_dict: Dictionary of boxplot artists from matplotlib
    - color: Primary color for the boxplot
    - gray: Color for secondary elements (lines, whiskers, etc.)
    - linewidth: Width of plot lines
    - fliersize: Size of outlier markers
    
    Modifies boxplot styling in-place for consistent, professional appearance
    across all plots generated by RL Zoo3.
    """
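To show the kind of restyling this performs, here is a local sketch that mirrors the documented behavior against the artist dictionary `matplotlib`'s `boxplot` returns (an illustration, not rl_zoo3's implementation):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

def restyle_boxplot_sketch(artist_dict: dict, color: str, gray: str = "#222222",
                           linewidth: int = 1, fliersize: int = 5) -> None:
    """Recolor boxplot artists in-place (illustrative sketch only)."""
    for box in artist_dict["boxes"]:
        box.set_color(color)
        box.set_linewidth(linewidth)
    for key in ("whiskers", "caps", "medians"):
        for line in artist_dict[key]:
            line.set_color(gray)
            line.set_linewidth(linewidth)
    for flier in artist_dict["fliers"]:
        flier.set_markersize(fliersize)

fig, ax = plt.subplots()
bp = ax.boxplot([np.random.default_rng(0).normal(size=50)])
restyle_boxplot_sketch(bp, color="#1f77b4")
```

Because the artists are modified in-place, the function returns nothing; call it once per boxplot before saving the figure.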

Advanced Plotting Examples

Multi-Algorithm Comparison

import matplotlib.pyplot as plt
import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Load results from multiple algorithms
# (DQN is omitted: it only supports discrete action spaces,
# and HalfCheetah-v4 is a continuous-control task)
algorithms = ['ppo', 'sac', 'td3', 'a2c']
env_id = "HalfCheetah-v4"

# Simulate loading results (replace with actual data loading)
results = {
    'ppo': np.random.normal(3000, 500, 10),
    'sac': np.random.normal(3500, 400, 10),
    'td3': np.random.normal(3200, 600, 10),
    'a2c': np.random.normal(2800, 700, 10)
}

# Normalize scores for fair comparison
normalized_results = {}
for algo, scores in results.items():
    normalized_results[algo] = normalize_score(scores, env_id)

# Create comparison plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Raw scores
ax1.boxplot([results[algo] for algo in algorithms], labels=algorithms)
ax1.set_title("Raw Scores")
ax1.set_ylabel("Episode Return")

# Normalized scores
ax2.boxplot([normalized_results[algo] for algo in algorithms], labels=algorithms)
ax2.set_title("Normalized Scores")
ax2.set_ylabel("Normalized Performance (0-100)")

plt.tight_layout()
plt.savefig("algorithm_comparison.png", dpi=300, bbox_inches='tight')
plt.show()

Training Progress Analysis

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

def analyze_training_progress(log_dir: str, env_id: str, algo: str):
    """
    Analyze and plot training progress from log files.
    """
    log_path = Path(log_dir) / algo / env_id  # where the monitor logs would live

    # Load training data (example structure)
    # In practice, you'd load from actual log files under log_path
    timesteps = np.arange(0, 100000, 1000)
    episode_rewards = np.random.normal(150, 30, len(timesteps)) + \
                      50 * np.log10(timesteps + 1)  # Simulated learning
    
    # Add noise and occasional drops (realistic training curves)
    episode_rewards += np.random.normal(0, 10, len(timesteps))
    
    # Create comprehensive training plot
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Learning curve
    axes[0, 0].plot(timesteps, episode_rewards, alpha=0.7, label='Episode Reward')
    
    # Add smoothed curve
    window = 10
    smoothed = pd.Series(episode_rewards).rolling(window).mean()
    axes[0, 0].plot(timesteps, smoothed, color='red', linewidth=2, label=f'Smoothed ({window})')
    
    axes[0, 0].set_xlabel('Timesteps')
    axes[0, 0].set_ylabel('Episode Reward')
    axes[0, 0].set_title(f'{algo.upper()} Learning Curve - {env_id}')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # Reward distribution over time
    # Split into early, middle, late training
    early = episode_rewards[:len(episode_rewards)//3]
    middle = episode_rewards[len(episode_rewards)//3:2*len(episode_rewards)//3]
    late = episode_rewards[2*len(episode_rewards)//3:]
    
    axes[0, 1].boxplot([early, middle, late], labels=['Early', 'Middle', 'Late'])
    axes[0, 1].set_title('Reward Distribution by Training Phase')
    axes[0, 1].set_ylabel('Episode Reward')
    
    # Improvement rate
    improvement = np.gradient(smoothed.dropna())
    axes[1, 0].plot(timesteps[window-1:], improvement, alpha=0.7)
    axes[1, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[1, 0].set_xlabel('Timesteps')
    axes[1, 0].set_ylabel('Improvement Rate')
    axes[1, 0].set_title('Learning Rate Over Time')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Final performance histogram
    final_episodes = episode_rewards[-20:]  # Last 20 episodes
    axes[1, 1].hist(final_episodes, bins=10, alpha=0.7, edgecolor='black')
    axes[1, 1].axvline(final_episodes.mean(), color='red', linestyle='--', 
                      label=f'Mean: {final_episodes.mean():.1f}')
    axes[1, 1].set_xlabel('Episode Reward')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Final Performance Distribution')
    axes[1, 1].legend()
    
    plt.tight_layout()
    plt.savefig(f"{algo}_{env_id}_analysis.png", dpi=300, bbox_inches='tight')
    plt.show()

# Use the analysis function
analyze_training_progress("./logs", "CartPole-v1", "ppo")

Hyperparameter Sensitivity Analysis

import matplotlib.pyplot as plt
import numpy as np
from itertools import product

def plot_hyperparameter_sensitivity():
    """
    Plot how performance varies with different hyperparameters.
    """
    # Example: PPO learning rate vs clip range sensitivity
    learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
    clip_ranges = [0.1, 0.2, 0.3, 0.4]
    
    # Simulate performance data (replace with actual results)
    performance_matrix = np.random.normal(180, 20, (len(learning_rates), len(clip_ranges)))
    
    # Add realistic patterns - lower LR generally more stable
    for i, lr in enumerate(learning_rates):
        for j, clip in enumerate(clip_ranges):
            # Simulate that moderate values work better
            lr_penalty = abs(np.log10(lr) + 3.5) * 10  # Penalty for extreme LR
            clip_penalty = abs(clip - 0.2) * 50  # Penalty for extreme clip range
            performance_matrix[i, j] -= (lr_penalty + clip_penalty)
    
    # Create heatmap
    fig, ax = plt.subplots(figsize=(10, 8))
    
    im = ax.imshow(performance_matrix, cmap='viridis', aspect='auto')
    
    # Set ticks and labels
    ax.set_xticks(range(len(clip_ranges)))
    ax.set_yticks(range(len(learning_rates)))
    ax.set_xticklabels([f"{cr:.1f}" for cr in clip_ranges])
    ax.set_yticklabels([f"{lr:.0e}" for lr in learning_rates])
    
    ax.set_xlabel('Clip Range')
    ax.set_ylabel('Learning Rate')
    ax.set_title('PPO Hyperparameter Sensitivity\n(CartPole-v1 Performance)')
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=ax)
    cbar.set_label('Average Episode Reward')
    
    # Add text annotations
    for i in range(len(learning_rates)):
        for j in range(len(clip_ranges)):
            ax.text(j, i, f'{performance_matrix[i, j]:.0f}',
                    ha="center", va="center", color="white", fontweight='bold')
    
    plt.tight_layout()
    plt.savefig("hyperparameter_sensitivity.png", dpi=300, bbox_inches='tight')
    plt.show()

plot_hyperparameter_sensitivity()

Integration with Command Line Tools

All plotting functions are available through the RL Zoo3 command line interface:

# Plot training curves
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo --smooth 10

# Plot from evaluation files
rl_zoo3 plot_from_file --log-dir ./eval_results --output-dir ./plots

# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./analysis --env CartPole-v1

# With additional options
rl_zoo3 plot_train \
    --log-dir ./logs \
    --env CartPole-v1 \
    --algo ppo \
    --smooth 10 \
    --window 50 \
    --format png \
    --dpi 300

The plotting system integrates seamlessly with the RL Zoo3 training workflow, automatically generating visualizations from standard log formats and providing comprehensive analysis tools for RL experiments.

Install with Tessl CLI

npx tessl i tessl/pypi-rl-zoo3
