# Plotting and Visualization

Plotting tools for training curves, evaluation results, and performance analysis. The module provides functions for creating publication-quality plots from training logs, comparing algorithms, and visualizing learning progress.

## Core Imports

```python
from rl_zoo3.plots.plot_train import plot_train
from rl_zoo3.plots.plot_from_file import plot_from_file, restyle_boxplot
from rl_zoo3.plots.all_plots import all_plots
from rl_zoo3.plots.score_normalization import normalize_score
import numpy as np
```

## Capabilities

### Training Curve Plotting

Plot training progress curves from monitor log files.

```python { .api }
def plot_train() -> None:
    """
    Plot training curves from monitor logs.

    Command-line interface for plotting training progress including:
    - Episode rewards over time
    - Episode lengths
    - Success rates (if applicable)
    - Learning curves with rolling window smoothing

    Reads from monitor log files and generates matplotlib plots.
    Supports customization of x-axis (steps/episodes/time), y-axis metrics,
    figure size, fonts, and rolling window size.
    """
```

Usage example:

```bash
# Command line usage
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo
```

```python
# Or programmatically
from rl_zoo3.plots.plot_train import plot_train
import sys

# plot_train() takes no parameters and reads its options from sys.argv,
# so set the command line arguments before calling it
sys.argv = [
    'plot_train',
    '--log-dir', './logs',
    '--env', 'CartPole-v1',
    '--algo', 'ppo',
    '--smooth', '10',
]

plot_train()
```
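
The rolling-window smoothing mentioned in the `plot_train` docstring can be sketched independently of the CLI. This is a minimal sketch, assuming the Stable-Baselines3 `Monitor` CSV layout (a `#`-prefixed JSON metadata line followed by `r,l,t` columns for episode reward, length, and elapsed time). The helpers `load_monitor_csv` and `moving_average` are hypothetical names for illustration and are not part of `rl_zoo3`.

```python
import csv
import io

import numpy as np


def load_monitor_csv(text: str) -> tuple[np.ndarray, np.ndarray]:
    """Parse Monitor-style CSV text into (cumulative timesteps, episode rewards)."""
    # Drop the leading '#{...}' metadata line; the next line is the 'r,l,t' header.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    rows = list(csv.DictReader(io.StringIO("\n".join(lines))))
    rewards = np.array([float(row["r"]) for row in rows])
    lengths = np.array([int(row["l"]) for row in rows])
    return np.cumsum(lengths), rewards


def moving_average(values: np.ndarray, window: int) -> np.ndarray:
    """Rolling mean over `window` points; returns len(values) - window + 1 entries."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Tiny hypothetical log with four episodes (assumed format, made-up values).
example_log = """#{"t_start": 0.0, "env_id": "CartPole-v1"}
r,l,t
10.0,10,0.1
20.0,20,0.2
30.0,30,0.3
40.0,40,0.4
"""

timesteps, rewards = load_monitor_csv(example_log)
smoothed = moving_average(rewards, window=2)
# Align the x-axis with the last episode of each window.
smoothed_timesteps = timesteps[len(timesteps) - len(smoothed):]
print(smoothed_timesteps, smoothed)
```

Using `mode="valid"` drops the first `window - 1` points instead of padding them, so the smoothed curve never shows averages computed over fewer episodes than the window size.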

### File-Based Plotting

Create plots from saved log files and evaluation results.

```python { .api }
def plot_from_file() -> None:
    """
    Plot results from saved evaluation files.

    Command-line interface for creating plots from evaluation results stored in:
    - Numpy archive files (.npz)
    - Pickle files with evaluation data
    - Post-processed experimental results

    Supports advanced statistical visualization including:
    - Box plots for performance distributions
    - Learning curves with confidence intervals
    - Algorithm comparison plots
    - Publication-quality figures with customizable styling
    """
```

Usage example:

```bash
# Command line usage
rl_zoo3 plot_from_file --log-dir ./eval_logs --output ./plots
```

```python
# Programmatic usage
from rl_zoo3.plots.plot_from_file import plot_from_file
import sys

# plot_from_file() reads its options from sys.argv
sys.argv = [
    'plot_from_file',
    '--log-dir', './eval_logs',
    '--output-dir', './plots',
    '--format', 'png',
]

plot_from_file()
```
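
The "learning curves with confidence intervals" item can be illustrated with a short calculation. The interval method used by `plot_from_file` is not specified here, so this is only a sketch under an assumption: a normal-approximation 95% interval (mean plus or minus 1.96 standard errors) across runs. The function name `mean_and_ci` and the data are hypothetical.

```python
import numpy as np


def mean_and_ci(curves: np.ndarray, z: float = 1.96) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (mean, lower, upper) per evaluation point for shape (n_runs, n_points)."""
    n_runs = curves.shape[0]
    mean = curves.mean(axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(n_runs) is the standard error.
    std_err = curves.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical evaluation returns: 3 runs (seeds) x 4 evaluation points.
curves = np.array([
    [10.0, 50.0, 90.0, 120.0],
    [12.0, 55.0, 95.0, 130.0],
    [8.0, 45.0, 85.0, 110.0],
])
mean, lower, upper = mean_and_ci(curves)
print(mean)
print(lower)
print(upper)
```

With matplotlib, the band can then be drawn with `ax.fill_between(x, lower, upper, alpha=0.3)` underneath the mean curve. With only a handful of seeds the normal approximation is optimistic, so a bootstrap or t-based interval may be a better fit for real experiments.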

### Comprehensive Plotting

Generate all available plots for a complete analysis.

```python { .api }
def all_plots() -> None:
    """
    Generate comprehensive analysis plots from experimental results.

    Command-line interface that creates:
    - Algorithm comparison plots across environments
    - Statistical performance summaries
    - Learning curves with confidence intervals
    - Experiment matrices and correlation analysis
    - Publication-ready figures and tables

    Processes experimental results from multiple algorithms and environments
    to create a complete analysis suite for research papers and reports.
    """
```

Usage example:

```bash
# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./plots --env CartPole-v1
```

```python
# Programmatic usage
from rl_zoo3.plots.all_plots import all_plots
import sys

# all_plots() reads its options from sys.argv
sys.argv = [
    'all_plots',
    '--log-dir', './logs',
    '--output-dir', './analysis_plots',
    '--env', 'CartPole-v1',
    '--algo', 'ppo',
]

all_plots()
```
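
The programmatic examples above overwrite `sys.argv` for the rest of the process. A small context manager keeps that change local. This is a sketch: `temporary_argv` is a hypothetical helper, not part of `rl_zoo3`, and the argument list is only a placeholder.

```python
import sys
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def temporary_argv(argv: Sequence[str]) -> Iterator[None]:
    """Temporarily replace sys.argv and restore the original value on exit."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        # Restore even if the wrapped call raises or exits.
        sys.argv = saved


original = list(sys.argv)
with temporary_argv(["all_plots", "--env", "CartPole-v1"]):
    # An argparse-based entry point called here would see these arguments.
    inside = list(sys.argv)
restored = sys.argv == original
print(inside, restored)
```

Calling an entry point such as `all_plots()` inside the `with` block leaves the interpreter's real arguments untouched afterwards, which matters in notebooks and test suites that call several plotting functions in a row.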

### Score Normalization

Normalize performance scores across different environments for fair comparison.

```python { .api }
def normalize_score(score: np.ndarray, env_id: str) -> np.ndarray:
    """
    Normalize scores for cross-environment comparison.

    Parameters:
    - score: Array of raw scores/rewards
    - env_id: Environment identifier for normalization reference

    Returns:
    np.ndarray: Normalized scores (typically 0-100 scale)

    Uses environment-specific reference scores to normalize performance,
    enabling fair comparison across different environments with varying
    reward scales and difficulty levels.
    """
```

```python { .api }
class ReferenceScore(NamedTuple):
    """
    Reference score data structure for normalization.

    Attributes:
    - env_id: Environment identifier
    - min_score: Minimum reference score (random policy)
    - max_score: Maximum reference score (expert/optimal policy)
    """
    env_id: str
    min_score: float
    max_score: float
```

Usage example:

```python
import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Raw scores from different environments
cartpole_scores = np.array([180, 200, 195, 210, 175])
pendulum_scores = np.array([-150, -120, -130, -110, -140])

# Normalize for comparison
cartpole_normalized = normalize_score(cartpole_scores, "CartPole-v1")
pendulum_normalized = normalize_score(pendulum_scores, "Pendulum-v1")

print("CartPole normalized:", cartpole_normalized)
print("Pendulum normalized:", pendulum_normalized)

# Now scores are comparable across environments
average_performance = (cartpole_normalized.mean() + pendulum_normalized.mean()) / 2
print(f"Average normalized performance: {average_performance:.2f}")
```
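
The arithmetic behind reference-based normalization can be sketched with the `min_score` and `max_score` fields of `ReferenceScore`. This is an assumption about the approach (plain min-max scaling), not the library's confirmed implementation. The reference values below are made up, and `min_max_normalize` is a hypothetical name. The sketch returns a fraction of the reference range; multiplying by 100 gives a percentage.

```python
from typing import NamedTuple

import numpy as np


class ReferenceScore(NamedTuple):
    env_id: str
    min_score: float
    max_score: float


# Hypothetical reference values, for illustration only.
example_reference = ReferenceScore("Example-v0", min_score=20.0, max_score=220.0)


def min_max_normalize(score: np.ndarray, ref: ReferenceScore) -> np.ndarray:
    """Map min_score to 0.0 and max_score to 1.0; values outside are not clipped."""
    return (score - ref.min_score) / (ref.max_score - ref.min_score)


raw = np.array([20.0, 120.0, 220.0, 270.0])
normalized = min_max_normalize(raw, example_reference)
print(normalized)          # fraction of the reference range
print(100.0 * normalized)  # the same values as percentages
```

Because nothing is clipped, an agent that beats the reference maximum scores above 1.0 (above 100%), which is usually preferable to hiding that result.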

### Utility Functions

Helper functions for plot styling and data processing.

```python { .api }
def restyle_boxplot(
    artist_dict: dict,
    color: str,
    gray: str = "#222222",
    linewidth: int = 1,
    fliersize: int = 5
) -> None:
    """
    Restyle boxplot appearance for publication quality.

    Parameters:
    - artist_dict: Dictionary of boxplot artists from matplotlib
    - color: Primary color for the boxplot
    - gray: Color for secondary elements (lines, whiskers, etc.)
    - linewidth: Width of plot lines
    - fliersize: Size of outlier markers

    Modifies boxplot styling in-place for consistent, professional appearance
    across all plots generated by RL Zoo3.
    """
```
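
The "dictionary of boxplot artists" is what matplotlib's `Axes.boxplot` returns, with keys such as `boxes`, `whiskers`, `caps`, and `medians`. The body of `restyle_boxplot` is not shown in this document, so the following is only a hypothetical sketch of what in-place restyling of such a dictionary can look like. `restyle_sketch` is an illustrative name, not the RL Zoo3 implementation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def restyle_sketch(artist_dict: dict, color: str, gray: str = "#222222", linewidth: int = 1) -> None:
    """Hypothetical in-place restyling of the artists returned by Axes.boxplot."""
    for box in artist_dict["boxes"]:
        # With patch_artist=True each box is a patch and can be filled.
        box.set(facecolor=color, edgecolor=gray, linewidth=linewidth)
    for key in ("whiskers", "caps", "medians"):
        for line in artist_dict[key]:
            line.set(color=gray, linewidth=linewidth)


rng = np.random.default_rng(0)
fig, ax = plt.subplots()
artists = ax.boxplot([rng.normal(0, 1, 50), rng.normal(1, 1, 50)], patch_artist=True)
restyle_sketch(artists, color="tab:blue")
median_colors = [line.get_color() for line in artists["medians"]]
plt.close(fig)
print(median_colors)
```

`patch_artist=True` matters here: without it the boxes are plain lines and have no face color to set.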

## Advanced Plotting Examples

### Multi-Algorithm Comparison

```python
import matplotlib.pyplot as plt
import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Load results from multiple algorithms
algorithms = ['ppo', 'sac', 'td3', 'dqn']
env_id = "HalfCheetah-v4"

# Simulate loading results (replace with actual data loading)
results = {
    'ppo': np.random.normal(3000, 500, 10),
    'sac': np.random.normal(3500, 400, 10),
    'td3': np.random.normal(3200, 600, 10),
    'dqn': np.random.normal(2800, 700, 10)
}

# Normalize scores for fair comparison
normalized_results = {}
for algo, scores in results.items():
    normalized_results[algo] = normalize_score(scores, env_id)

# Create comparison plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Raw scores (tick labels are set separately so this works across matplotlib versions)
ax1.boxplot([results[algo] for algo in algorithms])
ax1.set_xticklabels(algorithms)
ax1.set_title("Raw Scores")
ax1.set_ylabel("Episode Return")

# Normalized scores
ax2.boxplot([normalized_results[algo] for algo in algorithms])
ax2.set_xticklabels(algorithms)
ax2.set_title("Normalized Scores")
ax2.set_ylabel("Normalized Performance (0-100)")

plt.tight_layout()
plt.savefig("algorithm_comparison.png", dpi=300, bbox_inches='tight')
plt.show()
```
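
Box plots are easier to read alongside a numeric summary. The following sketch computes per-algorithm mean, median, and sample standard deviation and ranks the algorithms by mean. The data and the `summarize` helper are hypothetical and stand in for real evaluation results.

```python
import numpy as np

# Hypothetical per-seed returns; replace with real evaluation results.
results = {
    "ppo": np.array([2900.0, 3000.0, 3100.0]),
    "sac": np.array([3400.0, 3500.0, 3600.0]),
}


def summarize(results: dict[str, np.ndarray]) -> dict[str, dict[str, float]]:
    """Compute mean, median, and sample standard deviation for each algorithm."""
    return {
        algo: {
            "mean": float(scores.mean()),
            "median": float(np.median(scores)),
            "std": float(scores.std(ddof=1)),
        }
        for algo, scores in results.items()
    }


summary = summarize(results)
# Rank algorithms from best to worst mean return.
ranking = sorted(summary, key=lambda algo: summary[algo]["mean"], reverse=True)
for algo in ranking:
    stats = summary[algo]
    print(f"{algo:>4}: mean={stats['mean']:.1f} median={stats['median']:.1f} std={stats['std']:.1f}")
```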

### Training Progress Analysis

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pathlib import Path

def analyze_training_progress(log_dir: str, env_id: str, algo: str):
    """
    Analyze and plot training progress from log files.
    """
    log_path = Path(log_dir) / algo / env_id

    # Load training data (example structure)
    # In practice, you'd load from actual log files under log_path
    timesteps = np.arange(0, 100000, 1000)
    episode_rewards = np.random.normal(150, 30, len(timesteps)) + \
        50 * np.log(timesteps + 1) / np.log(10)  # Simulated learning

    # Add extra noise so the simulated curve looks more like real training
    episode_rewards += np.random.normal(0, 10, len(timesteps))

    # Create comprehensive training plot
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    # Learning curve
    axes[0, 0].plot(timesteps, episode_rewards, alpha=0.7, label='Episode Reward')

    # Add smoothed curve (the first window - 1 values are NaN and are not drawn)
    window = 10
    smoothed = pd.Series(episode_rewards).rolling(window).mean()
    axes[0, 0].plot(timesteps, smoothed, color='red', linewidth=2, label=f'Smoothed ({window})')

    axes[0, 0].set_xlabel('Timesteps')
    axes[0, 0].set_ylabel('Episode Reward')
    axes[0, 0].set_title(f'{algo.upper()} Learning Curve - {env_id}')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Reward distribution over time
    # Split into early, middle, late training
    early = episode_rewards[:len(episode_rewards)//3]
    middle = episode_rewards[len(episode_rewards)//3:2*len(episode_rewards)//3]
    late = episode_rewards[2*len(episode_rewards)//3:]

    axes[0, 1].boxplot([early, middle, late])
    axes[0, 1].set_xticklabels(['Early', 'Middle', 'Late'])
    axes[0, 1].set_title('Reward Distribution by Training Phase')
    axes[0, 1].set_ylabel('Episode Reward')

    # Improvement rate: slope of the smoothed reward curve
    improvement = np.gradient(smoothed.dropna())
    axes[1, 0].plot(timesteps[window-1:], improvement, alpha=0.7)
    axes[1, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[1, 0].set_xlabel('Timesteps')
    axes[1, 0].set_ylabel('Improvement Rate')
    axes[1, 0].set_title('Improvement Rate Over Time')
    axes[1, 0].grid(True, alpha=0.3)

    # Final performance histogram
    final_episodes = episode_rewards[-20:]  # Last 20 data points
    axes[1, 1].hist(final_episodes, bins=10, alpha=0.7, edgecolor='black')
    axes[1, 1].axvline(final_episodes.mean(), color='red', linestyle='--',
                       label=f'Mean: {final_episodes.mean():.1f}')
    axes[1, 1].set_xlabel('Episode Reward')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Final Performance Distribution')
    axes[1, 1].legend()

    plt.tight_layout()
    plt.savefig(f"{algo}_{env_id}_analysis.png", dpi=300, bbox_inches='tight')
    plt.show()

# Use the analysis function
analyze_training_progress("./logs", "CartPole-v1", "ppo")
```

### Hyperparameter Sensitivity Analysis

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_hyperparameter_sensitivity():
    """
    Plot how performance varies with different hyperparameters.
    """
    # Example: PPO learning rate vs clip range sensitivity
    learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
    clip_ranges = [0.1, 0.2, 0.3, 0.4]

    # Simulate performance data (replace with actual results)
    performance_matrix = np.random.normal(180, 20, (len(learning_rates), len(clip_ranges)))

    # Add a plausible pattern: moderate values work better than extremes
    for i, lr in enumerate(learning_rates):
        for j, clip in enumerate(clip_ranges):
            lr_penalty = abs(np.log10(lr) + 3.5) * 10  # Penalty for extreme LR
            clip_penalty = abs(clip - 0.2) * 50  # Penalty for extreme clip range
            performance_matrix[i, j] -= (lr_penalty + clip_penalty)

    # Create heatmap (rows: learning rates, columns: clip ranges)
    fig, ax = plt.subplots(figsize=(10, 8))

    im = ax.imshow(performance_matrix, cmap='viridis', aspect='auto')

    # Set ticks and labels
    ax.set_xticks(range(len(clip_ranges)))
    ax.set_yticks(range(len(learning_rates)))
    ax.set_xticklabels([f"{cr:.1f}" for cr in clip_ranges])
    ax.set_yticklabels([f"{lr:.0e}" for lr in learning_rates])

    ax.set_xlabel('Clip Range')
    ax.set_ylabel('Learning Rate')
    ax.set_title('PPO Hyperparameter Sensitivity\n(CartPole-v1 Performance)')

    # Add colorbar
    cbar = plt.colorbar(im, ax=ax)
    cbar.set_label('Average Episode Reward')

    # Add text annotations (x is the column index, y is the row index)
    for i in range(len(learning_rates)):
        for j in range(len(clip_ranges)):
            ax.text(j, i, f'{performance_matrix[i, j]:.0f}',
                    ha="center", va="center", color="white", fontweight='bold')

    plt.tight_layout()
    plt.savefig("hyperparameter_sensitivity.png", dpi=300, bbox_inches='tight')
    plt.show()

plot_hyperparameter_sensitivity()
```
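
Once the grid is filled with real results, the best configuration can be read off programmatically instead of from the heatmap. A minimal sketch with a hypothetical, hand-written results grid:

```python
import numpy as np

learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
clip_ranges = [0.1, 0.2, 0.3, 0.4]

# Hypothetical results grid: rows are learning rates, columns are clip ranges.
performance_matrix = np.array([
    [150.0, 160.0, 155.0, 140.0],
    [170.0, 195.0, 180.0, 165.0],
    [165.0, 185.0, 175.0, 160.0],
    [120.0, 130.0, 125.0, 110.0],
])

# argmax works on the flattened array; unravel_index maps it back to (row, column).
best_row, best_col = np.unravel_index(np.argmax(performance_matrix), performance_matrix.shape)
best_lr = learning_rates[best_row]
best_clip = clip_ranges[best_col]
best_score = performance_matrix[best_row, best_col]
print(f"Best: lr={best_lr:.0e}, clip_range={best_clip:.1f}, reward={best_score:.0f}")
```

With noisy results, a single best cell can be misleading. Averaging each cell over several seeds before taking the maximum gives a more trustworthy pick.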

## Integration with Command Line Tools

The three plotting entry points (`plot_train`, `plot_from_file`, and `all_plots`) are available through the RL Zoo3 command line interface:

```bash
# Plot training curves
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo --smooth 10

# Plot from evaluation files
rl_zoo3 plot_from_file --log-dir ./eval_results --output-dir ./plots

# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./analysis --env CartPole-v1

# With additional options
rl_zoo3 plot_train \
    --log-dir ./logs \
    --env CartPole-v1 \
    --algo ppo \
    --smooth 10 \
    --window 50 \
    --format png \
    --dpi 300
```

The plotting tools work directly on the logs written by the RL Zoo3 training workflow: they generate visualizations from its standard log formats and provide analysis tools for RL experiments.