
# Plotting and Visualization

RL Zoo3 includes plotting tools for training curves, evaluation results, and performance analysis. They create publication-quality plots from training logs, compare algorithms, and visualize learning progress.

## Core Imports

```python
# Each entry point lives in its own submodule of rl_zoo3.plots
from rl_zoo3.plots.plot_train import plot_train
from rl_zoo3.plots.plot_from_file import plot_from_file, restyle_boxplot
from rl_zoo3.plots.all_plots import all_plots
from rl_zoo3.plots.score_normalization import normalize_score
import numpy as np
```

## Capabilities

### Training Curve Plotting

Plot training progress curves from the monitor log files written during training.

```python { .api }
def plot_train() -> None:
    """
    Plot training curves from monitor logs.

    Command-line interface for plotting training progress including:
    - Episode rewards over time
    - Episode lengths
    - Success rates (if applicable)
    - Learning curves with rolling window smoothing

    Reads from monitor log files and generates matplotlib plots.
    Supports customization of x-axis (steps/episodes/time), y-axis metrics,
    figure size, fonts, and rolling window size.
    """
```

Usage example:

```bash
# Command line usage
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo
```

```python
# Or programmatically
from rl_zoo3.plots.plot_train import plot_train
import sys

# Set command line arguments (the entry point parses sys.argv)
sys.argv = [
    'plot_train',
    '--log-dir', './logs',
    '--env', 'CartPole-v1',
    '--algo', 'ppo',
    '--smooth', '10'
]

plot_train()
```
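
The rolling-window smoothing mentioned above can be illustrated without RL Zoo3. The sketch below is not the library's implementation. It assumes a Stable-Baselines3-style monitor file (a `#`-prefixed JSON header line followed by CSV columns `r`, `l`, `t` for episode reward, length, and elapsed time). The sample values are made up, and the helper names `read_monitor_rewards` and `rolling_mean` are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-in for a monitor file (values are made up).
MONITOR_TEXT = """#{"t_start": 0.0, "env_id": "CartPole-v1"}
r,l,t
10.0,10,0.1
20.0,20,0.3
30.0,30,0.6
40.0,40,1.0
"""


def read_monitor_rewards(text: str) -> list[float]:
    """Return episode rewards from monitor-style CSV text (hypothetical helper)."""
    # Drop blank lines and the JSON metadata line, which starts with '#'.
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return [float(row["r"]) for row in reader]


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Mean over each full window of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


rewards = read_monitor_rewards(MONITOR_TEXT)
smoothed = rolling_mean(rewards, window=2)
print(rewards)
print(smoothed)
```

A larger window gives a smoother but shorter curve, because only full windows produce a value.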

### File-Based Plotting

Create plots from saved log files and evaluation results.

```python { .api }
def plot_from_file() -> None:
    """
    Plot results from saved evaluation files.

    Command-line interface for creating plots from evaluation results stored in:
    - Numpy archive files (.npz)
    - Pickle files with evaluation data
    - Post-processed experimental results

    Supports advanced statistical visualization including:
    - Box plots for performance distributions
    - Learning curves with confidence intervals
    - Algorithm comparison plots
    - Publication-quality figures with customizable styling
    """
```

Usage example:

```bash
# Command line usage
rl_zoo3 plot_from_file --log-dir ./eval_logs --output ./plots
```

```python
# Programmatic usage
from rl_zoo3.plots.plot_from_file import plot_from_file
import sys

# Set command line arguments (the entry point parses sys.argv)
sys.argv = [
    'plot_from_file',
    '--log-dir', './eval_logs',
    '--output-dir', './plots',
    '--format', 'png'
]

plot_from_file()
```
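
As a rough illustration of a learning curve with an uncertainty band computed from a `.npz` archive, the sketch below assumes the layout written by Stable-Baselines3's `EvalCallback`: `timesteps` of shape `(n_evals,)` and `results` of shape `(n_evals, n_eval_episodes)`. The data are synthetic, the band is one standard error rather than a formal confidence interval, and this is not necessarily how `plot_from_file` processes its input.

```python
import io

import numpy as np

# Build a synthetic archive in memory instead of reading a real evaluations.npz.
timesteps = np.array([1000, 2000, 3000])
results = np.array([
    [10.0, 14.0, 12.0, 12.0],
    [20.0, 24.0, 22.0, 22.0],
    [30.0, 34.0, 32.0, 32.0],
])
buffer = io.BytesIO()
np.savez(buffer, timesteps=timesteps, results=results)
buffer.seek(0)

data = np.load(buffer)
episode_returns = data["results"]

# Mean return per evaluation point, averaged over evaluation episodes.
mean_return = episode_returns.mean(axis=1)

# Standard error of the mean across evaluation episodes.
n_episodes = episode_returns.shape[1]
std_error = episode_returns.std(axis=1, ddof=1) / np.sqrt(n_episodes)

lower = mean_return - std_error
upper = mean_return + std_error
for t, m, lo, hi in zip(data["timesteps"], mean_return, lower, upper):
    print(f"{int(t)} steps: mean {m:.1f} (band {lo:.2f} to {hi:.2f})")
```

The `lower` and `upper` arrays can be passed to matplotlib's `Axes.fill_between` to draw the shaded band around the mean curve.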

### Comprehensive Plotting

Generate all available plots for a complete analysis.

```python { .api }
def all_plots() -> None:
    """
    Generate comprehensive analysis plots from experimental results.

    Command-line interface that creates:
    - Algorithm comparison plots across environments
    - Statistical performance summaries
    - Learning curves with confidence intervals
    - Experiment matrices and correlation analysis
    - Publication-ready figures and tables

    Processes experimental results from multiple algorithms and environments
    to create a complete analysis suite for research papers and reports.
    """
```

Usage example:

```bash
# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./plots --env CartPole-v1
```

```python
# Programmatic usage
from rl_zoo3.plots.all_plots import all_plots
import sys

# Set command line arguments (the entry point parses sys.argv)
sys.argv = [
    'all_plots',
    '--log-dir', './logs',
    '--output-dir', './analysis_plots',
    '--env', 'CartPole-v1',
    '--algo', 'ppo'
]

all_plots()
```
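
The programmatic examples overwrite `sys.argv` because the plotting entry points take no parameters and read their options from the command line. Assigning to `sys.argv` directly leaves the modified value in place for the rest of the process. One way to contain that is a small context manager, sketched below. `patched_argv` and `fake_plot_cli` are hypothetical names. `fake_plot_cli` stands in for an argparse-based entry point and is not part of RL Zoo3.

```python
import argparse
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(*args: str):
    """Temporarily replace sys.argv, restoring the original on exit."""
    original = sys.argv
    sys.argv = list(args)
    try:
        yield
    finally:
        sys.argv = original


def fake_plot_cli() -> argparse.Namespace:
    """Hypothetical stand-in for an argparse-based entry point."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", required=True)
    parser.add_argument("--algo", required=True)
    return parser.parse_args()  # parses sys.argv[1:]


argv_before = list(sys.argv)
with patched_argv("fake_plot_cli", "--env", "CartPole-v1", "--algo", "ppo"):
    parsed = fake_plot_cli()

print(parsed.env, parsed.algo)
print(sys.argv == argv_before)
```

The same wrapper could surround a call such as `all_plots()`, assuming that function parses `sys.argv` as the examples above imply.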

### Score Normalization

Normalize performance scores across different environments for fair comparison.

```python { .api }
def normalize_score(score: np.ndarray, env_id: str) -> np.ndarray:
    """
    Normalize scores for cross-environment comparison.

    Parameters:
    - score: Array of raw scores/rewards
    - env_id: Environment identifier for normalization reference

    Returns:
        np.ndarray: Normalized scores (typically 0-100 scale)

    Uses environment-specific reference scores to normalize performance,
    enabling fair comparison across different environments with varying
    reward scales and difficulty levels.
    """
```

```python { .api }
from typing import NamedTuple

class ReferenceScore(NamedTuple):
    """
    Reference score data structure for normalization.

    Attributes:
    - env_id: Environment identifier
    - min_score: Minimum reference score (random policy)
    - max_score: Maximum reference score (expert/optimal policy)
    """
    env_id: str
    min_score: float
    max_score: float
```

Usage example:

```python
import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Raw scores from different environments
cartpole_scores = np.array([180, 200, 195, 210, 175])
pendulum_scores = np.array([-150, -120, -130, -110, -140])

# Normalize for comparison
cartpole_normalized = normalize_score(cartpole_scores, "CartPole-v1")
pendulum_normalized = normalize_score(pendulum_scores, "Pendulum-v1")

print("CartPole normalized:", cartpole_normalized)
print("Pendulum normalized:", pendulum_normalized)

# Now scores are comparable across environments
average_performance = (cartpole_normalized.mean() + pendulum_normalized.mean()) / 2
print(f"Average normalized performance: {average_performance:.2f}")
```
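
To make the idea of reference-score normalization concrete, here is a minimal min-max sketch. It is an illustration under stated assumptions, not the body of `normalize_score`. The environment names, the reference values in `REFERENCE_SCORES`, the helper name `min_max_normalize`, and the `scale` parameter are all hypothetical.

```python
import numpy as np

# Hypothetical reference scores: (random-policy score, expert score).
# These numbers are illustrative only and are not taken from RL Zoo3.
REFERENCE_SCORES = {
    "EnvA-v0": (0.0, 200.0),
    "EnvB-v0": (-1000.0, -100.0),
}


def min_max_normalize(score: np.ndarray, env_id: str, scale: float = 100.0) -> np.ndarray:
    """Map raw scores so the random baseline is 0 and the expert is `scale`."""
    min_score, max_score = REFERENCE_SCORES[env_id]
    raw = np.asarray(score, dtype=float)
    return scale * (raw - min_score) / (max_score - min_score)


env_a = min_max_normalize(np.array([100.0, 150.0, 200.0]), "EnvA-v0")
env_b = min_max_normalize(np.array([-550.0, -100.0]), "EnvB-v0")
print(env_a)
print(env_b)
```

After this mapping, a value of 50 means "halfway between the random and expert references" in either environment, even though the raw reward scales differ in sign and magnitude.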

### Utility Functions

Helper functions for plot styling and data processing.

```python { .api }
def restyle_boxplot(
    artist_dict: dict,
    color: str,
    gray: str = "#222222",
    linewidth: int = 1,
    fliersize: int = 5
) -> None:
    """
    Restyle boxplot appearance for publication quality.

    Parameters:
    - artist_dict: Dictionary of boxplot artists from matplotlib
    - color: Primary color for the boxplot
    - gray: Color for secondary elements (lines, whiskers, etc.)
    - linewidth: Width of plot lines
    - fliersize: Size of outlier markers

    Modifies boxplot styling in-place for consistent, professional appearance
    across all plots generated by RL Zoo3.
    """
```
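
The "dictionary of boxplot artists" is the value returned by matplotlib's `Axes.boxplot`, with keys such as `boxes`, `whiskers`, `caps`, `medians`, and `fliers`. The sketch below shows one way such a dictionary can be restyled in place. `restyle_sketch` is a hypothetical helper written for this illustration and may differ from what `restyle_boxplot` actually does.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def restyle_sketch(artist_dict, color, gray="#222222", linewidth=1, fliersize=5):
    """Hypothetical restyling helper; not the RL Zoo3 implementation."""
    # Boxes are patches only when the boxplot was created with patch_artist=True.
    for box in artist_dict["boxes"]:
        box.set_facecolor(color)
        box.set_edgecolor(gray)
        box.set_linewidth(linewidth)
    # Whiskers, caps, and medians are Line2D artists.
    for key in ("whiskers", "caps", "medians"):
        for line in artist_dict[key]:
            line.set_color(gray)
            line.set_linewidth(linewidth)
    # Fliers are Line2D artists drawn with markers only.
    for flier in artist_dict["fliers"]:
        flier.set_markeredgecolor(gray)
        flier.set_markersize(fliersize)


rng = np.random.default_rng(0)
fig, ax = plt.subplots()
artists = ax.boxplot([rng.normal(size=50), rng.normal(size=50)], patch_artist=True)
restyle_sketch(artists, color="#4c72b0")
```

Passing `patch_artist=True` is what makes the boxes fillable. Without it the boxes are plain lines and `set_facecolor` is not available.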

## Advanced Plotting Examples

### Multi-Algorithm Comparison

```python
import matplotlib.pyplot as plt
import numpy as np
from rl_zoo3.plots.score_normalization import normalize_score

# Load results from multiple algorithms
algorithms = ['ppo', 'sac', 'td3', 'dqn']
env_id = "HalfCheetah-v4"

# Simulate loading results (replace with actual data loading)
results = {
    'ppo': np.random.normal(3000, 500, 10),
    'sac': np.random.normal(3500, 400, 10),
    'td3': np.random.normal(3200, 600, 10),
    'dqn': np.random.normal(2800, 700, 10)
}

# Normalize scores for fair comparison
normalized_results = {}
for algo, scores in results.items():
    normalized_results[algo] = normalize_score(scores, env_id)

# Create comparison plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Raw scores (tick labels are set separately so this works across matplotlib versions)
ax1.boxplot([results[algo] for algo in algorithms])
ax1.set_xticklabels(algorithms)
ax1.set_title("Raw Scores")
ax1.set_ylabel("Episode Return")

# Normalized scores
ax2.boxplot([normalized_results[algo] for algo in algorithms])
ax2.set_xticklabels(algorithms)
ax2.set_title("Normalized Scores")
ax2.set_ylabel("Normalized Performance (0-100)")

plt.tight_layout()
plt.savefig("algorithm_comparison.png", dpi=300, bbox_inches='tight')
plt.show()
```

### Training Progress Analysis

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

def analyze_training_progress(log_dir: str, env_id: str, algo: str):
    """
    Analyze and plot training progress from log files.
    """
    # Where real logs would live; unused while the data below is simulated
    log_path = Path(log_dir) / algo / env_id

    # Load training data (example structure)
    # In practice, you'd load from actual log files
    timesteps = np.arange(0, 100000, 1000)
    # Simulated learning: noisy rewards that grow logarithmically with timesteps
    episode_rewards = (
        np.random.normal(150, 30, len(timesteps))
        + 50 * np.log(timesteps + 1) / np.log(10)
    )

    # Add extra noise, since real training curves are noisy
    episode_rewards += np.random.normal(0, 10, len(timesteps))

    # Create comprehensive training plot
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    # Learning curve
    axes[0, 0].plot(timesteps, episode_rewards, alpha=0.7, label='Episode Reward')

    # Add smoothed curve (the first window - 1 values are NaN and are not drawn)
    window = 10
    smoothed = pd.Series(episode_rewards).rolling(window).mean()
    axes[0, 0].plot(timesteps, smoothed, color='red', linewidth=2, label=f'Smoothed ({window})')

    axes[0, 0].set_xlabel('Timesteps')
    axes[0, 0].set_ylabel('Episode Reward')
    axes[0, 0].set_title(f'{algo.upper()} Learning Curve - {env_id}')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Reward distribution over time
    # Split into early, middle, late training
    early = episode_rewards[:len(episode_rewards)//3]
    middle = episode_rewards[len(episode_rewards)//3:2*len(episode_rewards)//3]
    late = episode_rewards[2*len(episode_rewards)//3:]

    axes[0, 1].boxplot([early, middle, late])
    axes[0, 1].set_xticklabels(['Early', 'Middle', 'Late'])
    axes[0, 1].set_title('Reward Distribution by Training Phase')
    axes[0, 1].set_ylabel('Episode Reward')

    # Improvement rate: gradient of the smoothed curve, aligned with timesteps[window-1:]
    improvement = np.gradient(smoothed.dropna().to_numpy())
    axes[1, 0].plot(timesteps[window-1:], improvement, alpha=0.7)
    axes[1, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[1, 0].set_xlabel('Timesteps')
    axes[1, 0].set_ylabel('Improvement Rate')
    axes[1, 0].set_title('Improvement Rate Over Time')
    axes[1, 0].grid(True, alpha=0.3)

    # Final performance histogram
    final_episodes = episode_rewards[-20:]  # Last 20 data points
    axes[1, 1].hist(final_episodes, bins=10, alpha=0.7, edgecolor='black')
    axes[1, 1].axvline(final_episodes.mean(), color='red', linestyle='--',
                       label=f'Mean: {final_episodes.mean():.1f}')
    axes[1, 1].set_xlabel('Episode Reward')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Final Performance Distribution')
    axes[1, 1].legend()

    plt.tight_layout()
    plt.savefig(f"{algo}_{env_id}_analysis.png", dpi=300, bbox_inches='tight')
    plt.show()

# Use the analysis function
analyze_training_progress("./logs", "CartPole-v1", "ppo")
```

### Hyperparameter Sensitivity Analysis

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_hyperparameter_sensitivity():
    """
    Plot how performance varies with different hyperparameters.
    """
    # Example: PPO learning rate vs clip range sensitivity
    learning_rates = [1e-4, 3e-4, 1e-3, 3e-3]
    clip_ranges = [0.1, 0.2, 0.3, 0.4]

    # Simulate performance data (replace with actual results)
    performance_matrix = np.random.normal(180, 20, (len(learning_rates), len(clip_ranges)))

    # Add a structured pattern: moderate values score best
    for i, lr in enumerate(learning_rates):
        for j, clip in enumerate(clip_ranges):
            lr_penalty = abs(np.log10(lr) + 3.5) * 10  # Penalty for extreme LR
            clip_penalty = abs(clip - 0.2) * 50  # Penalty for extreme clip range
            performance_matrix[i, j] -= (lr_penalty + clip_penalty)

    # Create heatmap (rows: learning rates, columns: clip ranges)
    fig, ax = plt.subplots(figsize=(10, 8))

    im = ax.imshow(performance_matrix, cmap='viridis', aspect='auto')

    # Set ticks and labels
    ax.set_xticks(range(len(clip_ranges)))
    ax.set_yticks(range(len(learning_rates)))
    ax.set_xticklabels([f"{cr:.1f}" for cr in clip_ranges])
    ax.set_yticklabels([f"{lr:.0e}" for lr in learning_rates])

    ax.set_xlabel('Clip Range')
    ax.set_ylabel('Learning Rate')
    ax.set_title('PPO Hyperparameter Sensitivity\n(CartPole-v1 Performance)')

    # Add colorbar
    cbar = plt.colorbar(im, ax=ax)
    cbar.set_label('Average Episode Reward')

    # Add text annotations
    for i in range(len(learning_rates)):
        for j in range(len(clip_ranges)):
            ax.text(j, i, f'{performance_matrix[i, j]:.0f}',
                    ha="center", va="center", color="white", fontweight='bold')

    plt.tight_layout()
    plt.savefig("hyperparameter_sensitivity.png", dpi=300, bbox_inches='tight')
    plt.show()

plot_hyperparameter_sensitivity()
```

## Integration with Command Line Tools

All plotting functions are available through the RL Zoo3 command line interface:

```bash
# Plot training curves
rl_zoo3 plot_train --log-dir ./logs --env CartPole-v1 --algo ppo --smooth 10

# Plot from evaluation files
rl_zoo3 plot_from_file --log-dir ./eval_results --output-dir ./plots

# Generate all plots
rl_zoo3 all_plots --log-dir ./logs --output-dir ./analysis --env CartPole-v1

# With additional options
rl_zoo3 plot_train \
    --log-dir ./logs \
    --env CartPole-v1 \
    --algo ppo \
    --smooth 10 \
    --window 50 \
    --format png \
    --dpi 300
```

The plotting commands work directly on the standard log formats produced by the RL Zoo3 training workflow, so training and evaluation runs can be visualized and analyzed without a separate conversion step.