# Environment Suite

A pre-built collection of continuous-control reinforcement learning environments spanning diverse domains, including locomotion, manipulation, and classic control. The suite provides standardized interfaces, consistent action/observation spaces, and benchmark task definitions for RL research.
## Capabilities

### Environment Loading

Load environments by domain and task name with optional configuration parameters.
```python { .api }
def load(domain_name: str, task_name: str, task_kwargs=None, environment_kwargs=None, visualize_reward=False):
    """Returns an environment from a domain name, task name and optional settings.

    Parameters:
    - domain_name: String name of the domain (e.g., 'cartpole', 'walker')
    - task_name: String name of the task (e.g., 'balance', 'walk')
    - task_kwargs: Optional dict of keyword arguments for the task
    - environment_kwargs: Optional dict of keyword arguments for the environment
    - visualize_reward: Optional bool to enable reward visualization in rendering

    Returns:
        Environment instance ready for interaction

    Example:
        >>> env = suite.load('cartpole', 'balance')
        >>> env = suite.load('walker', 'walk', task_kwargs={'random': 42})
    """

def build_environment(domain_name: str, task_name: str, task_kwargs=None, environment_kwargs=None, visualize_reward=False):
    """Returns an environment from the suite with comprehensive error handling.

    Parameters: Same as load()

    Raises:
    - ValueError: If the domain or task doesn't exist

    Returns:
        Environment instance

    Note: Identical functionality to load() but with explicit error handling
    """
```
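To illustrate the lookup-and-raise behavior described above, here is a registry-based sketch. The `_DOMAINS` dict and its lambda constructor are hypothetical stand-ins, not the suite's real internals (the real suite discovers domains as modules):

```python
# Hypothetical registry: domain name -> {task name -> constructor}.
# Only a sketch of the validation logic, not the suite's actual structure.
_DOMAINS = {
    'cartpole': {'balance': lambda **kwargs: ('cartpole', 'balance', kwargs)},
}

def build_environment(domain_name, task_name, task_kwargs=None):
    """Look up a task constructor, raising ValueError for unknown names."""
    if domain_name not in _DOMAINS:
        raise ValueError(f'Unknown domain {domain_name!r}.')
    tasks = _DOMAINS[domain_name]
    if task_name not in tasks:
        raise ValueError(f'Unknown task {task_name!r} in domain {domain_name!r}.')
    return tasks[task_name](**(task_kwargs or {}))
```

Validating names before construction is what lets callers catch a single `ValueError` rather than a domain-specific import or attribute error.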
### Environment Collections

Pre-defined collections of environments organized by difficulty and purpose.
```python { .api }
# Complete environment catalog
ALL_TASKS: tuple
"""Tuple containing all available (domain_name, task_name) pairs"""

# Difficulty-based collections
BENCHMARKING: tuple
"""Tuple of (domain, task) pairs used for benchmarking"""

EASY: tuple
"""Tuple of easier difficulty tasks suitable for initial testing"""

HARD: tuple
"""Tuple of challenging tasks for advanced evaluation"""

EXTRA: tuple
"""Tuple of additional tasks not included in the benchmarking set"""

# Visualization-based collections
REWARD_VIZ: tuple
"""Tuple of tasks that support reward visualization"""

NO_REWARD_VIZ: tuple
"""Tuple of tasks without reward visualization support"""

# Domain organization
TASKS_BY_DOMAIN: dict
"""Dict mapping domain names to tuples of their task names"""
```
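These collections are related by composition: the benchmarking set is the concatenation of the easy and hard collections, and the full catalog extends it with the extras. The task pairs below are placeholders for illustration; real collections contain many more entries:

```python
# Placeholder task pairs; the real tuples are much longer.
EASY = (('cartpole', 'balance'), ('reacher', 'easy'))
HARD = (('humanoid', 'run'),)
EXTRA = (('lqr', 'lqr_2_1'),)

# Assumed composition: benchmarking = easy + hard; the catalog adds extras.
BENCHMARKING = EASY + HARD
ALL_TASKS = BENCHMARKING + EXTRA
```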
### Available Domains

The suite includes environments across these domains:
```python { .api }
# Locomotion domains
acrobot       # Acrobot swing-up tasks
cheetah       # Cheetah running tasks
hopper        # Single-leg hopping tasks
humanoid      # Humanoid locomotion tasks
humanoid_CMU  # CMU humanoid with mocap data
quadruped     # Four-legged locomotion
swimmer       # Swimming locomotion
walker        # Bipedal walking tasks
dog           # Dog locomotion tasks

# Manipulation domains
finger        # Finger manipulation tasks
manipulator   # Robotic arm manipulation
reacher       # Point reaching tasks
stacker       # Block stacking tasks

# Classic control domains
ball_in_cup   # Ball-in-cup balancing
cartpole      # Cartpole balancing
pendulum      # Pendulum swing-up
point_mass    # Point mass navigation

# Control theory domains
lqr           # Linear quadratic regulator

# Aquatic domains
fish          # Fish swimming tasks
```
## Usage Examples

### Basic Environment Usage
```python
from dm_control import suite
import numpy as np

# Load environment
env = suite.load('cartpole', 'balance')
action_spec = env.action_spec()

# Environment interaction loop
time_step = env.reset()
while not time_step.last():
    # Sample a random action uniformly within the action spec's bounds.
    # (Note: action_spec.generate_value() returns a fixed conforming value,
    # not a random one, so we sample explicitly.)
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)

print(f"Reward: {time_step.reward}")
print(f"Observation: {time_step.observation}")
```
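The uniform sampling in the loop above can be factored into a standalone helper. The function below is a self-contained sketch that assumes only the per-dimension `minimum`/`maximum` bounds and `shape` exposed by a bounded action spec:

```python
import numpy as np

def sample_uniform_action(minimum, maximum, shape, rng=None):
    """Sample an action uniformly within per-dimension bounds."""
    rng = np.random.default_rng() if rng is None else rng
    # Broadcast scalar or per-dimension bounds up to the full action shape,
    # then draw each dimension uniformly within its own bounds.
    low = np.broadcast_to(np.asarray(minimum, dtype=float), shape)
    high = np.broadcast_to(np.asarray(maximum, dtype=float), shape)
    return rng.uniform(low, high)

action = sample_uniform_action(-1.0, 1.0, (6,), rng=np.random.default_rng(0))
```

Passing an explicit `rng` keeps exploration reproducible across runs, which matters when comparing training curves.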
### Environment Exploration

```python
# Explore available environments
print("All available tasks:")
for domain, task in suite.ALL_TASKS:
    print(f"  {domain}/{task}")

print(f"\nBenchmarking tasks: {len(suite.BENCHMARKING)}")
print(f"Easy tasks: {len(suite.EASY)}")
print(f"Hard tasks: {len(suite.HARD)}")

# Explore domain-specific tasks
print("\nTasks by domain:")
for domain, tasks in suite.TASKS_BY_DOMAIN.items():
    print(f"  {domain}: {tasks}")
```
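A `TASKS_BY_DOMAIN`-style mapping can be derived from any `ALL_TASKS`-style tuple of pairs; a minimal sketch (with placeholder tasks) is:

```python
from collections import defaultdict

# Placeholder catalog standing in for suite.ALL_TASKS.
all_tasks = (('cartpole', 'balance'), ('cartpole', 'swingup'),
             ('walker', 'walk'), ('walker', 'run'))

tasks_by_domain = defaultdict(tuple)
for domain, task in all_tasks:
    tasks_by_domain[domain] += (task,)  # append task to its domain's tuple
```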
### Custom Configuration

```python
# Load with custom task parameters
env = suite.load(
    'walker', 'walk',
    task_kwargs={'random': 42},                    # Set random seed
    environment_kwargs={'flat_observation': True}  # Flatten observations
)

# Enable reward visualization
env = suite.load('reacher', 'easy', visualize_reward=True)
```
### Environment Properties

```python
env = suite.load('humanoid', 'stand')

# Inspect environment specifications
print(f"Action spec: {env.action_spec()}")
print(f"Observation spec: {env.observation_spec()}")
print(f"Reward spec: {env.reward_spec()}")

# Access physics simulation
physics = env.physics
print(f"Timestep: {physics.timestep()}")
print(f"Control: {physics.control()}")
```
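Since `observation_spec()` returns a dict of per-key array specs, the length of a flattened observation vector can be computed from the spec shapes alone, without stepping the environment. The shapes below are hypothetical stand-ins for a real spec:

```python
import numpy as np

# Hypothetical spec shapes standing in for env.observation_spec();
# a scalar observation has shape () and contributes one element.
obs_spec_shapes = {'joint_angles': (21,), 'velocity': (27,), 'head_height': ()}

flat_dim = sum(int(np.prod(shape)) for shape in obs_spec_shapes.values())
```

This is the number a policy network's input layer needs when observations are flattened (e.g., via `flat_observation=True`).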
182
183
## Error Handling
184
185
```python
186
try:
187
env = suite.load('nonexistent_domain', 'task')
188
except ValueError as e:
189
print(f"Domain error: {e}")
190
191
try:
192
env = suite.load('cartpole', 'nonexistent_task')
193
except ValueError as e:
194
print(f"Task error: {e}")
195
```
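Both failure modes above can also be checked up front against the task catalog. A small guard along these lines (the helper name is ours, not part of the suite) avoids constructing anything for unknown names:

```python
def validate_task(domain_name, task_name, catalog):
    """Raise ValueError early if (domain, task) is not in the catalog."""
    if (domain_name, task_name) not in catalog:
        raise ValueError(f'Unknown task: {domain_name}/{task_name}')

# Usage sketch against a placeholder catalog; real code would pass suite.ALL_TASKS.
catalog = (('cartpole', 'balance'), ('walker', 'walk'))
validate_task('cartpole', 'balance', catalog)  # passes silently
```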