# MMEngine

MMEngine is a foundational library for training deep learning models with PyTorch and serves as the training engine for all OpenMMLab codebases. It provides a training framework with support for large-scale model training, configuration management, and monitoring across mainstream platforms.

## Package Information

- **Package Name**: mmengine
- **Language**: Python
- **Installation**: `pip install mmengine`

## Core Imports

```python
import mmengine
```

Common module imports:

```python
from mmengine import Config, ConfigDict
# Runner and BaseLoop live in the mmengine.runner subpackage
from mmengine.runner import Runner, BaseLoop
from mmengine import Registry, MODELS, DATASETS
from mmengine import MMLogger, print_log
from mmengine import fileio
```

## Basic Usage

```python
import mmengine
from mmengine import Config
from mmengine.runner import Runner

# Load configuration
cfg = Config.fromfile('config.py')

# Create and run training
runner = Runner.from_cfg(cfg)
runner.train()

# Basic file operations
from mmengine import fileio
data = fileio.load('data.json')
fileio.dump(data, 'output.json')

# Logging
from mmengine import MMLogger
logger = MMLogger.get_instance('my_logger')
logger.info('Training started')
```

## Architecture

MMEngine's modular architecture consists of several key components:

- **Runner**: Central coordinator managing the entire training process
- **Loops**: Training, validation, and test execution loops with flexible strategies
- **Registry System**: Component discovery and instantiation framework
- **Config System**: Hierarchical configuration management with inheritance
- **FileIO**: Multi-backend file operations (local, cloud, distributed storage)
- **Hooks**: Event-driven system for customizing training behaviors
- **Optimization**: Advanced optimizers, schedulers, and mixed precision training
- **Distributed Training**: Multi-GPU and multi-node training strategies

This design enables scalable, configurable, and extensible training pipelines that can be adapted to different deep learning tasks while staying consistent across the OpenMMLab ecosystem.

## Capabilities

### Configuration Management

Configuration system supporting Python, JSON, and YAML configuration files, with inheritance, variable interpolation, and runtime modification.

```python { .api }
class Config:
    def __init__(self, cfg_dict: dict = None, cfg_text: str = None, filename: str = None): ...
    @staticmethod
    def fromfile(filename: str, use_predefined_variables: bool = True, import_custom_modules: bool = True) -> 'Config': ...
    def merge_from_dict(self, options: dict, allow_list_keys: bool = True): ...
    def dump(self, file: str = None) -> str: ...

class ConfigDict(dict):
    def __init__(self, *args, **kwargs): ...
    def __getattr__(self, name: str): ...
    def __setattr__(self, name: str, value): ...
```

[Configuration Management](./configuration.md)

### File I/O and Storage Backends

Unified file operations across multiple storage backends, including local filesystem, HTTP, Petrel, LMDB, and Memcached, with transparent backend switching and format-specific handlers.

```python { .api }
class FileClient:
    def __init__(self, backend: str = 'disk', **kwargs): ...
    def get(self, filepath: str) -> bytes: ...
    def put(self, obj: bytes, filepath: str): ...
    def exists(self, filepath: str) -> bool: ...

def load(file: str, file_format: str = None, **kwargs): ...
def dump(obj, file: str = None, file_format: str = None, **kwargs): ...
def exists(filepath: str, backend_args: dict = None) -> bool: ...
def isdir(filepath: str, backend_args: dict = None) -> bool: ...
def isfile(filepath: str, backend_args: dict = None) -> bool: ...
```

[File I/O and Storage](./fileio.md)

### Training Runners and Loops

Training orchestration built around the Runner, with epoch-based and iteration-based training loops, validation and test loops, and built-in checkpointing and logging.

```python { .api }
class Runner:
    def __init__(self, model, work_dir: str = None, train_dataloader = None, val_dataloader = None, test_dataloader = None, train_cfg: dict = None, val_cfg: dict = None, test_cfg: dict = None, auto_scale_lr: dict = None, optim_wrapper = None, param_scheduler = None, val_evaluator = None, test_evaluator = None, default_hooks: dict = None, custom_hooks: list = None, data_preprocessor = None, load_from: str = None, resume: bool = False, launcher: str = 'none', env_cfg: dict = None, log_processor = None, visualizer = None, default_scope: str = 'mmengine', randomness: dict = None, experiment_name: str = None, cfg: dict = None): ...
    @classmethod
    def from_cfg(cls, cfg) -> 'Runner': ...
    def train(self): ...
    def val(self): ...
    def test(self): ...

class BaseLoop:
    def __init__(self, runner, dataloader): ...
    def run(self): ...

class EpochBasedTrainLoop(BaseLoop):
    def __init__(self, runner, dataloader, max_epochs: int, val_begin: int = 1, val_interval: int = 1, dynamic_intervals: list = None): ...

class IterBasedTrainLoop(BaseLoop):
    def __init__(self, runner, dataloader, max_iters: int, val_begin: int = 1, val_interval: int = 1, dynamic_intervals: list = None): ...
```
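
The choice between the two training loops is typically made in the config rather than in code. The fragment below is a sketch with hypothetical values; it assumes the common convention that `train_cfg` either carries a `by_epoch` flag or names the loop class through `type`, and that the remaining keys are passed to the loop constructor shown above.

```python
# Epoch-based training: 12 epochs, validating after every epoch.
# With `by_epoch=True` an EpochBasedTrainLoop is expected to be built.
train_cfg = dict(by_epoch=True, max_epochs=12, val_interval=1)

# Iteration-based alternative: 10,000 iterations, validating every 1,000.
# With `by_epoch=False` an IterBasedTrainLoop is expected to be built.
iter_train_cfg = dict(by_epoch=False, max_iters=10_000, val_interval=1_000)

# The loop class can also be named explicitly through `type`.
explicit_train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)

# Empty dicts request the default validation and test loops.
val_cfg = dict()
test_cfg = dict()
```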

[Training and Loops](./training.md)

### Registry System

Component registry that supports a modular architecture through discovery, registration, and instantiation of models, datasets, optimizers, and other components.

```python { .api }
class Registry:
    def __init__(self, name: str, build_func: callable = None, parent: 'Registry' = None, scope: str = None, locations: list = None): ...
    def register_module(self, name: str = None, force: bool = False, module: type = None) -> callable: ...
    def build(self, cfg: dict) -> object: ...
    def get(self, key: str) -> type: ...

def build_from_cfg(cfg: dict, registry: Registry, default_args: dict = None) -> object: ...

# Global registries
MODELS: Registry
DATASETS: Registry
TRANSFORMS: Registry
OPTIMIZERS: Registry
RUNNERS: Registry
HOOKS: Registry
METRICS: Registry
```
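
The register-then-build pattern can be sketched with a toy registry in plain Python. This is not MMEngine's `Registry` (which also handles scopes, parents, and lazy imports); `MiniRegistry`, `TOY_MODELS`, and `LinearHead` are hypothetical names used only to show how a `type` key in a config dict selects the class to instantiate.

```python
class MiniRegistry:
    """Toy registry mapping string names to classes (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, name=None, force=False):
        """Return a decorator that records the class under `name`."""
        def decorator(cls):
            key = name or cls.__name__
            if key in self._modules and not force:
                raise KeyError(f'{key} is already registered in {self.name}')
            self._modules[key] = cls
            return cls
        return decorator

    def get(self, key):
        return self._modules.get(key)

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the other keys."""
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls = self.get(args.pop('type'))
        if cls is None:
            raise KeyError(f"{cfg['type']} is not registered in {self.name}")
        return cls(**args)


TOY_MODELS = MiniRegistry('toy_models')


@TOY_MODELS.register_module()
class LinearHead:
    def __init__(self, in_channels, num_classes=10):
        self.in_channels = in_channels
        self.num_classes = num_classes


# The config dict names the class; remaining keys become constructor arguments.
head_cfg = dict(type='LinearHead', in_channels=256)
head = TOY_MODELS.build(head_cfg)
```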

[Registry System](./registry.md)

### Logging and Monitoring

Logging system with support for multiple visualization backends, including TensorBoard, Weights & Biases, MLflow, ClearML, and Neptune, plus structured message passing and history tracking.

```python { .api }
class MMLogger:
    @classmethod
    def get_instance(cls, name: str, logger_name: str = None) -> 'MMLogger': ...
    def info(self, message: str): ...
    def warning(self, message: str): ...
    def error(self, message: str): ...
    def debug(self, message: str): ...

class MessageHub:
    @classmethod
    def get_instance(cls, name: str = 'mmengine') -> 'MessageHub': ...
    def update_scalar(self, key: str, value: float, count: int = 1): ...
    def update_scalars(self, scalar_dict: dict, count: int = 1): ...
    def get_scalar(self, key: str): ...

def print_log(msg: str, logger: str = None, level: int = 20): ...
```

[Logging and Monitoring](./logging.md)

### Optimization and Scheduling

Optimization framework with support for multiple optimizers, learning rate schedulers, momentum schedulers, automatic mixed precision, and gradient accumulation.

```python { .api }
class OptimWrapper:
    def __init__(self, optimizer, accumulative_counts: int = 1, clip_grad: dict = None): ...
    def update_params(self, loss): ...
    def zero_grad(self): ...
    def step(self): ...
    def backward(self, loss): ...

class AmpOptimWrapper(OptimWrapper):
    def __init__(self, loss_scale: str = 'dynamic', **kwargs): ...

# Learning rate schedulers
class CosineAnnealingLR:
    def __init__(self, T_max: int, eta_min: float = 0, **kwargs): ...

class MultiStepLR:
    def __init__(self, milestones: list, gamma: float = 0.1, **kwargs): ...

class OneCycleLR:
    def __init__(self, max_lr: float, total_steps: int = None, epochs: int = None, steps_per_epoch: int = None, **kwargs): ...
```
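
To make the roles of `T_max` and `eta_min` concrete, the standard closed-form cosine annealing schedule can be computed in plain Python. This is a sketch of the textbook formula, not MMEngine's scheduler code (which may update the rate step by step); `cosine_annealing_lr`, `base_lr`, and the numeric values are assumptions chosen for illustration.

```python
import math


def cosine_annealing_lr(step, base_lr, T_max, eta_min=0.0):
    """Closed-form cosine annealing from base_lr (step 0) to eta_min (step T_max)."""
    cos_term = 1 + math.cos(math.pi * step / T_max)
    return eta_min + 0.5 * (base_lr - eta_min) * cos_term


# Learning rate at each of 11 steps: starts at base_lr, ends at eta_min.
schedule = [
    cosine_annealing_lr(t, base_lr=0.1, T_max=10, eta_min=0.001)
    for t in range(11)
]
```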

[Optimization and Scheduling](./optimization.md)

### Distributed Training

Multi-GPU and multi-node training support with several distribution strategies, including DDP, FSDP, DeepSpeed, and ColossalAI integration, along with communication utilities and device management.

```python { .api }
def init_dist(launcher: str, backend: str = 'nccl', **kwargs): ...
def get_dist_info() -> tuple: ...
def get_rank() -> int: ...
def get_world_size() -> int: ...
def is_main_process() -> bool: ...
def barrier(): ...

def all_reduce(tensor, op: str = 'sum'): ...
def all_gather(data, group = None) -> list: ...
def broadcast(tensor, src: int = 0): ...
def collect_results(result_part: list, size: int, tmpdir: str = None) -> list: ...

class MMDistributedDataParallel:
    def __init__(self, module, device_ids: list = None, output_device: int = None, broadcast_buffers: bool = True, find_unused_parameters: bool = False): ...
```
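
The `size` argument of `collect_results` matters because per-rank result lists are usually padded to equal length. The plain-Python sketch below assumes samples are sharded round-robin without shuffling and padded by wrapping around, in the manner of a sampler with `round_up=True`; `shard_indices` and `merge_results` are hypothetical helpers, not MMEngine functions.

```python
import math


def shard_indices(num_samples, rank, world_size):
    """Round-robin shard, padded so every rank gets the same number of items."""
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]  # wrap around to pad
    return padded[rank:total:world_size]


def merge_results(parts, size):
    """Interleave per-rank results back into dataset order, then drop padding."""
    ordered = []
    for group in zip(*parts):
        ordered.extend(group)
    return ordered[:size]


num_samples, world_size = 10, 4
# Each "result" here is just the sample index that a rank processed.
parts = [shard_indices(num_samples, r, world_size) for r in range(world_size)]
merged = merge_results(parts, num_samples)
```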

[Distributed Training](./distributed.md)

### Model Management and Hooks

Model management with base classes, weight initialization, and model wrappers for distributed training, plus a hook system for customizing training behavior.

```python { .api }
class BaseModel:
    def __init__(self, init_cfg: dict = None, data_preprocessor: dict = None): ...
    def forward(self, *args, **kwargs): ...
    def train_step(self, data, optim_wrapper): ...
    def val_step(self, data): ...
    def test_step(self, data): ...

class Hook:
    def before_run(self, runner): ...
    def after_run(self, runner): ...
    def before_train(self, runner): ...
    def after_train(self, runner): ...
    def before_train_epoch(self, runner): ...
    def after_train_epoch(self, runner): ...
    def before_train_iter(self, runner): ...
    def after_train_iter(self, runner): ...

class CheckpointHook(Hook):
    def __init__(self, interval: int = -1, by_epoch: bool = True, save_optimizer: bool = True, save_param_scheduler: bool = True, out_dir: str = None, max_keep_ckpts: int = -1, save_last: bool = True, save_best: str = 'auto', rule: str = 'greater'): ...
```
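
The order in which hook methods fire can be sketched with a toy training loop in plain Python. `ToyHook`, `ToyRunner`, and `EventRecorder` are hypothetical stand-ins that mirror a subset of the method names listed above; the real Runner does far more at each stage.

```python
class ToyHook:
    """Base hook: every stage is a no-op unless a subclass overrides it."""

    def before_train(self, runner):
        pass

    def before_train_epoch(self, runner):
        pass

    def after_train_iter(self, runner):
        pass

    def after_train_epoch(self, runner):
        pass

    def after_train(self, runner):
        pass


class ToyRunner:
    """Minimal loop that calls each hook at fixed points of training."""

    def __init__(self, hooks, max_epochs, iters_per_epoch):
        self.hooks = hooks
        self.max_epochs = max_epochs
        self.iters_per_epoch = iters_per_epoch
        self.epoch = 0
        self.iter = 0

    def call_hook(self, stage):
        for hook in self.hooks:
            getattr(hook, stage)(self)

    def train(self):
        self.call_hook('before_train')
        for epoch in range(self.max_epochs):
            self.epoch = epoch
            self.call_hook('before_train_epoch')
            for _ in range(self.iters_per_epoch):
                # A real loop would run the forward and backward pass here.
                self.iter += 1
                self.call_hook('after_train_iter')
            self.call_hook('after_train_epoch')
        self.call_hook('after_train')


class EventRecorder(ToyHook):
    """Custom hook that overrides only the stages it cares about."""

    def __init__(self):
        self.events = []

    def before_train_epoch(self, runner):
        self.events.append(f'epoch {runner.epoch} start')

    def after_train_iter(self, runner):
        self.events.append(f'iter {runner.iter} done')

    def after_train(self, runner):
        self.events.append('finished')


recorder = EventRecorder()
ToyRunner([recorder], max_epochs=2, iters_per_epoch=2).train()
```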

[Models and Hooks](./models.md)

### Dataset and Data Processing

Dataset abstraction layer with support for different dataset types, data transformations, sampling strategies, and data loading utilities suited to distributed training.

```python { .api }
class BaseDataset:
    def __init__(self, ann_file: str = '', metainfo: dict = None, data_root: str = '', data_prefix: dict = None, filter_cfg: dict = None, indices: int = None, serialize_data: bool = True, pipeline: list = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000): ...
    def __len__(self) -> int: ...
    def __getitem__(self, idx: int): ...
    def get_data_info(self, idx: int) -> dict: ...

class Compose:
    def __init__(self, transforms: list): ...
    def __call__(self, data: dict) -> dict: ...

class DefaultSampler:
    def __init__(self, dataset, shuffle: bool = True, seed: int = None, round_up: bool = True): ...
```
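
A transform pipeline in the style of `Compose` can be sketched as a chain of callables over a sample dict. The sketch assumes the common convention that a transform returning `None` rejects the sample and stops the chain; `ToyCompose`, `add_area`, and `drop_tiny` are hypothetical names.

```python
class ToyCompose:
    """Apply transforms in order; stop early if one rejects the sample."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
            if data is None:  # a transform returned None: drop the sample
                return None
        return data


def add_area(sample):
    """Return a copy of the sample with a derived `area` field."""
    return {**sample, 'area': sample['width'] * sample['height']}


def drop_tiny(sample):
    """Reject samples whose area is below 100."""
    return sample if sample['area'] >= 100 else None


pipeline = ToyCompose([add_area, drop_tiny])
kept = pipeline({'width': 20, 'height': 10})
dropped = pipeline({'width': 5, 'height': 5})
```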

[Dataset and Data Processing](./dataset.md)

### Visualization and Analysis

Visualization framework with multiple backends, plus model analysis tools for computing FLOPs, activation memory, and parameter counts.

```python { .api }
class Visualizer:
    def __init__(self, name: str = 'visualizer', image: np.ndarray = None, vis_backends: list = None, save_dir: str = None, bbox_color: str = 'green', text_color: str = 'green', mask_color: str = 'green', line_width: int = 3, alpha: float = 0.8): ...
    def add_datasample(self, name: str, image: np.ndarray, data_sample = None, draw_gt: bool = True, draw_pred: bool = True, show: bool = False, wait_time: float = 0, out_file: str = None, pred_score_thr: float = 0.3, step: int = 0): ...

def get_model_complexity_info(model, input_shape: tuple, print_per_layer_stat: bool = True, as_strings: bool = True, input_constructor: callable = None, ost: Any = None, verbose: bool = True, ignore_modules: list = None, custom_modules_hooks: dict = None): ...

def parameter_count(model) -> int: ...
def flop_count(model, inputs: tuple, supported_ops: dict = None) -> int: ...
```

[Visualization and Analysis](./visualization.md)

## Types

```python { .api }
# Configuration types
ConfigType = Union[str, dict, Config, ConfigDict]

# Registry types
BuildFunc = Callable[[dict], Any]
ScopeType = Optional[str]

# Runner types
DataLoader = Any  # torch.utils.data.DataLoader
OptimizerType = Any  # torch.optim.Optimizer
SchedulerType = Any  # torch.optim.lr_scheduler._LRScheduler
ModelType = Any  # torch.nn.Module

# Hook types
HookType = Union[Hook, dict]
Priority = Union[int, str]

# Device types
DeviceType = Union[str, int, torch.device]

# File I/O types
Backend = Union[str, BaseStorageBackend]
FileFormat = Optional[str]
```