# Lightning

The Deep Learning framework to train, deploy, and ship AI products Lightning fast. Lightning provides a unified interface combining PyTorch Lightning (for high-level model training) with Lightning Fabric (for expert-level control) and data utilities, enabling researchers and practitioners to build production-ready deep learning applications at scale.

## Package Information

- **Package Name**: lightning
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install lightning`

## Core Imports

```python
import lightning as L
```

Main framework components:

```python
from lightning import Trainer, LightningModule, LightningDataModule, Callback
```

Lightweight acceleration:

```python
from lightning import Fabric
```

Utilities:

```python
from lightning import seed_everything
from lightning.pytorch.utilities.warnings import disable_possible_user_warnings
```

## Basic Usage

```python
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST


# Define a LightningModule: the model plus its training and validation logic
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(28 * 28, 128)
        self.layer_2 = nn.Linear(128, 10)

    def forward(self, x):
        # Flatten (batch, 1, 28, 28) images into (batch, 784) vectors
        x = x.view(x.size(0), -1)
        x = torch.relu(self.layer_1(x))
        x = self.layer_2(x)
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Without this hook, Lightning skips the validation loop and ignores val_dataloader
        x, y = batch
        self.log("val_loss", F.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# Define a LightningDataModule: download, split, and serve the data
class MNISTDataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "./"):
        super().__init__()
        self.data_dir = data_dir
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])

    def prepare_data(self):
        # Download only; Lightning calls this from a single process per node
        MNIST(self.data_dir, train=True, download=True)
        MNIST(self.data_dir, train=False, download=True)

    def setup(self, stage: str):
        # Called on every process; build the splits needed for the current stage
        if stage == "fit":
            mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])
        if stage == "test":
            self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=32, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.mnist_val, batch_size=32)

    def test_dataloader(self):
        return DataLoader(self.mnist_test, batch_size=32)


# Train the model
if __name__ == "__main__":
    model = LitModel()
    datamodule = MNISTDataModule()
    trainer = L.Trainer(max_epochs=10)
    trainer.fit(model, datamodule=datamodule)
```

## Architecture

Lightning is organized in layers, so you can choose the level of abstraction that fits the task:

- **Lightning Fabric**: Low-level acceleration layer that gives expert control over training loops, device management, and distributed strategies
- **PyTorch Lightning**: High-level framework built on Fabric, offering structured training workflows with automatic optimization, logging, and checkpointing
- **Unified Interface**: A single package that ships both approaches, so users can pick the right abstraction level for each project
- **Data Integration**: Built-in streaming data capabilities through the litdata integration
- **Production Features**: Multi-GPU and multi-node training, cloud deployment, extensive logging, and MLOps integrations

This layering lets code move from a research prototype to a production run while staying reusable: scaling out is largely a matter of changing the Trainer or Fabric configuration (accelerator, devices, strategy, precision) rather than rewriting the training logic.

## Capabilities

### Core Training Components

The essential building blocks for structuring deep learning training are the `Trainer` orchestrator, `LightningModule` for model definition, `LightningDataModule` for data handling, and the `Callback` system for training lifecycle hooks.

```python { .api }
class Trainer:
    def __init__(self, **kwargs): ...
    def fit(self, model, train_dataloaders=None, val_dataloaders=None, datamodule=None, **kwargs): ...
    def test(self, model=None, dataloaders=None, **kwargs): ...
    def predict(self, model=None, dataloaders=None, **kwargs): ...

class LightningModule:
    def __init__(self): ...
    def forward(self, *args, **kwargs): ...
    def training_step(self, batch, batch_idx): ...
    def validation_step(self, batch, batch_idx): ...
    def test_step(self, batch, batch_idx): ...
    def configure_optimizers(self): ...

class LightningDataModule:
    def __init__(self): ...
    def prepare_data(self): ...
    def setup(self, stage: str): ...
    def train_dataloader(self): ...
    def val_dataloader(self): ...
    def test_dataloader(self): ...

class Callback:
    def on_train_start(self, trainer, pl_module): ...
    def on_train_end(self, trainer, pl_module): ...
    def on_train_epoch_start(self, trainer, pl_module): ...
    def on_train_epoch_end(self, trainer, pl_module): ...
```

[Core Training Components](./core-training.md)

### Lightning Fabric

Lightweight training acceleration framework that provides expert-level control over training loops, device management, and distributed strategies without high-level abstractions.

```python { .api }
class Fabric:
    def __init__(self, **kwargs): ...
    def setup(self, model, *optimizers): ...
    def setup_dataloaders(self, *dataloaders): ...
    def backward(self, tensor): ...
    def all_gather(self, tensor): ...
    def broadcast(self, tensor): ...

def seed_everything(seed: int): ...
def is_wrapped(obj): ...
```

[Lightning Fabric](./fabric.md)

### Callbacks and Lifecycle Hooks

Built-in callbacks cover checkpointing, early stopping, learning-rate monitoring, and optimization techniques such as stochastic weight averaging; custom callbacks hook into the same training lifecycle.

```python { .api }
class ModelCheckpoint(Callback):
    def __init__(self, dirpath=None, filename=None, monitor=None, **kwargs): ...

class EarlyStopping(Callback):
    def __init__(self, monitor, patience=3, **kwargs): ...

class LearningRateMonitor(Callback):
    def __init__(self, logging_interval=None, **kwargs): ...

class StochasticWeightAveraging(Callback):
    def __init__(self, swa_lrs, **kwargs): ...
```

[Callbacks and Lifecycle Hooks](./callbacks.md)

### Distributed Training Strategies

Multiple strategies for distributed and parallel training, including data parallel, distributed data parallel, fully sharded data parallel, model parallel, and specialized strategies for different hardware.

```python { .api }
class DDPStrategy:
    def __init__(self, **kwargs): ...

class FSDPStrategy:
    def __init__(self, **kwargs): ...

class DeepSpeedStrategy:
    def __init__(self, **kwargs): ...

class DataParallelStrategy:
    def __init__(self): ...
```

[Distributed Training Strategies](./strategies.md)

### Hardware Acceleration

Support for CPUs, CUDA GPUs, Apple Silicon GPUs through Metal Performance Shaders (MPS), and Google TPUs through XLA, with automatic device detection when `accelerator="auto"` (the default).

```python { .api }
class CPUAccelerator:
    def setup_device(self, device): ...

class CUDAAccelerator:
    def setup_device(self, device): ...

class MPSAccelerator:
    def setup_device(self, device): ...

class XLAAccelerator:
    def setup_device(self, device): ...

def find_usable_cuda_devices(num_devices: int = -1): ...
```

[Hardware Acceleration](./accelerators.md)

### Precision Control and Optimization

Precision plugins for mixed precision training, quantization, and various floating-point formats to optimize memory usage and training speed while maintaining model quality.

```python { .api }
class MixedPrecision:
    def __init__(self, precision='16-mixed', **kwargs): ...

class HalfPrecision:
    def __init__(self): ...

class DoublePrecision:
    def __init__(self): ...

class BitsandbytesPrecision:
    def __init__(self, mode='int8', **kwargs): ...
```

[Precision Control](./precision.md)

### Logging and Monitoring

Integration with popular experiment tracking platforms and comprehensive logging capabilities for monitoring training progress, metrics, hyperparameters, and model artifacts.

```python { .api }
class TensorBoardLogger:
    def __init__(self, save_dir, **kwargs): ...

class WandbLogger:
    def __init__(self, project=None, **kwargs): ...

class MLFlowLogger:
    def __init__(self, experiment_name=None, **kwargs): ...

class CSVLogger:
    def __init__(self, save_dir, **kwargs): ...
```

[Logging and Monitoring](./loggers.md)

### Profiling and Performance Analysis

Profiling tools for analyzing training performance, identifying bottlenecks, and optimizing model training efficiency across different hardware configurations.

```python { .api }
class PyTorchProfiler:
    def __init__(self, **kwargs): ...

class AdvancedProfiler:
    def __init__(self, **kwargs): ...

class SimpleProfiler:
    def __init__(self): ...
```

[Profiling and Performance](./profilers.md)

294

295

### Data Utilities

296

297

Data handling utilities including streaming datasets, combined data loaders, and data processing functions for efficient data pipeline management in large-scale training.

298

299

```python { .api }

300

class StreamingDataset:

301

def __init__(self, **kwargs): ...

302

303

class CombinedStreamingDataset:

304

def __init__(self, datasets, **kwargs): ...

305

306

def optimize(data_dir, **kwargs): ...

307

def map(function, inputs, **kwargs): ...

308

```

309

310

[Data Utilities](./data.md)

## Utilities

Utilities for reproducibility (`seed_everything`) and for silencing Lightning's `PossibleUserWarning` messages (`disable_possible_user_warnings`).

```python { .api }
def seed_everything(seed: int, workers: bool = False) -> int: ...
def disable_possible_user_warnings() -> None: ...
```

## Types

```python { .api }
from typing import Any, Dict, List, Optional, Union

from torch import Tensor
from torch.nn import Module
from torch.optim import Optimizer
from torch.utils.data import DataLoader

# Core types
STEP_OUTPUT = Union[Tensor, Dict[str, Any]]
TRAIN_DATALOADERS = Union[DataLoader, List[DataLoader], Dict[str, DataLoader]]
EVAL_DATALOADERS = Union[DataLoader, List[DataLoader]]
_EVALUATE_OUTPUT = List[Dict[str, float]]
_PREDICT_OUTPUT = List[Any]

# LR Scheduler configuration
class LRSchedulerConfig:
    scheduler: Any
    interval: str = "epoch"
    frequency: int = 1
    monitor: Optional[str] = None
    strict: bool = True
    name: Optional[str] = None

# Enums
class GradClipAlgorithmType:
    NORM = "norm"
    VALUE = "value"

class LightningEnum:
    pass

# Constants
FLOAT16_EPSILON: float
FLOAT32_EPSILON: float
FLOAT64_EPSILON: float
```