
# TorchVision

TorchVision is a computer vision library for PyTorch. It provides datasets, model architectures with pre-trained weights, and image/video transforms, along with utilities for data loading, image and video I/O, and visualization.

## Package Information

- **Package Name**: torchvision
- **Language**: Python
- **Installation**: `pip install torchvision`
- **Version**: 0.23.0

## Core Imports

```python
import torchvision
from torchvision import datasets, models, transforms, utils, io, ops, tv_tensors
```

Common patterns:

```python
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.datasets import CIFAR10, ImageNet
```

## Basic Usage

```python
import torch
import torchvision.transforms as transforms
from torchvision import models, datasets
from torch.utils.data import DataLoader

# Load a pre-trained model
model = models.resnet50(weights='DEFAULT')
model.eval()

# Create transform pipeline
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load dataset
dataset = datasets.CIFAR10(root='./data', train=False,
                           download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)

# Inference on a single batch.
# Note: the model predicts ImageNet class indices, which do not
# correspond to the CIFAR-10 labels returned by the dataset.
with torch.no_grad():
    for images, labels in dataloader:
        outputs = model(images)
        predictions = torch.argmax(outputs, dim=1)
        break
```

## Architecture

TorchVision is organized into several key modules:

- **Models**: Pre-trained neural networks for classification, detection, segmentation, and video tasks
- **Datasets**: Standard computer vision datasets with automatic download and preprocessing
- **Transforms**: Image and video processing operations with v1/v2 APIs for different data types
- **Utils**: Visualization utilities and tensor operations
- **I/O**: Image and video reading/writing operations with format support
- **Ops**: Low-level operations for object detection, segmentation, and custom layers
- **TV Tensors**: Enhanced tensor types that preserve metadata through transformations

## Capabilities


### Global Configuration

Core TorchVision configuration functions for backend management.

```python { .api }
def set_image_backend(backend: str) -> None:
    """Set the image loading backend ('PIL' or 'accimage')."""

def get_image_backend() -> str:
    """Get the current image backend."""

def set_video_backend(backend: str) -> None:
    """Set the video decoding backend ('pyav', 'video_reader', or 'cuda')."""

def get_video_backend() -> str:
    """Get the current video backend."""

def disable_beta_transforms_warning() -> None:
    """Disable beta transforms warning (legacy compatibility function)."""
```


### Datasets

Comprehensive collection of computer vision datasets with automatic downloading and preprocessing capabilities. Includes image classification, object detection, segmentation, and video datasets.

```python { .api }
class VisionDataset:
    """Base class for all vision datasets."""

class ImageFolder(VisionDataset):
    """Data loader for image classification datasets in folder format."""

class CIFAR10(VisionDataset):
    """CIFAR-10 dataset."""

class ImageNet(VisionDataset):
    """ImageNet dataset."""

class CocoDetection(VisionDataset):
    """COCO dataset for object detection."""
```

[Datasets](./datasets.md)


### Models

Pre-trained neural network models for various computer vision tasks including classification, object detection, instance segmentation, semantic segmentation, and video understanding.

```python { .api }
def get_model(name: str, **config) -> torch.nn.Module:
    """Get model by name with configuration."""

def list_models() -> list[str]:
    """List all available models."""

def resnet50(weights=None, progress: bool = True, **kwargs) -> torch.nn.Module:
    """ResNet-50 model."""

def fasterrcnn_resnet50_fpn(weights=None, progress: bool = True, **kwargs) -> torch.nn.Module:
    """Faster R-CNN with ResNet-50-FPN backbone."""
```

[Models](./models.md)


### Transforms

Image and video preprocessing and augmentation operations. Includes both the v1 API (PIL images and tensors) and the v2 API, which can also transform bounding boxes, masks, and videos together with images.

```python { .api }
class Compose:
    """Composes several transforms together."""

class Resize:
    """Resize image to given size."""

class ToTensor:
    """Convert PIL Image or numpy array to tensor."""

class Normalize:
    """Normalize tensor with mean and std."""

class RandomHorizontalFlip:
    """Randomly flip image horizontally."""
```

[Transforms](./transforms.md)


### Utils

Visualization utilities and tensor operations for working with images, bounding boxes, masks, and keypoints.

```python { .api }
def make_grid(tensor, nrow: int = 8, padding: int = 2, normalize: bool = False):
    """Make a grid of images."""

def save_image(tensor, fp, nrow: int = 8, padding: int = 2, normalize: bool = False):
    """Save tensor as image file."""

def draw_bounding_boxes(image, boxes, labels=None, colors=None, fill: bool = False, width: int = 1):
    """Draw bounding boxes on image."""
```

[Utils](./utils.md)


### I/O Operations

Image and video input/output operations with support for multiple formats and backends.

```python { .api }
def read_image(path: str, mode=ImageReadMode.UNCHANGED):
    """Read image file to tensor. Pass ImageReadMode.RGB to force 3-channel output."""

def write_jpeg(input, filename: str, quality: int = 75):
    """Write tensor as JPEG file."""

def read_video(filename: str, start_pts: float = 0, end_pts=None, pts_unit: str = 'pts'):
    """Read video file."""

class VideoReader:
    """Video reader for streaming video data."""
```

[I/O Operations](./io.md)


### Operations

Low-level operations for object detection, segmentation, and specialized neural network layers.

```python { .api }
def nms(boxes, scores, iou_threshold: float):
    """Non-maximum suppression."""

def roi_align(input, boxes, output_size, spatial_scale: float = 1.0, sampling_ratio: int = -1, aligned: bool = False):
    """RoI Align operation."""

def box_iou(boxes1, boxes2):
    """Calculate IoU between box sets."""

class FeaturePyramidNetwork(torch.nn.Module):
    """Feature Pyramid Network."""
```

[Operations](./ops.md)


### TV Tensors

Enhanced tensor types that preserve metadata and semantics through transformations, supporting images, videos, bounding boxes, masks, and keypoints.

```python { .api }
class Image(torch.Tensor):
    """Image tensor type with metadata."""

class BoundingBoxes(torch.Tensor):
    """Bounding box tensor with format and canvas size."""

class Mask(torch.Tensor):
    """Segmentation mask tensor type."""

class Video(torch.Tensor):
    """Video tensor type for temporal data."""
```

[TV Tensors](./tv_tensors.md)


## Version Information

```python { .api }
__version__: str  # TorchVision version string (0.23.0)
git_version: str  # Git commit hash
```