# TorchVision

TorchVision is a computer vision library for PyTorch. It provides datasets, model architectures with pre-trained weights, and image and video transforms, along with utilities for data loading, I/O, and visualization.

## Package Information

- **Package Name**: torchvision
- **Language**: Python
- **Installation**: `pip install torchvision`
- **Version**: 0.23.0

## Core Imports

```python
import torchvision
from torchvision import datasets, models, transforms, utils, io, ops, tv_tensors
```

Common patterns:

```python
import torchvision.transforms as transforms
import torchvision.models as models
from torchvision.datasets import CIFAR10, ImageNet
```

## Basic Usage

```python
import torch
import torchvision.transforms as transforms
from torchvision import models, datasets
from torch.utils.data import DataLoader

# Load a pre-trained model and switch it to inference mode
model = models.resnet50(weights='DEFAULT')
model.eval()

# Create transform pipeline (standard ImageNet preprocessing)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load dataset
dataset = datasets.CIFAR10(root='./data', train=False,
                           download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)

# Inference on a single batch
with torch.no_grad():
    for images, labels in dataloader:
        outputs = model(images)
        # The weights are trained on ImageNet, so these are ImageNet class
        # indices (0-999), not CIFAR-10 labels.
        predictions = torch.argmax(outputs, dim=1)
        break
```
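
To make the geometry of the `Resize(256)` and `CenterCrop(224)` steps concrete, here is a plain-Python sketch of the size arithmetic. It assumes that an integer `Resize` size is matched to the shorter edge with the aspect ratio preserved, and that the longer edge is truncated to an integer. The helper names `resized_size` and `center_crop_box` are hypothetical and are not part of TorchVision.

```python
def resized_size(width, height, size=256):
    """Return (width, height) after scaling the shorter edge to `size`."""
    short, long = min(width, height), max(width, height)
    new_long = int(size * long / short)  # assumption: result is truncated
    return (size, new_long) if width <= height else (new_long, size)


def center_crop_box(width, height, crop=224):
    """Return the (left, top, right, bottom) box of a centered square crop."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return (left, top, left + crop, top + crop)


# A 32x32 CIFAR-10 image is upscaled to 256x256, then cropped to 224x224.
cifar_resized = resized_size(32, 32)
cifar_crop = center_crop_box(*cifar_resized)

# A 640x480 photo keeps its aspect ratio when resized.
photo_resized = resized_size(640, 480)
```

Under these assumptions a CIFAR-10 image is enlarged eightfold before it reaches the network, which is worth keeping in mind when judging prediction quality.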

## Architecture

TorchVision is organized into several key modules:

- **Models**: Pre-trained neural networks for classification, detection, segmentation, and video tasks
- **Datasets**: Standard computer vision datasets, many with automatic download, that accept transforms for preprocessing
- **Transforms**: Image and video processing operations with v1/v2 APIs for different data types
- **Utils**: Visualization utilities and tensor operations
- **I/O**: Image and video reading/writing operations for multiple formats
- **Ops**: Low-level operations for object detection, segmentation, and custom layers
- **TV Tensors**: Tensor subclasses that preserve metadata through transformations

## Capabilities

### Global Configuration

Package-level functions for selecting the image and video backends.

```python { .api }
def set_image_backend(backend: str) -> None:
    """Set the image loading backend ('PIL' or 'accimage')."""

def get_image_backend() -> str:
    """Get the current image backend."""

def set_video_backend(backend: str) -> None:
    """Set the video decoding backend ('pyav', 'video_reader', or 'cuda')."""

def get_video_backend() -> str:
    """Get the current video backend."""

def disable_beta_transforms_warning() -> None:
    """Disable beta transforms warning (legacy compatibility function)."""
```

### Datasets

A collection of datasets for image classification, object detection, segmentation, and video tasks. Many can download their data automatically, and datasets accept transforms for preprocessing.

```python { .api }
class VisionDataset:
    """Base class for all vision datasets."""

class ImageFolder(VisionDataset):
    """Generic dataset for images arranged in one sub-folder per class."""

class CIFAR10(VisionDataset):
    """CIFAR-10 dataset."""

class ImageNet(VisionDataset):
    """ImageNet dataset."""

class CocoDetection(VisionDataset):
    """COCO dataset for object detection."""
```

[Datasets](./datasets.md)
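
The folder format used by `ImageFolder` places each class in its own sub-directory (`root/<class_name>/<image>`), and class indices follow the sorted directory names. The sketch below reproduces that mapping with the standard library only. It is an illustration, not TorchVision's code: `find_classes` here is a stand-alone helper and the `dog`/`cat` folders are made-up example data.

```python
import os
import tempfile


def find_classes(root):
    """Map each sub-directory of `root` to an index, in sorted name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Create two empty class folders; creation order does not matter.
    for name in ("dog", "cat"):
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)       # ['cat', 'dog']
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```

Because indices depend on sorted folder names, adding or renaming a class folder can shift the labels of existing classes.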

### Models

Pre-trained neural network models for classification, object detection, instance segmentation, semantic segmentation, and video understanding.

```python { .api }
def get_model(name: str, **config) -> torch.nn.Module:
    """Get model by name with configuration."""

def list_models() -> list[str]:
    """List all available models."""

def resnet50(weights=None, progress: bool = True, **kwargs) -> torch.nn.Module:
    """ResNet-50 model."""

def fasterrcnn_resnet50_fpn(weights=None, progress: bool = True, **kwargs) -> torch.nn.Module:
    """Faster R-CNN with ResNet-50-FPN backbone."""
```

[Models](./models.md)

### Transforms

Image and video preprocessing and augmentation operations. Both the v1 API (PIL images and tensors) and the v2 API (which also handles videos, bounding boxes, and masks) are included.

```python { .api }
class Compose:
    """Composes several transforms together."""

class Resize:
    """Resize image to given size."""

class ToTensor:
    """Convert PIL Image or numpy array to tensor."""

class Normalize:
    """Normalize tensor with mean and std."""

class RandomHorizontalFlip:
    """Randomly flip image horizontally."""
```

[Transforms](./transforms.md)
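
The value arithmetic behind `ToTensor` followed by `Normalize` can be shown on a single RGB pixel. This plain-Python sketch assumes 8-bit input, which `ToTensor` scales from `[0, 255]` to `[0.0, 1.0]`, and then applies the per-channel `(x - mean) / std` step. The helper names are hypothetical; the mean and std values are the ImageNet statistics used in Basic Usage.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def to_unit_range(pixel):
    """Scale 8-bit channel values to floats in [0.0, 1.0]."""
    return [channel / 255.0 for channel in pixel]


def normalize(pixel, mean=MEAN, std=STD):
    """Apply (x - mean) / std independently to each channel."""
    return [(c - m) / s for c, m, s in zip(pixel, mean, std)]


scaled = to_unit_range([255, 0, 51])
normalized = normalize(scaled)

# A pixel equal to the channel means maps to zero in every channel.
centered = normalize(MEAN)
```

The order matters: `Normalize` expects values already in the `[0.0, 1.0]` range, so it comes after the conversion to a tensor.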

### Utils

Visualization utilities and tensor operations for working with images, bounding boxes, masks, and keypoints.

```python { .api }
def make_grid(tensor, nrow: int = 8, padding: int = 2, normalize: bool = False):
    """Make a grid of images."""

def save_image(tensor, fp, nrow: int = 8, padding: int = 2, normalize: bool = False):
    """Save tensor as image file."""

def draw_bounding_boxes(image, boxes, labels=None, colors=None, fill: bool = False, width: int = 1):
    """Draw bounding boxes on image."""
```

[Utils](./utils.md)
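
A common point of confusion with `make_grid` is that `nrow` is the number of images per row, not the number of rows. The following plain-Python sketch computes the expected grid height and width for a batch of more than one image, assuming padding is added around every cell and once more at the far edges. `grid_shape` is a hypothetical helper written for this illustration.

```python
import math


def grid_shape(num_images, height, width, nrow=8, padding=2):
    """Return (grid_height, grid_width) for a batch of equally sized images."""
    cols = min(nrow, num_images)         # images per row
    rows = math.ceil(num_images / cols)  # rows needed to fit the batch
    grid_height = rows * (height + padding) + padding
    grid_width = cols * (width + padding) + padding
    return grid_height, grid_width


# Eight 32x32 images, four per row: two rows of four cells.
small = grid_shape(8, 32, 32, nrow=4)

# Sixteen 28x28 images with the default nrow=8: two rows of eight cells.
default = grid_shape(16, 28, 28)
```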

### I/O Operations

Image and video input/output operations with support for multiple formats and backends.

```python { .api }
def read_image(path: str, mode: ImageReadMode = ImageReadMode.UNCHANGED):
    """Read image file to tensor (the default mode keeps the file's channels unchanged)."""

def write_jpeg(input, filename: str, quality: int = 75):
    """Write tensor as JPEG file."""

def read_video(filename: str, start_pts: float = 0, end_pts=None, pts_unit: str = 'pts'):
    """Read video file."""

class VideoReader:
    """Video reader for streaming video data."""
```

[I/O Operations](./io.md)

### Operations

Low-level operations for object detection, segmentation, and specialized neural network layers.

```python { .api }
def nms(boxes, scores, iou_threshold: float):
    """Non-maximum suppression."""

def roi_align(input, boxes, output_size, spatial_scale: float = 1.0, sampling_ratio: int = -1, aligned: bool = False):
    """RoI Align operation."""

def box_iou(boxes1, boxes2):
    """Calculate IoU between box sets."""

class FeaturePyramidNetwork(torch.nn.Module):
    """Feature Pyramid Network."""
```

[Operations](./ops.md)
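
To illustrate what `box_iou` and `nms` compute, here is a plain-Python sketch of intersection-over-union and greedy non-maximum suppression for boxes in `(x1, y1, x2, y2)` format. It is a reference illustration under that format assumption, not TorchVision's optimized implementation, and the helper names and sample boxes are invented for the example.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Keep the highest-scoring boxes, dropping any box whose IoU with an
    already kept box exceeds the threshold. Returns kept indices in order
    of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

overlap = iou(boxes[0], boxes[1])        # 81 / 119, about 0.68
kept = greedy_nms(boxes, scores, 0.5)    # second box is suppressed
```

The first two boxes overlap heavily, so only the higher-scoring one survives; the third box is far away and is kept regardless of the threshold.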

### TV Tensors

Tensor subclasses that preserve metadata and semantics through transformations, supporting images, videos, bounding boxes, masks, and keypoints.

```python { .api }
class Image(torch.Tensor):
    """Image tensor type with metadata."""

class BoundingBoxes(torch.Tensor):
    """Bounding box tensor with format and canvas size."""

class Mask(torch.Tensor):
    """Segmentation mask tensor type."""

class Video(torch.Tensor):
    """Video tensor type for temporal data."""
```

[TV Tensors](./tv_tensors.md)
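
The reason `BoundingBoxes` carries a format and a canvas size becomes clear with a horizontal flip: the new coordinates depend on the image width, which the raw box values do not contain. This plain-Python sketch assumes the `XYXY` format and a canvas size given as `(height, width)`. `hflip_box_xyxy` is a hypothetical helper, not a TorchVision function.

```python
def hflip_box_xyxy(box, canvas_size):
    """Mirror an (x1, y1, x2, y2) box across the vertical center line."""
    _height, width = canvas_size
    x1, y1, x2, y2 = box
    # The left edge becomes the right edge, measured from the other side.
    return (width - x2, y1, width - x1, y2)


canvas_size = (100, 200)  # (height, width)
box = (10, 20, 50, 60)

flipped = hflip_box_xyxy(box, canvas_size)
restored = hflip_box_xyxy(flipped, canvas_size)
```

A plain tensor of four numbers could not be flipped correctly without this extra information, which is what the metadata-preserving tensor types provide to transforms.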

## Version Information

```python { .api }
__version__: str  # TorchVision version string (0.23.0)
git_version: str  # Git commit hash
```
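
When code needs to branch on the installed release, the version string can be reduced to a comparable tuple. The sketch below assumes the string starts with `major.minor.patch` and may carry a local suffix after `+`; the `+cu126` suffix is only an example value. `parse_version` is a hypothetical helper and operates on any string, so it runs without TorchVision installed.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


plain = parse_version("0.23.0")
with_suffix = parse_version("0.23.0+cu126")  # example local suffix

# Tuples compare element by element, unlike raw strings.
is_recent = plain >= (0, 15, 0)
```

Comparing tuples avoids the pitfall of string comparison, where `"0.9.0"` would sort after `"0.23.0"`.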