# Specialized Models

Specialized model implementations for specific computer vision tasks including open-vocabulary detection, promptable segmentation, efficient segmentation, transformer-based detection, and neural architecture search.

## Capabilities
### YOLOWorld - Open Vocabulary Detection

The YOLO-World model enables open-vocabulary object detection, allowing detection of custom classes without retraining.

```python { .api }
class YOLOWorld:
    def __init__(self, model="yolov8s-world.pt", verbose=False):
        """
        Initialize YOLOv8-World model.

        Parameters:
        - model (str | Path): Path to pre-trained model file
        - verbose (bool): Enable verbose output during initialization
        """

    def set_classes(self, classes: List[str]):
        """
        Set custom classes for detection.

        Parameters:
        - classes (List[str]): List of class names to detect
        """
```

**Usage Examples:**

```python
from ultralytics import YOLOWorld

# Initialize YOLOWorld model
model = YOLOWorld("yolov8s-world.pt")

# Set custom classes
model.set_classes(["person", "bicycle", "car", "motorcycle"])

# Perform detection with custom classes
results = model("image.jpg")

# Update classes dynamically
model.set_classes(["apple", "banana", "orange"])
results = model("fruit_image.jpg")
```
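
Because `set_classes` takes free-form strings, user-supplied vocabularies often contain stray whitespace or duplicates. A plain-Python sketch of a pre-flight cleanup helper (hypothetical, not part of the Ultralytics API) might look like this:

```python
def sanitize_classes(classes):
    """Strip whitespace, drop empty names, and deduplicate (case-insensitively) while preserving order."""
    seen = set()
    cleaned = []
    for name in classes:
        name = name.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            cleaned.append(name)
    return cleaned

# Messy user input becomes a clean vocabulary for set_classes
print(sanitize_classes([" person", "Car", "car", "", "bicycle "]))
# → ['person', 'Car', 'bicycle']
```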

### SAM - Segment Anything Model

The Segment Anything Model (SAM) provides promptable segmentation using bounding-box, point, or text prompts.

```python { .api }
class SAM:
    def __init__(self, model="sam_b.pt"):
        """
        Initialize SAM model.

        Parameters:
        - model (str | Path): Path to SAM model file
        """

    def predict(self, source, stream=False, bboxes=None, points=None, labels=None, **kwargs) -> List[Results]:
        """
        Perform segmentation with prompts.

        Parameters:
        - source: Input image source
        - stream (bool): Stream processing mode
        - bboxes (List[List[float]], optional): Bounding box prompts [[x1, y1, x2, y2], ...]
        - points (List[List[float]], optional): Point prompts [[x, y], ...]
        - labels (List[int], optional): Point labels (1 for foreground, 0 for background)
        - **kwargs: Additional arguments

        Returns:
        List[Results]: Segmentation results
        """

    def __call__(self, source=None, stream=False, bboxes=None, points=None, labels=None, **kwargs) -> List[Results]:
        """Callable interface for the predict method."""
```

**Usage Examples:**

```python
from ultralytics import SAM

# Initialize SAM model
model = SAM("sam_b.pt")  # or sam_l.pt / sam_h.pt for larger models

# Segment with bounding box prompts
results = model("image.jpg", bboxes=[[100, 100, 200, 200]])

# Segment with point prompts
results = model("image.jpg", points=[[150, 150]], labels=[1])

# Combine prompts
results = model(
    "image.jpg",
    bboxes=[[100, 100, 200, 200]],
    points=[[150, 150], [180, 180]],
    labels=[1, 0],
)

# Process results
for r in results:
    masks = r.masks.data  # Segmentation masks
    r.show()  # Display results
```
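
Point prompts and their labels must stay in sync: each `[x, y]` point needs exactly one label, and omitting `labels` means "treat every point as foreground". A minimal sketch of that pairing logic (a hypothetical helper, not part of the Ultralytics API):

```python
def normalize_point_prompts(points, labels=None):
    """Pair each [x, y] point with a label; default missing labels to foreground (1)."""
    if labels is None:
        labels = [1] * len(points)
    if len(labels) != len(points):
        raise ValueError(f"got {len(points)} points but {len(labels)} labels")
    return list(zip(points, labels))

pts = [[150, 150], [180, 180]]
print(normalize_point_prompts(pts))          # all points default to foreground
print(normalize_point_prompts(pts, [1, 0]))  # second point marked as background
```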

### FastSAM - Fast Segment Anything

FastSAM provides efficient segmentation with significantly faster inference than SAM while maintaining competitive accuracy.

```python { .api }
class FastSAM:
    def __init__(self, model="FastSAM-x.pt"):
        """
        Initialize FastSAM model.

        Parameters:
        - model (str | Path): Path to FastSAM model file
        """

    def predict(self, source, **kwargs) -> List[Results]:
        """
        Perform fast segmentation.

        Parameters:
        - source: Input image source
        - **kwargs: Additional arguments

        Returns:
        List[Results]: Segmentation results
        """
```

**Usage Examples:**

```python
from ultralytics import FastSAM

# Initialize FastSAM model
model = FastSAM("FastSAM-x.pt")  # or FastSAM-s.pt for a smaller/faster model

# Perform segmentation
results = model("image.jpg")

# With custom parameters
results = model("image.jpg", conf=0.4, iou=0.9)

# Process results
for r in results:
    masks = r.masks.data  # Segmentation masks
    boxes = r.boxes.data  # Bounding boxes
    r.save("output.jpg")  # Save results
```
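
The `iou=0.9` argument above is an overlap threshold: candidate boxes whose pairwise intersection-over-union exceeds it are suppressed as duplicates. For intuition, a self-contained IoU computation for `[x1, y1, x2, y2]` boxes (illustrative only, not the library's internal implementation):

```python
def box_iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 2x2 boxes overlapping in a 1x2 strip: IoU = 2 / (4 + 4 - 2) = 1/3
print(box_iou([0, 0, 2, 2], [1, 0, 3, 2]))  # → 0.3333333333333333
```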

### RTDETR - Real-Time Detection Transformer

RT-DETR (Real-Time Detection Transformer) provides state-of-the-art object detection using a transformer-based architecture.

```python { .api }
class RTDETR:
    def __init__(self, model="rtdetr-l.pt"):
        """
        Initialize RTDETR model.

        Parameters:
        - model (str | Path): Path to RTDETR model file
        """
```

**Usage Examples:**

```python
from ultralytics import RTDETR

# Initialize RTDETR model
model = RTDETR("rtdetr-l.pt")  # or rtdetr-x.pt for the extra-large variant

# Perform detection
results = model("image.jpg")

# With custom parameters
results = model("image.jpg", conf=0.5, imgsz=640)
```
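
The `conf=0.5` argument discards detections whose confidence score falls below the threshold. Its effect can be sketched in plain Python over hypothetical `(box, score, class_id)` tuples (the tuple layout here is illustrative, not the library's result format):

```python
def filter_by_conf(detections, conf=0.5):
    """Keep (box, score, class_id) detections whose score meets the threshold."""
    return [d for d in detections if d[1] >= conf]

dets = [([10, 10, 50, 50], 0.91, 0), ([20, 20, 60, 60], 0.32, 1)]
print(filter_by_conf(dets, conf=0.5))  # only the 0.91-confidence detection survives
```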

### NAS - Neural Architecture Search

YOLO-NAS models are object detection architectures produced by automated neural architecture search.

```python { .api }
class NAS:
    def __init__(self, model="yolo_nas_s.pt"):
        """
        Initialize NAS model.

        Parameters:
        - model (str | Path): Path to NAS model file
        """
```

**Usage Examples:**

```python
from ultralytics import NAS

# Initialize NAS model
model = NAS("yolo_nas_s.pt")  # or yolo_nas_m.pt, yolo_nas_l.pt

# Perform detection
results = model("image.jpg")

# Validation (YOLO-NAS supports inference and validation; training is not supported)
metrics = model.val(data="coco8.yaml")
```

## Model Comparison

| Model | Primary Use Case | Speed | Accuracy | Prompt Support |
|-------|------------------|-------|----------|----------------|
| YOLO | General detection/segmentation | Fast | High | No |
| YOLOWorld | Open-vocabulary detection | Fast | High | Text classes |
| SAM | Precise segmentation | Slow | Very High | Bbox/Points |
| FastSAM | Fast segmentation | Very Fast | High | Limited |
| RTDETR | Transformer detection | Medium | Very High | No |
| NAS | Optimized architectures | Fast | High | No |
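
As a rough decision aid, the table can be encoded as a small chooser. The rules below are one possible reading of the table, not an official recommendation, and `choose_model` is a hypothetical helper:

```python
def choose_model(task, need_prompts=False, need_speed=False):
    """Map coarse requirements onto the model families compared above."""
    if task == "detection":
        if need_prompts:
            return "YOLOWorld"  # open vocabulary via text classes
        return "YOLO" if need_speed else "RTDETR"  # speed vs. highest accuracy
    if task == "segmentation":
        if need_prompts:
            return "FastSAM" if need_speed else "SAM"  # fast vs. most precise
        return "YOLO"
    raise ValueError(f"unknown task: {task}")

print(choose_model("segmentation", need_prompts=True))  # → SAM
```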

## Types

```python { .api }
from typing import List, Optional, Union

from ultralytics.engine.results import Results

# Common types for specialized models
BboxPrompt = List[float]   # [x1, y1, x2, y2]
PointPrompt = List[float]  # [x, y]
ClassList = List[str]      # ["class1", "class2", ...]
LabelList = List[int]      # [1, 0, 1, ...] (1=foreground, 0=background)
```
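
These aliases document intent but are not enforced at runtime by Python. Where prompt values come from untrusted input, lightweight checks matching the comments above can catch malformed prompts early (hypothetical helpers, not part of the package):

```python
def is_valid_bbox(b):
    """A BboxPrompt is [x1, y1, x2, y2] with x1 < x2 and y1 < y2."""
    return len(b) == 4 and b[0] < b[2] and b[1] < b[3]

def is_valid_labels(labels):
    """A LabelList contains only 0 (background) and 1 (foreground)."""
    return all(v in (0, 1) for v in labels)

print(is_valid_bbox([100, 100, 200, 200]))  # → True
print(is_valid_bbox([200, 100, 100, 200]))  # → False (x1 >= x2)
print(is_valid_labels([1, 0, 1]))           # → True
```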