# Deep Neural Networks (DNN Module)

The `cv2.dnn` module is a deep learning inference engine that loads models from several frameworks and formats. It lets you load pre-trained models and run inference for tasks such as object detection, classification, and semantic segmentation without installing the original training framework.

The module runs on CPU and GPU through several backends (OpenCV, CUDA, OpenVINO) and reads models from TensorFlow, Caffe, Darknet, legacy Torch7, and ONNX. PyTorch models are typically exported to ONNX first.

## Capabilities

### Loading Models

The DNN module provides several functions for loading neural network models. `readNet()` detects the model format from the file extensions, while the framework-specific functions make the expected files explicit.

```python { .api }
cv2.dnn.readNet(model, config=None, framework='')
```
Read a network model from file with automatic framework detection. This is the most convenient function as it automatically determines the framework based on file extensions.

**Parameters:**
- `model` (str): Path to the binary model file (e.g., `.caffemodel`, `.pb`, `.onnx`, `.weights`)
- `config` (str, optional): Path to the configuration file (e.g., `.prototxt` for Caffe, `.pbtxt` for TensorFlow, `.cfg` for Darknet)
- `framework` (str, optional): Explicit framework name to use if auto-detection fails

**Returns:** `Net` object representing the loaded neural network

**Example:**
```python
# Auto-detect framework
net = cv2.dnn.readNet('model.onnx')

# Load Darknet YOLO model
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load TensorFlow model
net = cv2.dnn.readNet('frozen_graph.pb', 'graph.pbtxt')
```
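
The extension-based detection described above can be pictured as a lookup table. The following is a minimal sketch, not OpenCV's actual implementation: `guess_framework` and `EXTENSION_TO_FRAMEWORK` are hypothetical names, and the mapping is built only from the file extensions listed in this section. It runs without OpenCV installed.

```python
import os

# Hypothetical mapping from model-file extension to a framework name.
# Built from the extensions listed above; not taken from OpenCV's source.
EXTENSION_TO_FRAMEWORK = {
    ".caffemodel": "caffe",
    ".pb": "tensorflow",
    ".onnx": "onnx",
    ".weights": "darknet",
}

def guess_framework(model_path):
    """Return a framework name for model_path, or '' when the extension is unknown."""
    ext = os.path.splitext(model_path)[1].lower()
    return EXTENSION_TO_FRAMEWORK.get(ext, "")

for path in ["model.onnx", "yolov3.weights", "frozen_graph.pb", "notes.txt"]:
    print(path, "->", guess_framework(path) or "(unknown: pass framework explicitly)")
```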

---

```python { .api }
cv2.dnn.readNetFromCaffe(prototxt, caffeModel=None)
```
Read a Caffe model from prototxt and caffemodel files. Caffe is commonly used for CNN-based models.

**Parameters:**
- `prototxt` (str): Path to the `.prototxt` file (network structure)
- `caffeModel` (str, optional): Path to the `.caffemodel` file (trained weights)

**Returns:** `Net` object

**Example:**
```python
# Load pre-trained face detector
net = cv2.dnn.readNetFromCaffe(
    'deploy.prototxt',
    'res10_300x300_ssd_iter_140000.caffemodel'
)
```

---

```python { .api }
cv2.dnn.readNetFromTensorflow(model, config=None)
```
Read a TensorFlow model from a frozen graph (`.pb`). A SavedModel generally needs to be frozen (or exported to ONNX) before it can be loaded.

**Parameters:**
- `model` (str): Path to the `.pb` file (frozen graph)
- `config` (str, optional): Path to the `.pbtxt` file (text graph proto)

**Returns:** `Net` object

**Example:**
```python
# Load TensorFlow object detection model
net = cv2.dnn.readNetFromTensorflow(
    'frozen_inference_graph.pb',
    'graph.pbtxt'
)
```

---

```python { .api }
cv2.dnn.readNetFromONNX(onnxFile)
```
Read a model from ONNX format. ONNX is an open format supporting many frameworks.

**Parameters:**
- `onnxFile` (str): Path to the `.onnx` model file

**Returns:** `Net` object

**Example:**
```python
# Load ONNX model
net = cv2.dnn.readNetFromONNX('model.onnx')
```

---

```python { .api }
cv2.dnn.readNetFromDarknet(cfgFile, darknetModel=None)
```
Read a Darknet model (YOLO models). Darknet is the framework used for YOLO object detection.

**Parameters:**
- `cfgFile` (str): Path to the `.cfg` configuration file
- `darknetModel` (str, optional): Path to the `.weights` file

**Returns:** `Net` object

**Example:**
```python
# Load YOLOv4 model
net = cv2.dnn.readNetFromDarknet(
    'yolov4.cfg',
    'yolov4.weights'
)
```

---

```python { .api }
cv2.dnn.readNetFromTorch(model, isBinary=True, evaluate=True)
```
Read a Torch model from file. This covers legacy Torch7 models only; PyTorch models are usually exported to ONNX and loaded with `readNetFromONNX()`.

**Parameters:**
- `model` (str): Path to the Torch model file
- `isBinary` (bool): Whether the model is in binary format (default: True)
- `evaluate` (bool): Whether to load the network in testing (evaluation) mode (default: True)

**Returns:** `Net` object

---

```python { .api }
cv2.dnn.readNetFromModelOptimizer(xml, bin)
```
Read a model from OpenVINO Model Optimizer format (Intel).

**Parameters:**
- `xml` (str): Path to the `.xml` file (model structure)
- `bin` (str): Path to the `.bin` file (weights)

**Returns:** `Net` object

**Example:**
```python
# Load OpenVINO IR model
net = cv2.dnn.readNetFromModelOptimizer(
    'model.xml',
    'model.bin'
)
```

### Preprocessing

Before an image is fed to a network, it usually has to be converted into a 4D array called a "blob". The blob functions handle resizing, scaling, mean subtraction, and channel swapping.

```python { .api }
cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(0, 0), mean=(0, 0, 0),
                      swapRB=False, crop=False, ddepth=cv2.CV_32F)
```
Create a 4-dimensional blob from a single image. This is the most commonly used preprocessing function for deep learning models.

**Parameters:**
- `image` (numpy.ndarray): Input image (BGR format)
- `scalefactor` (float): Multiplier for image values (e.g., 1/255.0 to normalize to [0,1])
- `size` (tuple): Target spatial size (width, height) for the output image
- `mean` (tuple): Per-channel mean values to subtract (e.g., (104.0, 177.0, 123.0)). The mean is subtracted before `scalefactor` is applied; with `swapRB=True` the values are expected in (R, G, B) order
- `swapRB` (bool): If True, swap Red and Blue channels (convert BGR to RGB)
- `crop` (bool): If True, resize while preserving the aspect ratio and then center-crop to `size`; if False, resize directly to `size` without preserving the aspect ratio
- `ddepth` (int): Output blob depth (default: CV_32F for float32)

**Returns:** 4D numpy array with shape (1, channels, height, width) in NCHW format

**Example:**
```python
# Preprocess image for classification model
blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1/255.0,
    size=(224, 224),
    mean=(0, 0, 0),
    swapRB=True,
    crop=False
)

# Preprocess for face detection (Caffe SSD)
blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1.0,
    size=(300, 300),
    mean=(104.0, 177.0, 123.0),
    swapRB=False,
    crop=False
)
```
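
To make the arithmetic behind a blob concrete, here is a rough pure-Python sketch of the per-pixel work: optional channel swap, mean subtraction, scaling, and the move from height x width x channels to NCHW layout. `to_blob` is a hypothetical helper, not part of OpenCV; it omits resizing and cropping, and it assumes the mean is applied in the post-swap channel order.

```python
def to_blob(image, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Convert an H x W x 3 nested list (BGR) into a 1 x 3 x H x W nested list.

    Resizing and cropping are omitted: the image is assumed to already
    have the target size.
    """
    height, width = len(image), len(image[0])
    blob = [[[[0.0] * width for _ in range(height)] for _ in range(3)]]
    for y in range(height):
        for x in range(width):
            pixel = list(image[y][x])
            if swap_rb:
                pixel[0], pixel[2] = pixel[2], pixel[0]  # BGR -> RGB
            for c in range(3):
                # Subtract the per-channel mean first, then scale.
                blob[0][c][y][x] = (pixel[c] - mean[c]) * scalefactor
    return blob

# A 1 x 2 BGR image: one pure-blue pixel and one pure-red pixel.
image = [[(255, 0, 0), (0, 0, 255)]]
blob = to_blob(image, scalefactor=1 / 255.0, swap_rb=True)

# Blob dimensions in N, C, H, W order.
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))
# First channel, which holds red after the swap.
print(blob[0][0][0])
```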

---

```python { .api }
cv2.dnn.blobFromImages(images, scalefactor=1.0, size=(0, 0), mean=(0, 0, 0),
                       swapRB=False, crop=False, ddepth=cv2.CV_32F)
```
Create a 4-dimensional blob from multiple images for batch processing.

**Parameters:**
- `images` (list of numpy.ndarray): List of input images
- Other parameters same as `blobFromImage()`

**Returns:** 4D numpy array with shape (batch_size, channels, height, width)

**Example:**
```python
# Process multiple images in a batch
images = [img1, img2, img3]
blob = cv2.dnn.blobFromImages(
    images,
    scalefactor=1/255.0,
    size=(224, 224),
    swapRB=True
)
```

---

```python { .api }
cv2.dnn.imagesFromBlob(blob)
```
Split a 4D blob back into a list of per-image arrays. Useful for visualization or debugging.

**Parameters:**
- `blob` (numpy.ndarray): 4D blob array

**Returns:** List of images in standard OpenCV format

**Example:**
```python
# Convert blob back to images
images = cv2.dnn.imagesFromBlob(blob)
for img in images:
    # Float images are displayed assuming values in [0, 1]
    cv2.imshow('Image', img)
    cv2.waitKey(0)  # imshow needs waitKey to actually render the window
```

### Neural Network Operations

The `Net` class provides methods for running inference, configuring backends, and querying network structure.

```python { .api }
Net.setInput(blob, name='', scalefactor=1.0, mean=(0, 0, 0))
```
Set the input blob for the network. This prepares the data for the forward pass.

**Parameters:**
- `blob` (numpy.ndarray): 4D input blob (typically from `blobFromImage()`)
- `name` (str): Name of the input layer (empty string for default)
- `scalefactor` (float): Optional additional scaling
- `mean` (tuple): Optional additional mean subtraction

**Returns:** None

**Example:**
```python
net.setInput(blob)
# Or specify input layer name
net.setInput(blob, name='input_1')
```

---

```python { .api }
Net.forward(outputName=None)
```
Run a forward pass and return the output of the specified layer. This performs the actual inference.

**Parameters:**
- `outputName` (str or list of str, optional): Name of the output layer, or a list of layer names. If omitted, the whole network is run and the first output of the final layer is returned

**Returns:** numpy.ndarray when called with a single layer name or no argument; a sequence of numpy arrays, one per name, when called with a list of names

**Example:**
```python
# Get output from final layer
output = net.forward()

# Get output from specific layer
output = net.forward('detection_out')

# Get outputs from multiple layers
layer_names = net.getUnconnectedOutLayersNames()
outputs = net.forward(layer_names)
```

---

```python { .api }
Net.forwardAsync(outputName=None)
```
Run an asynchronous forward pass. Useful for pipelining and concurrent processing. This call requires the Inference Engine (OpenVINO) backend.

**Parameters:**
- `outputName` (str, optional): Name of the output layer

**Returns:** Async handle (`AsyncArray`); call its `get()` method to retrieve the result

---

```python { .api }
Net.setPreferableBackend(backendId)
```
Set the computation backend for the network. Different backends offer different performance characteristics.

**Parameters:**
- `backendId` (int): Backend identifier (see Backend Constants section)

**Returns:** None

**Example:**
```python
# Use OpenCV's implementation (CPU)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)

# Use CUDA for GPU acceleration
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)

# Use Intel's OpenVINO
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
```

---

```python { .api }
Net.setPreferableTarget(targetId)
```
Set the computation target device (CPU, GPU, etc.). The target must be one that the selected backend supports, so set it together with `setPreferableBackend()`.

**Parameters:**
- `targetId` (int): Target device identifier (see Target Constants section)

**Returns:** None

**Example:**
```python
# Use CPU
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Use GPU with CUDA
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Use GPU with FP16 precision for faster inference
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)
```

---

```python { .api }
Net.getLayerNames()
```
Get names of all layers in the network. Useful for debugging and understanding network structure.

**Returns:** List of strings containing layer names

**Example:**
```python
layer_names = net.getLayerNames()
print(f"Network has {len(layer_names)} layers")
for name in layer_names[:5]:
    print(name)
```

---

```python { .api }
Net.getUnconnectedOutLayers()
```
Get indices of output layers (layers whose outputs are not consumed by any other layer). These are typically the final layers you want to retrieve. The indices are 1-based relative to `getLayerNames()`, so subtract 1 before indexing into that list.

**Returns:** Array of integers representing output layer indices (some older OpenCV releases return an N x 1 array instead of a flat one)

**Example:**
```python
output_layers = net.getUnconnectedOutLayers()
print(f"Output layer indices: {output_layers}")
```
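
When the indices are turned into names by hand, two details matter: layer ids start at 1, and some older OpenCV releases wrap each id in a one-element array. The sketch below is one way to handle both; `output_names` and the layer names are hypothetical and chosen only for illustration.

```python
def output_names(layer_names, out_layer_ids):
    """Map 1-based layer ids to names.

    Accepts flat ids ([2, 4, 5]) as well as nested ids ([[2], [4], [5]]).
    """
    names = []
    for item in out_layer_ids:
        # Nested form: each id is wrapped in a one-element sequence.
        layer_id = item[0] if hasattr(item, "__len__") else item
        names.append(layer_names[int(layer_id) - 1])  # ids start at 1
    return names

# Made-up layer names standing in for net.getLayerNames()
layer_names = ["conv_0", "yolo_82", "conv_83", "yolo_94", "yolo_106"]
print(output_names(layer_names, [2, 4, 5]))
print(output_names(layer_names, [[2], [4], [5]]))
```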

---

```python { .api }
Net.getUnconnectedOutLayersNames()
```
Get names of output layers. More convenient than using indices.

**Returns:** List of strings containing output layer names

**Example:**
```python
# Get outputs from all output layers (e.g., for YOLO)
output_layer_names = net.getUnconnectedOutLayersNames()
outputs = net.forward(output_layer_names)
```

### Post-processing

After running inference, post-processing is often needed to filter and refine detection results. Non-Maximum Suppression (NMS) is the most common post-processing step for object detectors.

```python { .api }
cv2.dnn.NMSBoxes(bboxes, scores, score_threshold, nms_threshold,
                 eta=1.0, top_k=0)
```
Apply Non-Maximum Suppression (NMS) to bounding boxes. NMS filters out overlapping detections, keeping only the most confident ones.

**Parameters:**
- `bboxes` (list): List of bounding boxes, each as [x, y, width, height]
- `scores` (list): List of confidence scores corresponding to each box
- `score_threshold` (float): Minimum score threshold to consider a detection
- `nms_threshold` (float): IoU (Intersection over Union) threshold for NMS (typically 0.3-0.5)
- `eta` (float): Coefficient for adaptive NMS (default: 1.0)
- `top_k` (int): Maximum number of boxes to keep (0 = no limit)

**Returns:** Indices of boxes to keep after NMS (some older OpenCV releases return an N x 1 array instead of a flat one)

**Example:**
```python
import numpy as np

# Apply NMS to detections
boxes = [[10, 10, 50, 50], [12, 12, 48, 48], [100, 100, 60, 60]]
scores = [0.9, 0.85, 0.95]

indices = cv2.dnn.NMSBoxes(
    boxes,
    scores,
    score_threshold=0.5,
    nms_threshold=0.4
)

# Keep only selected boxes (flatten copes with N x 1 results)
kept = np.array(indices).flatten()
kept_boxes = [boxes[i] for i in kept]
kept_scores = [scores[i] for i in kept]
```
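
For intuition about what NMS does, here is a simplified greedy version in pure Python. It is a sketch of the general technique, not OpenCV's implementation: it ignores `eta` and `top_k`, and `iou` and `greedy_nms` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection over Union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return indices of kept boxes, highest score first."""
    # Drop low-scoring boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept

boxes = [[10, 10, 50, 50], [12, 12, 48, 48], [100, 100, 60, 60]]
scores = [0.9, 0.85, 0.95]
print(greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4))  # [2, 0]
```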

---

```python { .api }
cv2.dnn.NMSBoxesRotated(bboxes, scores, score_threshold, nms_threshold,
                        eta=1.0, top_k=0)
```
Apply NMS to rotated bounding boxes. Used for oriented object detection where boxes can be at any angle.

**Parameters:**
- `bboxes` (list): List of rotated boxes, each as ((center_x, center_y), (width, height), angle)
- Other parameters same as `NMSBoxes()`

**Returns:** List of indices of boxes to keep

**Example:**
```python
# Rotated boxes for text detection
rotated_boxes = [
    ((100, 100), (50, 20), 30.0),  # center, size, angle
    ((150, 150), (60, 25), -15.0)
]
scores = [0.9, 0.85]

indices = cv2.dnn.NMSBoxesRotated(
    rotated_boxes,
    scores,
    score_threshold=0.5,
    nms_threshold=0.3
)
```

### Backend and Target Constants

Backend constants specify which computational backend to use:

```python { .api }
# Backend constants
cv2.dnn.DNN_BACKEND_DEFAULT           # Let OpenCV choose
cv2.dnn.DNN_BACKEND_HALIDE            # Halide backend
cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE  # Intel OpenVINO
cv2.dnn.DNN_BACKEND_OPENCV            # Pure OpenCV implementation
cv2.dnn.DNN_BACKEND_VKCOM             # Vulkan compute
cv2.dnn.DNN_BACKEND_CUDA              # NVIDIA CUDA
```

Target constants specify which device to run on:

```python { .api }
# Target constants
cv2.dnn.DNN_TARGET_CPU          # CPU execution
cv2.dnn.DNN_TARGET_OPENCL       # OpenCL (GPU)
cv2.dnn.DNN_TARGET_OPENCL_FP16  # OpenCL with FP16 precision
cv2.dnn.DNN_TARGET_MYRIAD       # Intel Movidius
cv2.dnn.DNN_TARGET_VULKAN       # Vulkan API
cv2.dnn.DNN_TARGET_CUDA         # NVIDIA CUDA GPU
cv2.dnn.DNN_TARGET_CUDA_FP16    # NVIDIA CUDA with FP16
```

**Usage example:**
```python
# Configure for CPU execution with OpenCV's own implementation
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Configure for NVIDIA GPU with FP16
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

# Configure for Intel hardware with OpenVINO
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
```
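
Not every backend works with every target, so it can help to validate a pair before applying it. The table below is a sketch under stated assumptions: the pairings reflect a reading of the OpenCV documentation and may differ between versions and builds, `SUPPORTED_TARGETS` and `is_supported` are hypothetical names, and constants are written as strings so the snippet runs without OpenCV installed.

```python
# Assumed backend -> target pairings, keyed by constant name.
# Verify against the documentation of the OpenCV version in use.
SUPPORTED_TARGETS = {
    "DNN_BACKEND_OPENCV": {
        "DNN_TARGET_CPU", "DNN_TARGET_OPENCL", "DNN_TARGET_OPENCL_FP16",
    },
    "DNN_BACKEND_CUDA": {
        "DNN_TARGET_CUDA", "DNN_TARGET_CUDA_FP16",
    },
    "DNN_BACKEND_INFERENCE_ENGINE": {
        "DNN_TARGET_CPU", "DNN_TARGET_OPENCL",
        "DNN_TARGET_OPENCL_FP16", "DNN_TARGET_MYRIAD",
    },
    "DNN_BACKEND_VKCOM": {"DNN_TARGET_VULKAN"},
}

def is_supported(backend, target):
    """Return True if the assumed table lists target for backend."""
    return target in SUPPORTED_TARGETS.get(backend, set())

print(is_supported("DNN_BACKEND_CUDA", "DNN_TARGET_CUDA_FP16"))
print(is_supported("DNN_BACKEND_CUDA", "DNN_TARGET_CPU"))
```

With OpenCV available, the constant itself can be looked up from its name with `getattr(cv2.dnn, name)`.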

## Practical Examples

### Example 1: Image Classification

```python
import cv2
import numpy as np

# Load a pre-trained classification model (e.g., MobileNet)
net = cv2.dnn.readNetFromCaffe('mobilenet_deploy.prototxt',
                               'mobilenet.caffemodel')

# Read and preprocess image
image = cv2.imread('image.jpg')
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1.0,
                             size=(224, 224),
                             mean=(104.0, 117.0, 123.0),
                             swapRB=False,
                             crop=False)

# Run inference
net.setInput(blob)
predictions = net.forward()

# Get top prediction; flatten copes with outputs shaped (1, classes, 1, 1)
scores = predictions.flatten()
class_id = int(np.argmax(scores))
confidence = scores[class_id]

print(f"Predicted class: {class_id}, Confidence: {confidence:.2f}")
```
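
The example above reports only the single best class. If a model emits raw scores rather than probabilities (this depends on the model and is an assumption here), a softmax followed by a top-k selection gives a ranked list. The sketch below uses plain Python lists; `softmax` and `top_k` are hypothetical helper names and the scores are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, k=3):
    """Return (class_id, probability) pairs for the k most likely classes."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(i, probabilities[i]) for i in ranked[:k]]

raw_scores = [1.0, 3.0, 0.5, 2.0]  # made-up network output
probs = softmax(raw_scores)
for class_id, p in top_k(probs, k=2):
    print(f"class {class_id}: {p:.3f}")
```

With a real network, `raw_scores` would be the flattened output of `net.forward()`.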

### Example 2: Object Detection with YOLO

```python
import cv2
import numpy as np

# Load YOLO model (the CUDA backend requires an OpenCV build with CUDA support)
net = cv2.dnn.readNetFromDarknet('yolov4.cfg', 'yolov4.weights')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Read image
image = cv2.imread('image.jpg')
height, width = image.shape[:2]

# Preprocess
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1/255.0,
                             size=(416, 416),
                             swapRB=True,
                             crop=False)

# Run inference
net.setInput(blob)
output_layers = net.getUnconnectedOutLayersNames()
outputs = net.forward(output_layers)

# Post-process detections
boxes = []
confidences = []
class_ids = []

for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]

        if confidence > 0.5:
            # Scale bounding box back to original image
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)

            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(int(class_id))

# Apply NMS
indices = cv2.dnn.NMSBoxes(boxes, confidences,
                           score_threshold=0.5,
                           nms_threshold=0.4)

# Draw results (flatten copes with releases that return an N x 1 array)
for i in np.array(indices).flatten():
    box = boxes[i]
    x, y, w, h = box
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, f'Class {class_ids[i]}: {confidences[i]:.2f}',
                (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow('Detections', image)
cv2.waitKey(0)
```
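
The box arithmetic in the loop above can be checked in isolation. The sketch below decodes a single detection row, assuming the layout used in the example: normalized center x, center y, width, height, an objectness value, then one score per class. `decode_detection` is a hypothetical helper and the row values are made up.

```python
def decode_detection(row, image_width, image_height, threshold=0.5):
    """Decode one row laid out as [cx, cy, w, h, objectness, class scores...].

    Coordinates are assumed to be normalized to the 0..1 range.
    Returns ([x, y, w, h], class_id, confidence), or None below the threshold.
    """
    class_scores = row[5:]
    class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
    confidence = class_scores[class_id]
    if confidence <= threshold:
        return None
    w = int(row[2] * image_width)
    h = int(row[3] * image_height)
    # Convert the box center to the top-left corner.
    x = int(row[0] * image_width - w / 2)
    y = int(row[1] * image_height - h / 2)
    return [x, y, w, h], class_id, float(confidence)

row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, image_width=640, image_height=480))
# ([240, 120, 160, 240], 1, 0.8)
```

The returned box uses the `[x, y, width, height]` layout that `NMSBoxes()` expects.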

### Example 3: Face Detection with SSD

```python
import cv2
import numpy as np

# Load pre-trained face detection model
net = cv2.dnn.readNetFromCaffe(
    'deploy.prototxt',
    'res10_300x300_ssd_iter_140000.caffemodel'
)

# Read image
image = cv2.imread('faces.jpg')
h, w = image.shape[:2]

# Preprocess
blob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)),
    scalefactor=1.0,
    size=(300, 300),
    mean=(104.0, 177.0, 123.0)
)

# Detect faces
net.setInput(blob)
detections = net.forward()

# Process detections
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]

    if confidence > 0.5:
        # Get bounding box (normalized corners scaled to image size)
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x1, y1, x2, y2) = box.astype(int)

        # Draw rectangle
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        text = f'{confidence * 100:.1f}%'
        cv2.putText(image, text, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow('Face Detection', image)
cv2.waitKey(0)
```
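
Detections near the image border can produce corner coordinates that fall slightly outside the image, which matters if the box is later used to crop the face. A small clamp avoids that; `clip_box` is a hypothetical helper, shown here as a sketch.

```python
def clip_box(x1, y1, x2, y2, width, height):
    """Clamp corner coordinates to the valid pixel range of a width x height image."""
    x1 = max(0, min(int(x1), width - 1))
    y1 = max(0, min(int(y1), height - 1))
    x2 = max(0, min(int(x2), width - 1))
    y2 = max(0, min(int(y2), height - 1))
    return x1, y1, x2, y2

print(clip_box(-5, 10, 350, 200, width=300, height=250))  # (0, 10, 299, 200)
```

In the example above it would be applied to `(x1, y1, x2, y2)` before drawing or cropping.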

### Example 4: Using ONNX Models

```python
import cv2
import numpy as np

# Load ONNX model (e.g., exported from PyTorch)
net = cv2.dnn.readNetFromONNX('model.onnx')

# Optional: Use GPU (requires an OpenCV build with CUDA support;
# remove these two lines to run on the default backend)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Read and preprocess image
image = cv2.imread('image.jpg')
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1/255.0,
                             size=(640, 640),
                             mean=(0, 0, 0),
                             swapRB=True,
                             crop=False)

# Run inference
net.setInput(blob)
output = net.forward()

print(f"Output shape: {output.shape}")
# Further processing depends on model architecture
```

### Example 5: Batch Processing

```python
import cv2
import numpy as np

# Load model
net = cv2.dnn.readNetFromCaffe('model.prototxt', 'model.caffemodel')

# Load multiple images; cv2.imread returns None for unreadable files, so drop those
images = [cv2.imread(f'image{i}.jpg') for i in range(5)]
images = [img for img in images if img is not None]

# Create batch blob
blob = cv2.dnn.blobFromImages(images,
                              scalefactor=1/255.0,
                              size=(224, 224),
                              mean=(0, 0, 0),
                              swapRB=True)

# Process batch
net.setInput(blob)
predictions = net.forward()

# Results for each image
for i, pred in enumerate(predictions):
    pred = pred.flatten()  # copes with outputs shaped (classes, 1, 1)
    class_id = np.argmax(pred)
    confidence = pred[class_id]
    print(f"Image {i}: Class {class_id}, Confidence: {confidence:.2f}")
```
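
When there are more images than fit comfortably in one blob, the list can be split into fixed-size chunks and each chunk passed to `blobFromImages()` in turn. The following is a minimal sketch of that chunking step only; `batches` is a hypothetical helper and the file names are placeholders.

```python
def batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

paths = [f"image{i}.jpg" for i in range(5)]
for batch in batches(paths, batch_size=2):
    print(batch)
```

The last chunk may be smaller than `batch_size`, so downstream code should not assume a fixed batch dimension.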