# Deep Neural Networks (DNN Module)

The `cv2.dnn` module provides a deep learning inference engine that supports multiple frameworks and model formats. It lets you load pre-trained models and run inference for tasks such as object detection, classification, and semantic segmentation without installing the original training frameworks.

OpenCV's DNN module targets CPU and GPU inference, supports several backends (OpenCV, CUDA, OpenVINO), and can run models produced by popular frameworks and formats, including TensorFlow, PyTorch (typically exported to ONNX), Caffe, ONNX, and Darknet.

## Capabilities

### Loading Models

The DNN module provides several functions for loading neural network models from different frameworks. The `readNet()` function auto-detects the model format, while the framework-specific functions offer more control.

```python { .api }
cv2.dnn.readNet(model, config=None, framework='')
```
Read a network model from file with automatic framework detection. This is the most convenient loader because it determines the framework from the file extensions.

**Parameters:**
- `model` (str): Path to the binary model file (e.g., `.caffemodel`, `.pb`, `.onnx`, `.weights`)
- `config` (str, optional): Path to the configuration file (e.g., `.prototxt` for Caffe, `.pbtxt` for TensorFlow, `.cfg` for Darknet)
- `framework` (str, optional): Explicit framework name to use if auto-detection fails

**Returns:** `Net` object representing the loaded neural network

**Example:**
```python
# Auto-detect framework
net = cv2.dnn.readNet('model.onnx')

# Load Darknet YOLO model
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load TensorFlow model
net = cv2.dnn.readNet('frozen_graph.pb', 'graph.pbtxt')
```
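
To make the idea of extension-based detection concrete, here is a minimal sketch that maps the file extensions listed above to a framework label. The helper `guess_framework` and the label strings are hypothetical names chosen for illustration; OpenCV's internal detection logic may work differently.

```python
from pathlib import Path

# Extension-to-framework table built from the file types listed above.
# Illustrative only: this is not OpenCV's actual detection code.
EXTENSION_TO_FRAMEWORK = {
    '.caffemodel': 'caffe',
    '.prototxt': 'caffe',
    '.pb': 'tensorflow',
    '.pbtxt': 'tensorflow',
    '.onnx': 'onnx',
    '.weights': 'darknet',
    '.cfg': 'darknet',
}


def guess_framework(model_path, config_path=None):
    """Return a framework label for the given files, or '' if unknown."""
    for path in (model_path, config_path):
        if path is None:
            continue
        suffix = Path(path).suffix.lower()
        if suffix in EXTENSION_TO_FRAMEWORK:
            return EXTENSION_TO_FRAMEWORK[suffix]
    return ''


print(guess_framework('yolov3.weights', 'yolov3.cfg'))  # darknet
print(guess_framework('model.onnx'))                    # onnx
```

When a file uses an extension that is not in such a table, passing the `framework` argument explicitly is the documented way to resolve the ambiguity.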

---

```python { .api }
cv2.dnn.readNetFromCaffe(prototxt, caffeModel=None)
```
Read a Caffe model from `.prototxt` and `.caffemodel` files. Caffe is commonly used for CNN-based models.

**Parameters:**
- `prototxt` (str): Path to the `.prototxt` file (network structure)
- `caffeModel` (str, optional): Path to the `.caffemodel` file (trained weights)

**Returns:** `Net` object

**Example:**
```python
# Load pre-trained face detector
net = cv2.dnn.readNetFromCaffe(
    'deploy.prototxt',
    'res10_300x300_ssd_iter_140000.caffemodel'
)
```

---

```python { .api }
cv2.dnn.readNetFromTensorflow(model, config=None)
```
Read a TensorFlow model from a frozen graph (`.pb`), optionally accompanied by a text graph definition (`.pbtxt`).

**Parameters:**
- `model` (str): Path to the `.pb` file (frozen graph)
- `config` (str, optional): Path to the `.pbtxt` file (text graph proto)

**Returns:** `Net` object

**Example:**
```python
# Load TensorFlow object detection model
net = cv2.dnn.readNetFromTensorflow(
    'frozen_inference_graph.pb',
    'graph.pbtxt'
)
```

---

```python { .api }
cv2.dnn.readNetFromONNX(onnxFile)
```
Read a model from ONNX format. ONNX is an open interchange format that many frameworks can export to.

**Parameters:**
- `onnxFile` (str): Path to the `.onnx` model file

**Returns:** `Net` object

**Example:**
```python
# Load ONNX model
net = cv2.dnn.readNetFromONNX('model.onnx')
```

---

```python { .api }
cv2.dnn.readNetFromDarknet(cfgFile, darknetModel=None)
```
Read a Darknet model. Darknet is the framework used for YOLO object detection models.

**Parameters:**
- `cfgFile` (str): Path to the `.cfg` configuration file
- `darknetModel` (str, optional): Path to the `.weights` file

**Returns:** `Net` object

**Example:**
```python
# Load YOLOv4 model
net = cv2.dnn.readNetFromDarknet(
    'yolov4.cfg',
    'yolov4.weights'
)
```

---

```python { .api }
cv2.dnn.readNetFromTorch(model, isBinary=True)
```
Read a Torch model from file. This loader handles legacy Torch7 models, not PyTorch checkpoints.

**Parameters:**
- `model` (str): Path to the Torch model file
- `isBinary` (bool): Whether the model is in binary format (default: True)

**Returns:** `Net` object

---

```python { .api }
cv2.dnn.readNetFromModelOptimizer(xml, bin)
```
Read a model in the Intel OpenVINO Model Optimizer (IR) format.

**Parameters:**
- `xml` (str): Path to the `.xml` file (model structure)
- `bin` (str): Path to the `.bin` file (weights)

**Returns:** `Net` object

**Example:**
```python
# Load OpenVINO IR model
net = cv2.dnn.readNetFromModelOptimizer(
    'model.xml',
    'model.bin'
)
```

### Preprocessing

Before an image is fed to a neural network, it typically needs to be converted into a specific format called a "blob". The blob functions handle resizing, scaling, mean subtraction, and channel swapping.

```python { .api }
cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(0, 0), mean=(0, 0, 0),
                      swapRB=False, crop=False, ddepth=cv2.CV_32F)
```
Create a 4-dimensional blob from a single image. This is the most commonly used preprocessing function for deep learning models.

**Parameters:**
- `image` (numpy.ndarray): Input image (BGR format)
- `scalefactor` (float): Multiplier applied to pixel values after mean subtraction (e.g., 1/255.0 to normalize to [0,1])
- `size` (tuple): Target spatial size (width, height) for the output image
- `mean` (tuple): Scalar with mean values to subtract from channels (e.g., (104.0, 177.0, 123.0))
- `swapRB` (bool): If True, swap Red and Blue channels (convert BGR to RGB)
- `crop` (bool): If True, resize while preserving aspect ratio and then center-crop to `size`; if False, resize directly to `size` without preserving aspect ratio
- `ddepth` (int): Output blob depth (default: CV_32F for float32)

**Returns:** 4D numpy array with shape (1, channels, height, width) in NCHW format

**Example:**
```python
# Preprocess image for classification model
blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1/255.0,
    size=(224, 224),
    mean=(0, 0, 0),
    swapRB=True,
    crop=False
)

# Preprocess for face detection (Caffe SSD)
blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1.0,
    size=(300, 300),
    mean=(104.0, 177.0, 123.0),
    swapRB=False,
    crop=False
)
```
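
The arithmetic behind a blob can be sketched without OpenCV. The hypothetical helper `tiny_blob` below works on nested lists, skips resizing and cropping, and assumes the mean is subtracted before scaling and is given in the output channel order; treat it as an illustration of the NCHW layout rather than a replacement for `blobFromImage()`.

```python
def tiny_blob(image, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Build a nested-list blob of shape (1, C, H, W) from an H x W x C image.

    `image` is a nested list of pixels; resizing and cropping are omitted.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    order = list(range(channels))
    if swap_rb and channels >= 3:
        order[0], order[2] = order[2], order[0]  # BGR -> RGB
    planes = []
    for out_c, src_c in enumerate(order):
        # Subtract the channel mean first, then apply the scale factor
        plane = [
            [(image[y][x][src_c] - mean[out_c]) * scalefactor
             for x in range(width)]
            for y in range(height)
        ]
        planes.append(plane)
    return [planes]  # Leading list is the batch dimension (N = 1)


# One pixel in BGR order: blue=10, green=20, red=30
blob = tiny_blob([[(10, 20, 30)]], scalefactor=0.5,
                 mean=(2.0, 4.0, 6.0), swap_rb=True)
print(blob)  # [[[[14.0]], [[8.0]], [[2.0]]]]
```

The output has one plane per channel, which is why the blob returned by `blobFromImage()` is indexed as `blob[0, channel, row, column]`.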

---

```python { .api }
cv2.dnn.blobFromImages(images, scalefactor=1.0, size=(0, 0), mean=(0, 0, 0),
                       swapRB=False, crop=False, ddepth=cv2.CV_32F)
```
Create a 4-dimensional blob from multiple images for batch processing.

**Parameters:**
- `images` (list of numpy.ndarray): List of input images
- Other parameters same as `blobFromImage()`

**Returns:** 4D numpy array with shape (batch_size, channels, height, width)

**Example:**
```python
# Process multiple images in a batch
images = [img1, img2, img3]
blob = cv2.dnn.blobFromImages(
    images,
    scalefactor=1/255.0,
    size=(224, 224),
    swapRB=True
)
```

---

```python { .api }
cv2.dnn.imagesFromBlob(blob)
```
Convert a 4D blob back into a list of images. Useful for visualization or debugging.

**Parameters:**
- `blob` (numpy.ndarray): 4D blob array

**Returns:** List of images in standard OpenCV layout (height, width, channels)

**Example:**
```python
# Convert blob back to images
images = cv2.dnn.imagesFromBlob(blob)
for img in images:
    cv2.imshow('Image', img)
    cv2.waitKey(0)  # Keep each window open until a key is pressed
```

### Neural Network Operations

The `Net` class provides methods for running inference, configuring backends, and querying network structure.

```python { .api }
Net.setInput(blob, name='', scalefactor=1.0, mean=(0, 0, 0))
```
Set the input blob for the network. This prepares the data for the forward pass.

**Parameters:**
- `blob` (numpy.ndarray): 4D input blob (typically from `blobFromImage()`)
- `name` (str): Name of the input layer (empty string for default)
- `scalefactor` (float): Optional additional scaling
- `mean` (tuple): Optional additional mean subtraction

**Returns:** None

**Example:**
```python
net.setInput(blob)
# Or specify input layer name
net.setInput(blob, name='input_1')
```

---

```python { .api }
Net.forward(outputName=None)
```
Run a forward pass to compute the output of the specified layer. This performs the actual inference.

**Parameters:**
- `outputName` (str or list of str, optional): Name of the output layer, or a list of layer names. If omitted, the forward pass runs through the whole network and returns the output of its final layer

**Returns:** numpy.ndarray, or a sequence of numpy arrays when several layer names are requested

**Example:**
```python
# Get output from final layer
output = net.forward()

# Get output from specific layer
output = net.forward('detection_out')

# Get outputs from multiple layers
layer_names = net.getUnconnectedOutLayersNames()
outputs = net.forward(layer_names)
```

---

```python { .api }
Net.forwardAsync(outputName=None)
```
Run an asynchronous forward pass. Useful for pipelining and concurrent processing. This call requires the Inference Engine (OpenVINO) backend.

**Parameters:**
- `outputName` (str, optional): Name of the output layer

**Returns:** Async handle (`cv2.AsyncArray`) for retrieving results

---

```python { .api }
Net.setPreferableBackend(backendId)
```
Set the computation backend for the network. Different backends offer different performance characteristics.

**Parameters:**
- `backendId` (int): Backend identifier (see Backend Constants section)

**Returns:** None

**Example:**
```python
# Use OpenCV's implementation (CPU)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)

# Use CUDA for GPU acceleration
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)

# Use Intel's OpenVINO
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
```

---

```python { .api }
Net.setPreferableTarget(targetId)
```
Set the computation target device (CPU, GPU, etc.). Usually paired with `setPreferableBackend()`, because the set of valid targets depends on the selected backend.

**Parameters:**
- `targetId` (int): Target device identifier (see Target Constants section)

**Returns:** None

**Example:**
```python
# Use CPU
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Use GPU with CUDA
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Use GPU with FP16 precision for faster inference
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)
```

---

```python { .api }
Net.getLayerNames()
```
Get names of all layers in the network. Useful for debugging and understanding network structure.

**Returns:** List of strings containing layer names

**Example:**
```python
layer_names = net.getLayerNames()
print(f"Network has {len(layer_names)} layers")
for name in layer_names[:5]:
    print(name)
```

---

```python { .api }
Net.getUnconnectedOutLayers()
```
Get indices of output layers (layers without consumers). These are typically the final layers whose outputs you want to retrieve.

**Returns:** Array of integers representing output layer indices (some older OpenCV releases return an N×1 array rather than a flat one)

**Example:**
```python
output_layers = net.getUnconnectedOutLayers()
print(f"Output layer indices: {output_layers}")
```
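
A common idiom is to translate these indices into layer names by hand. The sketch below assumes the indices are 1-based relative to the list returned by `getLayerNames()` and that they may arrive either flat or wrapped in one-element rows; the helper name and the sample layer names are hypothetical, and plain lists stand in for the arrays OpenCV returns.

```python
def output_layer_names(layer_names, out_layer_ids):
    """Map 1-based layer ids to names, accepting flat or N x 1 nested ids."""
    flat_ids = []
    for item in out_layer_ids:
        # Some releases wrap each id in its own one-element row
        if isinstance(item, (list, tuple)):
            flat_ids.extend(item)
        else:
            flat_ids.append(item)
    return [layer_names[int(i) - 1] for i in flat_ids]


# Hypothetical values standing in for getLayerNames() and
# getUnconnectedOutLayers() results
names = ['conv_0', 'conv_1', 'yolo_82', 'conv_3', 'yolo_94']
print(output_layer_names(names, [3, 5]))      # ['yolo_82', 'yolo_94']
print(output_layer_names(names, [[3], [5]]))  # ['yolo_82', 'yolo_94']
```

With real NumPy results, `np.array(ids).flatten()` performs the same flattening step.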

---

```python { .api }
Net.getUnconnectedOutLayersNames()
```
Get names of output layers. More convenient than using indices.

**Returns:** List of strings containing output layer names

**Example:**
```python
# Get outputs from all output layers (e.g., for YOLO)
output_layer_names = net.getUnconnectedOutLayersNames()
outputs = net.forward(output_layer_names)
```

### Post-processing

After running inference, post-processing is often needed to filter and refine detection results. Non-Maximum Suppression (NMS) is the most common post-processing technique.

```python { .api }
cv2.dnn.NMSBoxes(bboxes, scores, score_threshold, nms_threshold,
                 eta=1.0, top_k=0)
```
Apply Non-Maximum Suppression (NMS) to bounding boxes. NMS filters out overlapping detections, keeping only the most confident ones.

**Parameters:**
- `bboxes` (list): List of bounding boxes, each as [x, y, width, height]
- `scores` (list): List of confidence scores corresponding to each box
- `score_threshold` (float): Minimum score threshold to consider a detection
- `nms_threshold` (float): IoU (Intersection over Union) threshold for NMS (typically 0.3-0.5)
- `eta` (float): Coefficient for adaptive NMS (default: 1.0)
- `top_k` (int): Maximum number of boxes to keep (0 = no limit)

**Returns:** Indices of the boxes to keep after NMS (some older OpenCV releases return an N×1 array, so flatten it before indexing)

**Example:**
```python
# Apply NMS to detections
boxes = [[10, 10, 50, 50], [12, 12, 48, 48], [100, 100, 60, 60]]
scores = [0.9, 0.85, 0.95]

indices = cv2.dnn.NMSBoxes(
    boxes,
    scores,
    score_threshold=0.5,
    nms_threshold=0.4
)

# Keep only selected boxes
kept_boxes = [boxes[i] for i in indices]
kept_scores = [scores[i] for i in indices]
```
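
To show what NMS does with these inputs, here is a minimal pure-Python sketch of greedy NMS over `[x, y, width, height]` boxes. The functions `iou` and `greedy_nms` are illustrative names, and the sketch ignores `eta` and `top_k`; it is not OpenCV's implementation, and results may differ from `NMSBoxes()` in edge cases.

```python
def iou(box_a, box_b):
    """Intersection over Union for two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return indices of kept boxes, highest score first."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [[10, 10, 50, 50], [12, 12, 48, 48], [100, 100, 60, 60]]
scores = [0.9, 0.85, 0.95]
print(greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4))  # [2, 0]
```

Box 1 is dropped because its IoU with the higher-scoring box 0 is about 0.92, well above the 0.4 threshold, while box 2 does not overlap either of the others.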

---

```python { .api }
cv2.dnn.NMSBoxesRotated(bboxes, scores, score_threshold, nms_threshold,
                        eta=1.0, top_k=0)
```
Apply NMS to rotated bounding boxes. Used for oriented object detection where boxes can be at any angle.

**Parameters:**
- `bboxes` (list): List of rotated boxes, each as ((center_x, center_y), (width, height), angle)
- Other parameters same as `NMSBoxes()`

**Returns:** List of indices of boxes to keep

**Example:**
```python
# Rotated boxes for text detection
rotated_boxes = [
    ((100, 100), (50, 20), 30.0),  # center, size, angle
    ((150, 150), (60, 25), -15.0)
]
scores = [0.9, 0.85]

indices = cv2.dnn.NMSBoxesRotated(
    rotated_boxes,
    scores,
    score_threshold=0.5,
    nms_threshold=0.3
)
```

### Backend and Target Constants

Backend constants specify which computational backend to use:

```python { .api }
# Backend constants
cv2.dnn.DNN_BACKEND_DEFAULT           # Let OpenCV choose
cv2.dnn.DNN_BACKEND_HALIDE            # Halide backend
cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE  # Intel OpenVINO
cv2.dnn.DNN_BACKEND_OPENCV            # Pure OpenCV implementation
cv2.dnn.DNN_BACKEND_VKCOM             # Vulkan compute
cv2.dnn.DNN_BACKEND_CUDA              # NVIDIA CUDA
```

Target constants specify which device to run on:

```python { .api }
# Target constants
cv2.dnn.DNN_TARGET_CPU          # CPU execution
cv2.dnn.DNN_TARGET_OPENCL       # OpenCL (GPU)
cv2.dnn.DNN_TARGET_OPENCL_FP16  # OpenCL with FP16 precision
cv2.dnn.DNN_TARGET_MYRIAD       # Intel Movidius
cv2.dnn.DNN_TARGET_VULKAN       # Vulkan API
cv2.dnn.DNN_TARGET_CUDA         # NVIDIA CUDA GPU
cv2.dnn.DNN_TARGET_CUDA_FP16    # NVIDIA CUDA with FP16
```

**Usage example:**
```python
# Configure for optimal CPU performance
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Configure for NVIDIA GPU with FP16
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

# Configure for Intel hardware with OpenVINO
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
```
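
Because the CUDA backend is only usable when OpenCV was built with CUDA support, it can help to choose the backend and target pair at runtime. The sketch below is one possible approach: `pick_backend_and_target` is a hypothetical helper that returns constant names as strings, and it assumes the device count is supplied by the caller (for example from `cv2.cuda.getCudaEnabledDeviceCount()`).

```python
def pick_backend_and_target(cuda_device_count, use_fp16=False):
    """Return (backend_name, target_name) as cv2.dnn constant names."""
    if cuda_device_count > 0:
        target = 'DNN_TARGET_CUDA_FP16' if use_fp16 else 'DNN_TARGET_CUDA'
        return 'DNN_BACKEND_CUDA', target
    # Fall back to the pure OpenCV implementation on the CPU
    return 'DNN_BACKEND_OPENCV', 'DNN_TARGET_CPU'


backend_name, target_name = pick_backend_and_target(cuda_device_count=0)
print(backend_name, target_name)  # DNN_BACKEND_OPENCV DNN_TARGET_CPU

# With OpenCV available, the names could be resolved and applied like this:
#   net.setPreferableBackend(getattr(cv2.dnn, backend_name))
#   net.setPreferableTarget(getattr(cv2.dnn, target_name))
```

Returning names instead of integer values keeps the helper testable without OpenCV installed; resolving them with `getattr` is a design choice of this sketch, not a requirement of the API.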

## Practical Examples

### Example 1: Image Classification

```python
import cv2
import numpy as np

# Load a pre-trained classification model (e.g., MobileNet)
net = cv2.dnn.readNetFromCaffe('mobilenet_deploy.prototxt',
                               'mobilenet.caffemodel')

# Read and preprocess image
image = cv2.imread('image.jpg')
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1.0,
                             size=(224, 224),
                             mean=(104.0, 117.0, 123.0),
                             swapRB=False,
                             crop=False)

# Run inference
net.setInput(blob)
predictions = net.forward()

# Get top prediction (flattening handles outputs with trailing 1x1 dimensions)
probs = predictions[0].flatten()
class_id = int(np.argmax(probs))
confidence = probs[class_id]

print(f"Predicted class: {class_id}, Confidence: {confidence:.2f}")
```

### Example 2: Object Detection with YOLO

```python
import cv2
import numpy as np

# Load YOLO model (the CUDA backend requires an OpenCV build with CUDA support)
net = cv2.dnn.readNetFromDarknet('yolov4.cfg', 'yolov4.weights')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Read image
image = cv2.imread('image.jpg')
height, width = image.shape[:2]

# Preprocess
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1/255.0,
                             size=(416, 416),
                             swapRB=True,
                             crop=False)

# Run inference
net.setInput(blob)
output_layers = net.getUnconnectedOutLayersNames()
outputs = net.forward(output_layers)

# Post-process detections
boxes = []
confidences = []
class_ids = []

for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]

        if confidence > 0.5:
            # Scale bounding box back to original image
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)

            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Apply NMS
indices = cv2.dnn.NMSBoxes(boxes, confidences,
                           score_threshold=0.5,
                           nms_threshold=0.4)

# Draw results (flatten handles older releases that return an Nx1 array)
for i in np.array(indices).flatten():
    box = boxes[i]
    x, y, w, h = box
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, f'Class {class_ids[i]}: {confidences[i]:.2f}',
                (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow('Detections', image)
cv2.waitKey(0)
```
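
The box conversion in the loop above is easy to get wrong, so here it is isolated as a small function with a worked value. `to_pixel_box` is a hypothetical helper; it assumes the detection starts with normalized `(center_x, center_y, width, height)` values, as in the example.

```python
def to_pixel_box(detection, image_width, image_height):
    """Convert normalized (cx, cy, w, h) to pixel [x, y, w, h], top-left origin."""
    center_x = int(detection[0] * image_width)
    center_y = int(detection[1] * image_height)
    w = int(detection[2] * image_width)
    h = int(detection[3] * image_height)
    # Shift from the box center to its top-left corner
    return [int(center_x - w / 2), int(center_y - h / 2), w, h]


# A box centered in a 640x480 image, a quarter as wide and half as tall
print(to_pixel_box([0.5, 0.5, 0.25, 0.5], 640, 480))  # [240, 120, 160, 240]
```

The result uses the `[x, y, width, height]` layout that `NMSBoxes()` expects.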

### Example 3: Face Detection with SSD

```python
import cv2
import numpy as np

# Load pre-trained face detection model
net = cv2.dnn.readNetFromCaffe(
    'deploy.prototxt',
    'res10_300x300_ssd_iter_140000.caffemodel'
)

# Read image
image = cv2.imread('faces.jpg')
h, w = image.shape[:2]

# Preprocess
blob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)),
    scalefactor=1.0,
    size=(300, 300),
    mean=(104.0, 177.0, 123.0)
)

# Detect faces
net.setInput(blob)
detections = net.forward()

# Process detections
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]

    if confidence > 0.5:
        # Get bounding box (normalized corners scaled to image size)
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x1, y1, x2, y2) = box.astype(int)

        # Draw rectangle
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        text = f'{confidence * 100:.1f}%'
        cv2.putText(image, text, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

cv2.imshow('Face Detection', image)
cv2.waitKey(0)
```

### Example 4: Using ONNX Models

```python
import cv2
import numpy as np

# Load ONNX model (e.g., exported from PyTorch)
net = cv2.dnn.readNetFromONNX('model.onnx')

# Optional: Use GPU
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Read and preprocess image
image = cv2.imread('image.jpg')
blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1/255.0,
                             size=(640, 640),
                             mean=(0, 0, 0),
                             swapRB=True,
                             crop=False)

# Run inference
net.setInput(blob)
output = net.forward()

print(f"Output shape: {output.shape}")
# Further processing depends on model architecture
```

### Example 5: Batch Processing

```python
import cv2
import numpy as np

# Load model
net = cv2.dnn.readNetFromCaffe('model.prototxt', 'model.caffemodel')

# Load multiple images
images = [cv2.imread(f'image{i}.jpg') for i in range(5)]

# Create batch blob
blob = cv2.dnn.blobFromImages(images,
                              scalefactor=1/255.0,
                              size=(224, 224),
                              mean=(0, 0, 0),
                              swapRB=True)

# Process batch
net.setInput(blob)
predictions = net.forward()

# Results for each image
for i, pred in enumerate(predictions):
    class_id = np.argmax(pred)
    confidence = pred[class_id]
    print(f"Image {i}: Class {class_id}, Confidence: {confidence:.2f}")
```
