# Object Detection

OpenCV provides tools for detecting and recognizing objects in images and videos. The object detection module includes traditional computer vision methods such as Haar cascades and HOG descriptors, as well as modern deep learning-based detectors. These capabilities enable applications such as face detection, pedestrian detection, QR code scanning, and custom object recognition.

## Capabilities

### Cascade Classifiers

Cascade classifiers use a machine learning approach based on Haar or LBP features to detect objects. They are fast enough for real-time detection tasks.

**CascadeClassifier Class**

```python { .api }
cv2.CascadeClassifier(filename=None)
```

Creates a cascade classifier object for object detection. If `filename` is provided, loads the cascade from the specified XML file. The constructor does not raise an error when the file cannot be loaded, so call `classifier.empty()` to confirm that a cascade is present.

**Loading a Cascade**

```python { .api }
classifier.load(filename)
```

Loads a cascade classifier from an XML file. Returns `True` if successful, `False` otherwise.

**Detecting Objects**

```python { .api }
objects = classifier.detectMultiScale(
    image,
    scaleFactor=1.1,
    minNeighbors=3,
    flags=0,
    minSize=(0, 0),
    maxSize=(0, 0)
)
```

Detects objects of different sizes in the input image. Returns the rectangles where objects were found, one `(x, y, width, height)` entry per detection.

- `image`: Input image (grayscale recommended for better performance)
- `scaleFactor`: How much the image size is reduced at each scale step (e.g., 1.1 shrinks the image by about 10% per step)
- `minNeighbors`: How many neighboring candidate rectangles a detection needs in order to be kept (higher values give fewer but more reliable detections)
- `flags`: Legacy parameter from the old API, typically set to 0
- `minSize`: Minimum object size as `(width, height)` in pixels; smaller objects are ignored
- `maxSize`: Maximum object size as `(width, height)` in pixels; `(0, 0)` means no limit
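
To make the cost of `scaleFactor` concrete, the sketch below counts how many window sizes a detector would try between a base window and the image size. It is a simplified model rather than OpenCV's internal code: the helper name `window_sizes` and the 24-pixel base window are assumptions chosen for illustration.

```python
def window_sizes(image_size, base_window=24, scale_factor=1.1, max_size=0):
    """List the square window sizes tried when growing by scale_factor per step.

    Simplified model (assumption): the real detector resizes the image rather
    than the window, but the number of steps depends on scale_factor the same way.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    width, height = image_size
    limit = min(width, height)
    if max_size > 0:  # 0 means "no limit", mirroring maxSize=(0, 0)
        limit = min(limit, max_size)
    sizes = []
    size = float(base_window)
    while size <= limit:
        sizes.append(round(size))
        size *= scale_factor
    return sizes


# A smaller scale factor means many more scales to search.
fine = window_sizes((640, 480), scale_factor=1.1)
coarse = window_sizes((640, 480), scale_factor=1.3)
print(len(fine), len(coarse))
```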

**Detecting with Level Information**

```python { .api }
objects, numDetections = classifier.detectMultiScale2(
    image,
    scaleFactor=1.1,
    minNeighbors=3,
    flags=0,
    minSize=(0, 0),
    maxSize=(0, 0)
)
```

Similar to `detectMultiScale()`, but also returns the number of neighbor rectangles for each detection, which can be used as a confidence measure.
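
One way to use the neighbor counts as a confidence measure is to drop weakly supported detections after the call. The helper below is a minimal sketch; `filter_by_neighbors` is a hypothetical name, and the sample values stand in for the two return values of `detectMultiScale2()`.

```python
def filter_by_neighbors(objects, num_detections, min_count):
    """Keep only detections supported by at least min_count neighbor rectangles."""
    kept = []
    for box, count in zip(objects, num_detections):
        if int(count) >= min_count:
            kept.append((tuple(int(v) for v in box), int(count)))
    return kept


# Stand-in values shaped like (objects, numDetections); not real detector output.
objects = [(10, 20, 50, 50), (200, 40, 48, 48), (90, 300, 60, 60)]
num_detections = [12, 3, 7]

confident = filter_by_neighbors(objects, num_detections, min_count=6)
print(confident)
```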

**Detecting with Weights**

```python { .api }
objects, rejectLevels, levelWeights = classifier.detectMultiScale3(
    image,
    scaleFactor=1.1,
    minNeighbors=3,
    flags=0,
    minSize=(0, 0),
    maxSize=(0, 0),
    outputRejectLevels=True
)
```

Extended detection that returns reject levels and level weights for each detection, providing more detailed information about detection confidence.
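
As an illustration of how the level weights might be used, the sketch below ranks detections by weight and keeps the strongest ones. Treating a larger weight as a more confident detection is an assumption here, and `top_detections` is a hypothetical helper, not part of OpenCV.

```python
def top_detections(objects, level_weights, k=1):
    """Return the k detections with the highest level weight, best first."""
    scored = [
        (float(weight), tuple(int(v) for v in box))
        for box, weight in zip(objects, level_weights)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Stand-in values shaped like (objects, levelWeights); not real detector output.
objects = [(10, 20, 50, 50), (200, 40, 48, 48), (90, 300, 60, 60)]
level_weights = [2.5, 4.1, 0.7]

best = top_detections(objects, level_weights, k=2)
print(best)
```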

**Example: Face Detection**

```python { .api }
import cv2

# Load the cascade classifier
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)

# Load image
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30)
)

# Draw rectangles around faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
```

### HOG Descriptor

Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection, particularly effective for pedestrian detection.

**HOGDescriptor Class**

```python
hog = cv2.HOGDescriptor()
```

Creates a HOG descriptor and detector object with default parameters.

**Computing HOG Descriptors**

```python { .api }
descriptors = hog.compute(
    img,
    winStride=(8, 8),
    padding=(0, 0),
    locations=None
)
```

Computes HOG descriptors for the image.

- `img`: Input image
- `winStride`: Step size for sliding window (in pixels)
- `padding`: Padding around the image
- `locations`: Optional list of detection locations

Returns a numpy array of HOG descriptors.
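
The length of the descriptor for one detection window follows from the window, block, stride, and cell sizes. The sketch below applies the standard block-counting formula; the default arguments mirror the documented defaults of `cv2.HOGDescriptor()` (64x128 window, 16x16 blocks, 8x8 stride, 8x8 cells, 9 bins), and `hog_descriptor_length` is a hypothetical helper name.

```python
def hog_descriptor_length(win_size=(64, 128), block_size=(16, 16),
                          block_stride=(8, 8), cell_size=(8, 8), nbins=9):
    """Number of values in the HOG descriptor of a single detection window."""
    # Blocks slide over the window with the given stride.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block holds a grid of cells, each contributing one histogram.
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins


length = hog_descriptor_length()
print(length)  # 7 * 15 blocks * 4 cells * 9 bins = 3780
```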

**Setting the SVM Detector**

```python { .api }
hog.setSVMDetector(detector)
```

Sets the SVM (Support Vector Machine) detector coefficients for object detection. OpenCV provides pre-trained detectors.

**Getting Default People Detector**

```python { .api }
detector = cv2.HOGDescriptor_getDefaultPeopleDetector()
```

Returns the default people/pedestrian detector coefficients trained on the INRIA person dataset.

**Detecting Objects with HOG**

```python { .api }
found, weights = hog.detectMultiScale(
    img,
    hitThreshold=0,
    winStride=(8, 8),
    padding=(0, 0),
    scale=1.05,
    finalThreshold=2.0,
    useMeanshiftGrouping=False
)
```

Detects objects (e.g., people) in the image using the HOG descriptor and SVM classifier.

- `img`: Input image
- `hitThreshold`: Threshold for the detection decision (lower values increase detections but also false positives)
- `winStride`: Step size for sliding window
- `padding`: Padding around the image
- `scale`: Scale factor for image pyramid
- `finalThreshold`: Threshold for the final detection grouping (OpenCV 4.x names this parameter `groupThreshold`, so check the signature of your installed version before passing it by keyword)
- `useMeanshiftGrouping`: Use mean-shift grouping instead of the default rectangle grouping

Returns detected object rectangles and their weights.
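
When the detector's own grouping still leaves overlapping boxes, a common post-processing step is greedy non-maximum suppression (NMS) over the returned rectangles and weights. The sketch below is a separate, plain-Python technique applied to the output, not a reimplementation of OpenCV's grouping; the function names and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, weights, iou_threshold=0.5):
    """Greedy NMS: keep the highest-weight box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: float(weights[i]), reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return [tuple(int(v) for v in boxes[i]) for i in kept]


# Stand-in values shaped like (found, weights); not real detector output.
found = [(100, 100, 64, 128), (104, 98, 64, 128), (400, 120, 64, 128)]
weights = [0.9, 0.6, 0.8]
print(non_max_suppression(found, weights))
```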

**Example: Pedestrian Detection**

```python { .api }
import cv2

# Initialize HOG descriptor with default people detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Load image
img = cv2.imread('street.jpg')

# Detect people
found, weights = hog.detectMultiScale(
    img,
    winStride=(8, 8),
    padding=(4, 4),
    scale=1.05
)

# Draw rectangles around detected people
for (x, y, w, h) in found:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
```

### QR Code Detection

OpenCV provides dedicated tools for detecting and decoding QR codes in images.

**QRCodeDetector Class**

```python
detector = cv2.QRCodeDetector()
```

Creates a QR code detector object.

**Detecting QR Codes**

```python { .api }
retval, points = detector.detect(img)
```

Detects a QR code in the image. Returns `True` if found, and the corner points of the QR code.

**Decoding QR Codes**

```python { .api }
data, straight_qrcode = detector.decode(img, points)
```

Decodes a QR code given its corner points (as returned by `detect()`). Returns the decoded string and the rectified QR code image.

**Detecting and Decoding in One Call**

```python { .api }
data, points, straight_qrcode = detector.detectAndDecode(img)
```

Detects and decodes a QR code in a single operation. Returns:
- `data`: Decoded string from QR code (empty if no QR code found)
- `points`: Corner points of the QR code
- `straight_qrcode`: Rectified QR code image
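
The corner points describe a quadrilateral, which is often converted to an axis-aligned box for cropping or drawing. The sketch below shows one way to do that; `quad_to_bbox` is a hypothetical helper, and the sample corners stand in for `points.reshape(-1, 2)` from a real detection.

```python
def quad_to_bbox(points):
    """Axis-aligned (x, y, w, h) box enclosing four (x, y) corner points."""
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    x_min, y_min = min(xs), min(ys)
    return (int(x_min), int(y_min), int(max(xs) - x_min), int(max(ys) - y_min))


# Stand-in corner points; with OpenCV these would come from points.reshape(-1, 2).
corners = [(120.5, 80.0), (300.0, 85.5), (295.0, 260.0), (118.0, 255.5)]
bbox = quad_to_bbox(corners)
print(bbox)
```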

**Detecting Multiple QR Codes**

```python { .api }
retval, points = detector.detectMulti(img)
```

Detects multiple QR codes in the image. Returns `True` if any QR codes are found, and a list of corner points for each detected QR code.

**Decoding Multiple QR Codes**

```python { .api }
retval, decoded_info, straight_qrcodes = detector.decodeMulti(img, points)
```

Decodes multiple QR codes given their corner points (as returned by `detectMulti()`). Returns:
- `retval`: `True` if successful
- `decoded_info`: List of decoded strings
- `straight_qrcodes`: List of rectified QR code images

To detect and decode in one call and also receive the corner points, use `detector.detectAndDecodeMulti(img)`, which returns `retval, decoded_info, points, straight_qrcodes`.

**Example: QR Code Detection and Decoding**

```python { .api }
import cv2

# Create QR code detector
detector = cv2.QRCodeDetector()

# Load image
img = cv2.imread('qrcode.jpg')

# Detect and decode
data, points, straight_qrcode = detector.detectAndDecode(img)

if data:
    print(f"QR Code detected: {data}")

    # Draw boundary around QR code
    if points is not None:
        points = points.reshape(-1, 2).astype(int)
        for i in range(4):
            # Plain Python ints avoid type errors with NumPy integers in some OpenCV versions
            pt1 = tuple(int(v) for v in points[i])
            pt2 = tuple(int(v) for v in points[(i + 1) % 4])
            cv2.line(img, pt1, pt2, (0, 255, 0), 3)
else:
    print("No QR Code detected")
```

### QRCodeDetectorAruco

```python
detector = cv2.QRCodeDetectorAruco()
```

A QR code detector built on OpenCV's ArUco marker detection code. Provides the same interface as `QRCodeDetector` and is intended to be more robust in challenging conditions.

### Face Detection (DNN-based)

Modern face detection using deep neural networks provides more accurate results than traditional cascade classifiers.

**FaceDetectorYN Class**

```python { .api }
detector = cv2.FaceDetectorYN.create(
    model,
    config,
    input_size,
    score_threshold=0.9,
    nms_threshold=0.3,
    top_k=5000,
    backend_id=0,
    target_id=0
)
```

Creates a YuNet face detector. YuNet is a lightweight and accurate face detection model.

- `model`: Path to the ONNX model file
- `config`: Path to the config file (can be empty string)
- `input_size`: Input size for the neural network as (width, height)
- `score_threshold`: Confidence threshold for face detection
- `nms_threshold`: Non-maximum suppression threshold
- `top_k`: Keep top K detections before NMS
- `backend_id`: Backend identifier (e.g., default, OpenCV, CUDA)
- `target_id`: Target device identifier (e.g., CPU, GPU)

**Detecting Faces**

```python { .api }
retval, faces = detector.detect(img)
```

Detects faces in the input image. The image size must match the detector's `input_size`; call `detector.setInputSize((width, height))` first if it differs. Returns a tuple containing:
- `retval`: Integer status flag. To test whether any face was found, check that `faces` is not `None`
- `faces`: Face detections as a numpy array where each row contains: [x, y, w, h, x_re, y_re, x_le, y_le, x_nt, y_nt, x_rcm, y_rcm, x_lcm, y_lcm, confidence]
  - First 4 values: Bounding box (x, y, width, height)
  - Next 10 values: Facial landmarks (right eye, left eye, nose tip, right corner of mouth, left corner of mouth)
  - Last value: Detection confidence score
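
Because each detection is a flat row of 15 numbers, it can help to unpack it into named fields before use. The sketch below follows the row layout described above; `parse_face_row` and the landmark key names are hypothetical, and the sample row is made-up data rather than real detector output.

```python
LANDMARK_NAMES = (
    "right_eye", "left_eye", "nose_tip",
    "right_mouth_corner", "left_mouth_corner",
)


def parse_face_row(row):
    """Split one 15-value detection row into box, landmarks, and score."""
    values = [float(v) for v in row]
    if len(values) != 15:
        raise ValueError("expected 15 values per detection row")
    x, y, w, h = values[:4]
    # Landmarks are stored as consecutive (x, y) pairs after the box.
    landmarks = {
        name: (values[4 + 2 * i], values[5 + 2 * i])
        for i, name in enumerate(LANDMARK_NAMES)
    }
    return {"box": (x, y, w, h), "landmarks": landmarks, "score": values[14]}


# Made-up row in the documented layout.
row = [50, 60, 100, 120, 80, 100, 120, 100, 100, 130, 85, 150, 115, 150, 0.97]
face = parse_face_row(row)
print(face["box"], face["landmarks"]["nose_tip"], face["score"])
```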

### Face Recognition

**FaceRecognizerSF Class**

```python { .api }
recognizer = cv2.FaceRecognizerSF.create(
    model,
    config,
    backend_id=0,
    target_id=0
)
```

Creates a face recognition model based on SFace. Used to extract face features and compare faces for recognition tasks.

**Extracting Face Features**

```python { .api }
feature = recognizer.feature(aligned_face)
```

Extracts a feature vector from an aligned face image. An aligned crop can be obtained with `recognizer.alignCrop(img, face)`, where `face` is one detection row from `FaceDetectorYN`. The feature vector can be used for face comparison and recognition.

**Comparing Faces**

```python { .api }
score = recognizer.match(
    face_feature1,
    face_feature2,
    dis_type=cv2.FaceRecognizerSF_FR_COSINE
)
```

Compares two face features and returns a score. How to read the score depends on the distance type: with cosine similarity, higher scores indicate more similar faces; with the L2 norm, lower scores indicate more similar faces.

Distance types:
- `cv2.FaceRecognizerSF_FR_COSINE`: Cosine similarity (higher is more similar)
- `cv2.FaceRecognizerSF_FR_NORM_L2`: L2 norm distance (lower is more similar)
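
The two measures run in opposite directions: a cosine score grows as faces become more similar, while an L2 distance shrinks. The sketch below computes both on plain Python lists, assuming the features are L2-normalized first. The function names and the 0.363 cosine threshold are illustrative assumptions; tune any threshold on your own data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def cosine_score(a, b):
    """Cosine similarity: 1.0 for identical directions, lower for dissimilar."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 distance between normalized vectors: 0.0 for identical directions."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_identity(a, b, cosine_threshold=0.363):
    """Decide a match from the cosine score (threshold is an assumption)."""
    return cosine_score(a, b) >= cosine_threshold


# Toy 3-dimensional "features"; real SFace features are much longer.
print(cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(l2_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```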

## Haar Cascade Data Files

OpenCV includes pre-trained Haar cascade classifiers for various object detection tasks. These XML files are distributed with the opencv-python package and can be accessed via the `cv2.data.haarcascades` path.

**Accessing Haar Cascade Files**

```python { .api }
import cv2

# Get the path to the haarcascades directory
cascade_path = cv2.data.haarcascades

# Load a specific cascade
face_cascade = cv2.CascadeClassifier(cascade_path + 'haarcascade_frontalface_default.xml')
```

**Available Cascade Files**

OpenCV provides the following pre-trained Haar cascade classifiers:

**Face Detection:**
- `haarcascade_frontalface_default.xml` - Default frontal face detector (most commonly used)
- `haarcascade_frontalface_alt.xml` - Alternative frontal face detector
- `haarcascade_frontalface_alt2.xml` - Another alternative frontal face detector
- `haarcascade_frontalface_alt_tree.xml` - Tree-based frontal face detector
- `haarcascade_profileface.xml` - Profile (side view) face detector

**Eye Detection:**
- `haarcascade_eye.xml` - Generic eye detector
- `haarcascade_eye_tree_eyeglasses.xml` - Eye detector that works with eyeglasses
- `haarcascade_lefteye_2splits.xml` - Left eye detector
- `haarcascade_righteye_2splits.xml` - Right eye detector

**Facial Features:**
- `haarcascade_smile.xml` - Smile detector

**Body Detection:**
- `haarcascade_fullbody.xml` - Full body detector
- `haarcascade_upperbody.xml` - Upper body detector
- `haarcascade_lowerbody.xml` - Lower body detector

**Animal Detection:**
- `haarcascade_frontalcatface.xml` - Cat face detector
- `haarcascade_frontalcatface_extended.xml` - Extended cat face detector

**Other Objects:**
- `haarcascade_licence_plate_rus_16stages.xml` - Russian license plate detector

**Example: Loading Multiple Cascades**

```python
import cv2

cascade_path = cv2.data.haarcascades

# Load face, eye, and smile cascades
face_cascade = cv2.CascadeClassifier(
    cascade_path + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
    cascade_path + 'haarcascade_eye.xml'
)
smile_cascade = cv2.CascadeClassifier(
    cascade_path + 'haarcascade_smile.xml'
)

# Load image and convert to grayscale
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

# For each face, detect eyes and smile
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]

    # Detect eyes in face region
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

    # Detect smile in face region
    smiles = smile_cascade.detectMultiScale(roi_gray, 1.8, 20)
    for (sx, sy, sw, sh) in smiles:
        cv2.rectangle(roi_color, (sx, sy), (sx+sw, sy+sh), (0, 0, 255), 2)
```

**Performance Tips for Cascade Classifiers:**

1. **Convert to Grayscale**: Cascade classifiers work faster on grayscale images
2. **Adjust scaleFactor**: Smaller values (e.g., 1.05) are more thorough but slower; larger values (e.g., 1.3) are faster but may miss objects
3. **Tune minNeighbors**: Higher values reduce false positives but may miss some objects
4. **Set Size Limits**: Use `minSize` and `maxSize` to restrict detection to expected object sizes
5. **Process at Lower Resolution**: Resize large images before detection for better performance
6. **Region of Interest**: If possible, detect only in specific regions of the image
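
When detection runs on a downscaled copy of the image, the returned rectangles are in the smaller image's coordinates and need to be mapped back before drawing on the original. A minimal sketch of that step follows; `scale_boxes` is a hypothetical helper and the sample boxes are made-up values.

```python
def scale_boxes(boxes, factor):
    """Map (x, y, w, h) boxes found on a resized image back to the original.

    factor is original_size / resized_size, e.g. 2.0 if the image was halved.
    """
    return [tuple(int(round(v * factor)) for v in box) for box in boxes]


# Boxes as they might come back from detection on a half-size image.
small_boxes = [(40, 30, 25, 25), (110, 52, 31, 31)]
full_boxes = scale_boxes(small_boxes, 2.0)
print(full_boxes)
```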

**Cascade Classifier Limitations:**

- Haar cascades are sensitive to object orientation and scale
- Performance decreases with variations in lighting, pose, and occlusion
- For more robust detection, consider using DNN-based detectors (see the `cv2.dnn` module)
- Profile face detection is generally less accurate than frontal face detection
- Eye detection works best on frontal faces with open eyes