# Screen Capture and Image Recognition

Screenshot capture and image recognition for locating images and UI elements on screen, with exact pixel matching by default and optional confidence-based matching. Includes pixel color checks with tolerance controls, for automated testing and GUI interaction.

## Capabilities

### Screen Information

Get the screen dimensions and the current mouse position.

```python { .api }
def size():
    """
    Get screen size as (width, height) tuple.

    Returns:
    Tuple[int, int]: Screen dimensions in pixels (width, height)
    """

def resolution():
    """Alias for size() - get screen resolution."""

def position():
    """
    Get current mouse position.

    Returns:
    Tuple[int, int]: Current mouse coordinates (x, y)
    """

def onScreen(x, y=None):
    """
    Check if coordinates are within screen bounds.

    Parameters:
    - x (int or tuple): X coordinate, or (x, y) tuple
    - y (int, optional): Y coordinate if x is not a tuple

    Returns:
    bool: True if coordinates are on screen, False otherwise
    """
```
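
The bounds check behind `onScreen()` can be pictured as a half-open range test on each axis. The sketch below is a standalone illustration rather than PyAutoGUI's source: `is_on_screen` and its explicit `width`/`height` arguments are hypothetical names, and in practice the dimensions would come from `size()`.

```python
def is_on_screen(x, y, width, height):
    """Return True if (x, y) lies inside a width x height screen.

    Coordinates start at (0, 0) in the top-left corner, so the last valid
    pixel is (width - 1, height - 1).
    """
    return 0 <= x < width and 0 <= y < height


# Hypothetical 1920x1080 screen
print(is_on_screen(0, 0, 1920, 1080))        # top-left corner
print(is_on_screen(1919, 1079, 1920, 1080))  # bottom-right pixel
print(is_on_screen(1920, 1080, 1920, 1080))  # one past the edge
```

Note that a coordinate equal to the width or height is already off screen, which is an easy off-by-one to miss when clamping mouse targets.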

### Screenshot Capture

Capture screenshots of the entire screen or specific regions with optional file saving.

```python { .api }
def screenshot(imageFilename=None, region=None):
    """
    Capture screenshot of screen or region.

    Parameters:
    - imageFilename (str, optional): Path to save screenshot. If None, the image is not written to disk
    - region (tuple, optional): (left, top, width, height) region to capture. If None, captures full screen

    Returns:
    PIL.Image: Screenshot image object (also returned when imageFilename is provided)

    Examples:
    screenshot('fullscreen.png')  # Save full screen
    screenshot('region.png', (100, 100, 300, 200))  # Save specific region
    img = screenshot()  # Return PIL Image object without saving
    """
```
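
The `region` tuple uses (left, top, width, height), whereas Pillow methods such as `Image.crop()` expect a (left, upper, right, lower) box. Mixing the two conventions is an easy mistake when post-processing screenshots. The helpers below are a minimal sketch with hypothetical names (`region_to_bbox`, `bbox_to_region`); they are not part of PyAutoGUI.

```python
def region_to_bbox(region):
    """Convert (left, top, width, height) to (left, upper, right, lower)."""
    left, top, width, height = region
    return (left, top, left + width, top + height)


def bbox_to_region(bbox):
    """Convert (left, upper, right, lower) to (left, top, width, height)."""
    left, upper, right, lower = bbox
    return (left, upper, right - left, lower - upper)


region = (100, 100, 300, 200)
bbox = region_to_bbox(region)
print(bbox)                  # box suitable for a Pillow-style crop
print(bbox_to_region(bbox))  # round-trips back to the original region
```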

### Image Location - Single Match

Find single instances of images on screen with configurable matching parameters.

```python { .api }
def locateOnScreen(image, **kwargs):
    """
    Find image on screen and return its location.

    Parameters:
    - image (str or PIL.Image): Path to template image or PIL Image object
    - region (tuple, optional): (left, top, width, height) search region
    - confidence (float, optional): Match confidence 0.0-1.0 (requires OpenCV)
    - grayscale (bool, optional): Convert to grayscale for faster matching (default: False)

    Returns:
    Box: Named tuple with (left, top, width, height) or None if not found

    Raises:
    ImageNotFoundException: If image not found and useImageNotFoundException() is True
    """

def locateCenterOnScreen(image, **kwargs):
    """
    Find image on screen and return center coordinates.

    Parameters:
    - image (str or PIL.Image): Path to template image or PIL Image object
    - Same parameters as locateOnScreen()

    Returns:
    Point: Named tuple with (x, y) center coordinates or None if not found
    """

def locate(needleImage, haystackImage, **kwargs):
    """
    Find needleImage within haystackImage.

    Parameters:
    - needleImage (str or PIL.Image): Template image to find
    - haystackImage (str or PIL.Image): Image to search within
    - region (tuple, optional): Search region within haystack
    - confidence (float, optional): Match confidence 0.0-1.0
    - grayscale (bool, optional): Use grayscale matching

    Returns:
    Box: Location of needle in haystack or None if not found
    """
```
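
To make the needle/haystack idea concrete, here is a deliberately naive exact-match search over plain 2D lists of pixel values. It is a conceptual sketch only, not the algorithm the library uses; `locate_exact` is a hypothetical name, and real images would be PIL objects rather than nested lists.

```python
from collections import namedtuple

Box = namedtuple("Box", ["left", "top", "width", "height"])


def locate_exact(needle, haystack):
    """Return the Box of the first exact occurrence of needle, else None.

    Both arguments are 2D lists of pixel values with rows of equal length.
    The scan goes top to bottom, left to right.
    """
    n_h, n_w = len(needle), len(needle[0])
    h_h, h_w = len(haystack), len(haystack[0])
    for top in range(h_h - n_h + 1):
        for left in range(h_w - n_w + 1):
            # Compare each needle row with the matching haystack slice
            if all(
                haystack[top + dy][left:left + n_w] == needle[dy]
                for dy in range(n_h)
            ):
                return Box(left, top, n_w, n_h)
    return None


haystack = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
]
needle = [[1, 2], [3, 4]]
print(locate_exact(needle, haystack))
```

A single differing pixel makes this search fail, which is why exact matching is sensitive to scaling, anti-aliasing, and theme changes.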

### Image Location - Multiple Matches

Find all instances of images on screen or within other images.

```python { .api }
def locateAllOnScreen(image, **kwargs):
    """
    Find all instances of image on screen.

    Parameters:
    - image (str or PIL.Image): Path to template image or PIL Image object
    - region (tuple, optional): (left, top, width, height) search region
    - confidence (float, optional): Match confidence 0.0-1.0
    - grayscale (bool, optional): Use grayscale matching

    Returns:
    Generator[Box]: Generator yielding Box objects for each match

    Example:
    for match in pyautogui.locateAllOnScreen('button.png'):
        print(f"Found button at {match}")
    """

def locateAll(needleImage, haystackImage, **kwargs):
    """
    Find all instances of needleImage within haystackImage.

    Parameters:
    - needleImage (str or PIL.Image): Template image to find
    - haystackImage (str or PIL.Image): Image to search within
    - Same optional parameters as locateAllOnScreen()

    Returns:
    Generator[Box]: Generator yielding Box objects for each match
    """
```
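
Because these functions return generators, results are produced lazily and can be iterated only once; wrap the call in `list()` if the matches are needed more than once. With confidence-based matching, one on-screen target may also be reported at several neighbouring offsets. The sketch below filters such near-duplicates; `dedupe_matches` and `min_distance` are hypothetical names, and the threshold is an arbitrary illustration.

```python
from collections import namedtuple

Box = namedtuple("Box", ["left", "top", "width", "height"])


def dedupe_matches(matches, min_distance=10):
    """Drop matches that sit close to an already kept match.

    A match counts as a duplicate when both its left and top offsets are
    within min_distance pixels of a match kept earlier.
    """
    kept = []
    for box in matches:
        is_duplicate = any(
            abs(box.left - k.left) < min_distance
            and abs(box.top - k.top) < min_distance
            for k in kept
        )
        if not is_duplicate:
            kept.append(box)
    return kept


# A stand-in for the generator returned by locateAllOnScreen()
raw = iter([Box(100, 50, 20, 20), Box(101, 50, 20, 20), Box(300, 50, 20, 20)])
unique = dedupe_matches(raw)
print(unique)
```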

### Window-Specific Image Location

Find images within specific application windows (Windows platform only).

```python { .api }
def locateOnWindow(image, title, **kwargs):
    """
    Find image within the window matching a title (Windows only).

    Parameters:
    - image (str or PIL.Image): Template image to find
    - title (str): Title text identifying the window to search within
    - Other keyword arguments (e.g. confidence, grayscale) are passed to locateOnScreen()

    Returns:
    Box: Location of the match in screen coordinates, or None if not found

    Note: Requires PyGetWindow. Windows platform only.
    """
```
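
When working with a single window it is often useful to convert a match between screen coordinates and coordinates relative to the window's top-left corner. The helpers below are a hedged sketch: `to_window_coords` and `to_screen_coords` are hypothetical names, and `win_left`/`win_top` are assumed to be the window's position on screen (for example the `left` and `top` attributes of a PyGetWindow window object).

```python
from collections import namedtuple

Box = namedtuple("Box", ["left", "top", "width", "height"])


def to_window_coords(box, win_left, win_top):
    """Translate a screen-space Box into window-relative coordinates."""
    return Box(box.left - win_left, box.top - win_top, box.width, box.height)


def to_screen_coords(box, win_left, win_top):
    """Translate a window-relative Box back into screen coordinates."""
    return Box(box.left + win_left, box.top + win_top, box.width, box.height)


# Hypothetical window whose top-left corner is at (200, 150) on screen
match_on_screen = Box(260, 190, 40, 20)
relative = to_window_coords(match_on_screen, 200, 150)
print(relative)
print(to_screen_coords(relative, 200, 150))
```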

### Pixel Analysis

Analyze individual pixels and colors on screen with tolerance matching.

```python { .api }
def pixel(x, y):
    """
    Get RGB color of pixel at screen coordinates.

    Parameters:
    - x, y (int): Screen coordinates

    Returns:
    Tuple[int, int, int]: RGB color values (red, green, blue) 0-255
    """

def pixelMatchesColor(x, y, expectedRGBColor, tolerance=0):
    """
    Check if pixel color matches expected color within tolerance.

    Parameters:
    - x, y (int): Screen coordinates
    - expectedRGBColor (tuple): Expected RGB color (red, green, blue)
    - tolerance (int): Color tolerance 0-255 (default: 0 for exact match)

    Returns:
    bool: True if pixel matches color within tolerance

    Example:
    # Check if pixel is red (within tolerance of 10)
    is_red = pyautogui.pixelMatchesColor(100, 200, (255, 0, 0), tolerance=10)
    """
```
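
The tolerance check can be read as a per-channel comparison: each of the red, green, and blue values may differ from the expected value by at most `tolerance`. The sketch below assumes that interpretation (treat it as an assumption, not a guarantee about the library's internals); `matches_color` is a hypothetical standalone helper that works on plain tuples.

```python
def matches_color(actual, expected, tolerance=0):
    """Return True if every channel differs by at most tolerance."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


expected_red = (255, 0, 0)
print(matches_color((250, 5, 3), expected_red, tolerance=10))   # small drift on all channels
print(matches_color((240, 0, 0), expected_red, tolerance=10))   # red channel off by 15
print(matches_color((255, 0, 0), expected_red))                 # exact match, tolerance 0
```

Under this reading a tolerance of 10 accepts a pixel whose channels each drift by up to 10, which is usually enough to absorb anti-aliasing or slight color-profile differences.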

### Utility Functions

Helper functions for working with image locations and regions.

```python { .api }
def center(region):
    """
    Get center point of a region.

    Parameters:
    - region (Box or tuple): Region with (left, top, width, height)

    Returns:
    Point: Center coordinates (x, y)
    """
```
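
The center of a region is its top-left corner plus half its width and height. The sketch below works on plain tuples and assumes integer (floor) division for odd sizes; `center_of` is a hypothetical name used to avoid implying this is the library's own code.

```python
from collections import namedtuple

Box = namedtuple("Box", ["left", "top", "width", "height"])
Point = namedtuple("Point", ["x", "y"])


def center_of(region):
    """Return the center Point of a (left, top, width, height) region."""
    left, top, width, height = region
    return Point(left + width // 2, top + height // 2)


print(center_of(Box(100, 200, 50, 30)))
print(center_of((0, 0, 5, 5)))  # odd sizes round down
```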

### Image Recognition Configuration

Configure behavior of image recognition functions.

```python { .api }
def useImageNotFoundException(value=None):
    """
    Configure whether image location functions raise exceptions.

    Parameters:
    - value (bool, optional): True to raise exceptions, False to return None.
    If omitted, it is treated as True.

    Returns:
    None

    When True: locateOnScreen() raises ImageNotFoundException if image not found
    When False: locateOnScreen() returns None if image not found
    """
```
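
Code that must work under either setting can normalize the two behaviors with a small wrapper that turns the exception into `None`. This is a sketch under stated assumptions: `locate_or_none` is a hypothetical helper, and `ImageNotFound` is a stand-in exception so the example runs without PyAutoGUI installed. With the real library one would pass `not_found=(pyautogui.ImageNotFoundException,)`.

```python
class ImageNotFound(Exception):
    """Stand-in for pyautogui.ImageNotFoundException in this sketch."""


def locate_or_none(locator, *args, not_found=(ImageNotFound,), **kwargs):
    """Call locator and return its result, or None if it signals a miss."""
    try:
        return locator(*args, **kwargs)
    except not_found:
        return None


# Fake locators standing in for pyautogui.locateOnScreen
def fake_hit(image):
    return (10, 20, 30, 40)


def fake_miss(image):
    raise ImageNotFound(image)


print(locate_or_none(fake_hit, "button.png"))
print(locate_or_none(fake_miss, "button.png"))
```

Callers can then use a single `if result is None:` check regardless of how the exception setting is configured.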

## Image Formats and Requirements

### Supported Image Formats
- PNG (recommended for UI elements)
- JPEG/JPG (for photographs)
- BMP (Windows bitmap)
- GIF (static images only)
- TIFF (high quality images)

### Template Matching Tips
- Use PNG for UI elements: it is lossless, so edges stay crisp
- Without `confidence`, the template must match the screen pixel for pixel
- Use the `confidence` parameter (requires OpenCV) to tolerate slight variations
- Grayscale matching is faster but can produce false-positive matches
- Capture template images directly from the target application, at the same scale and theme

## Usage Examples

```python
import time

import pyautogui

# Get screen information
width, height = pyautogui.size()
print(f"Screen size: {width}x{height}")

current_pos = pyautogui.position()
print(f"Mouse position: {current_pos}")

# Take screenshots
image = pyautogui.screenshot()  # Full screen PIL Image
pyautogui.screenshot('desktop.png')  # Save full screen
pyautogui.screenshot('region.png', region=(0, 0, 300, 400))  # Save region

# Find images on screen. A miss either returns None or raises
# ImageNotFoundException, depending on the exception setting.
try:
    button_location = pyautogui.locateOnScreen('submit_button.png')
except pyautogui.ImageNotFoundException:
    button_location = None

if button_location:
    # Click the center of the found button
    center_point = pyautogui.center(button_location)
    pyautogui.click(center_point)
else:
    print("Button not found")

# Find image with confidence (requires OpenCV)
try:
    location = pyautogui.locateOnScreen('logo.png', confidence=0.8)
    if location is not None:
        pyautogui.click(pyautogui.center(location))
    else:
        print("Logo not found with 80% confidence")
except pyautogui.ImageNotFoundException:
    print("Logo not found with 80% confidence")

# Find all instances of an image
for button in pyautogui.locateAllOnScreen('close_button.png'):
    print(f"Close button found at: {button}")
    # Click each close button found
    pyautogui.click(pyautogui.center(button))

# Pixel color analysis
pixel_color = pyautogui.pixel(100, 200)
print(f"Pixel color at (100, 200): RGB{pixel_color}")

# Check if pixel matches expected color
is_white = pyautogui.pixelMatchesColor(100, 200, (255, 255, 255), tolerance=5)
if is_white:
    print("Pixel is approximately white")

# Configure exception behavior
pyautogui.useImageNotFoundException(True)  # Raise exceptions
try:
    location = pyautogui.locateOnScreen('nonexistent.png')
except pyautogui.ImageNotFoundException:
    print("Image not found - exception raised")

# Complex image recognition workflow
def find_and_click_button(button_image, timeout=10):
    """Poll for a button until timeout (seconds) and click it when found."""
    start_time = time.time()

    while time.time() - start_time < timeout:
        try:
            button_pos = pyautogui.locateOnScreen(button_image, confidence=0.7)
            if button_pos:
                pyautogui.click(pyautogui.center(button_pos))
                return True
        except pyautogui.ImageNotFoundException:
            pass  # Not visible yet; retry after a short pause
        time.sleep(0.5)

    return False  # Button not found within timeout

# Use the function
if find_and_click_button('login_button.png'):
    print("Login button clicked successfully")
else:
    print("Login button not found within timeout")
```

## Data Types

```python { .api }
from collections import namedtuple
from typing import Tuple, Generator, Union, Optional
import PIL.Image

# Region and position types
Box = namedtuple('Box', ['left', 'top', 'width', 'height'])
Point = namedtuple('Point', ['x', 'y'])

# Color type
Color = Tuple[int, int, int]  # RGB values 0-255

# Region specification (for screenshot and search areas)
Region = Tuple[int, int, int, int]  # (left, top, width, height)

# Image input types
ImageInput = Union[str, PIL.Image.Image]  # File path or PIL Image object
```

## Performance Notes

- **Grayscale matching**: Faster than color matching, but can produce false positives
- **Confidence matching**: Requires OpenCV-Python (`pip install opencv-python`)
- **Region limiting**: Pass `region` to search only part of the screen and reduce matching time
- **Template size**: Smaller templates match faster
- **Screen resolution**: Higher resolutions increase matching time
- **Multiple matches**: `locateAllOnScreen()` is slower than single match functions
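
Region limiting is often the simplest of these optimizations to apply: if an element's approximate position is known, for instance from a previous match, only a small area around it needs to be searched. The helper below is a minimal sketch; `region_around` and its parameters are hypothetical names, and the result is a (left, top, width, height) tuple suitable for the `region` argument.

```python
def region_around(x, y, margin, screen_width, screen_height):
    """Return a region of +/- margin pixels around (x, y), clipped to the screen."""
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(screen_width, x + margin)
    bottom = min(screen_height, y + margin)
    return (left, top, right - left, bottom - top)


# Hypothetical 1920x1080 screen
print(region_around(500, 400, 100, 1920, 1080))  # fully inside the screen
print(region_around(50, 20, 100, 1920, 1080))    # clipped at the top-left edge
```

The margin needs to be at least as large as the template plus however far the element can move between searches, otherwise the match falls outside the region.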