# Pattern Matching and Scanning

Pattern matching applies compiled YARA rules to data sources such as files, memory buffers, and running processes. The matching engine supports callbacks, timeouts, external variables, and detailed result reporting.

## Capabilities

### Basic Data Matching

Scan data buffers, strings, and binary content with compiled rules.

```python { .api }
class Rules:
    def match(self, filepath=None, pid=None, data=None, externals=None, callback=None,
              fast=False, timeout=60, modules_data=None, modules_callback=None,
              which_callbacks=None):
        """
        Scan targets with compiled YARA rules.

        Parameters:
        - filepath (str, optional): Path to file to scan
        - pid (int, optional): Process ID to scan memory
        - data (bytes/str, optional): Data buffer to scan
        - externals (dict, optional): External variables for this scan
        - callback (callable, optional): Callback function for results
        - fast (bool): Enable fast matching mode (default: False)
        - timeout (int): Scan timeout in seconds (default: 60)
        - modules_data (dict, optional): Data for YARA modules
        - modules_callback (callable, optional): Module data callback
        - which_callbacks (int, optional): Callback type flags

        Returns:
            list: List of Match objects for matching rules

        Raises:
            yara.TimeoutError: If scan exceeds timeout limit
        """

    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.

        Returns:
            dict: Profiling data with performance metrics, or empty dict if profiling not enabled

        Note:
            Only available if the underlying YARA library was compiled with profiling support.
        """
```

**Basic data scanning:**

```python
import yara

rules = yara.compile(source='''
rule SuspiciousPattern {
    strings:
        $text = "malicious"
        $hex = { 4D 5A }
    condition:
        $text or $hex
}
''')

# Scan string data
matches = rules.match(data="This contains malicious content")

# Scan binary data
binary_data = b"\x4D\x5A\x90\x00"  # MZ header + data
matches = rules.match(data=binary_data)
```

### File Scanning

Scan files on disk by path, with automatic file handling and memory management.

**File path scanning:**

```python
# Scan a single file
matches = rules.match(filepath="/path/to/suspicious_file.exe")

# Process results
for match in matches:
    print(f"File matched rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
```
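
Scanning a whole directory amounts to repeating the single-file call for every path. The sketch below shows one way to do that with `os.walk`. `scan_tree` is a hypothetical helper name, and `rules` is assumed to be any object exposing `match(filepath=...)` that returns a list, like the compiled rules used above. Errors are deliberately not handled here; a caller would likely wrap the loop in `try`/`except yara.Error`.

```python
import os


def scan_tree(rules, root):
    """Yield (path, matches) for every file under root that matches a rule.

    `rules` is assumed to expose match(filepath=...) and return a list,
    as the compiled rules object in this section does.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            matches = rules.match(filepath=path)
            if matches:
                yield path, matches


# Possible usage with the compiled rules from this section:
# for path, matches in scan_tree(rules, "/path/to/directory"):
#     print(path, [m.rule for m in matches])
```

Writing the helper as a generator keeps memory use flat on large trees, because results are reported as each file finishes rather than collected first.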

### Process Memory Scanning

Scan the memory space of running processes by process ID (platform-dependent feature).

**Process scanning:**

```python
# Scan process memory (requires appropriate permissions)
try:
    matches = rules.match(pid=1234)  # Process ID
    for match in matches:
        print(f"Process memory matched: {match.rule}")
except yara.Error as exc:
    # yara-python reports a failed attach (for example, access denied) as yara.Error
    print(f"Could not scan process memory: {exc}")
```

### Match Results

Match objects provide detailed information about rule matches and string locations.

```python { .api }
class Match:
    """Represents a rule match result."""
    rule: str        # Name of the matching rule
    namespace: str   # Namespace of the matching rule
    tags: list       # Tags associated with the rule
    meta: dict       # Metadata dictionary from the rule
    strings: list    # List of (offset, identifier, data) tuples
                     # - offset (int): Byte offset where string was found
                     # - identifier (str): String variable name (e.g., '$pattern')
                     # - data (bytes): Actual matched bytes
```

**Processing match results:**

```python
# Example data with patterns to match
test_data = b"Test data with malicious patterns and \x4D\x5A header"

matches = rules.match(data=test_data)

for match in matches:
    print(f"Matched Rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
    print(f"Metadata: {match.meta}")

    # Examine string matches in detail
    print(f"String matches: {len(match.strings)}")
    for offset, identifier, matched_data in match.strings:
        # offset: int - byte position in data where match was found
        # identifier: str - string variable name from rule (e.g., '$text', '$hex')
        # matched_data: bytes - actual bytes that matched the pattern

        print(f"  String {identifier}:")
        print(f"    Offset: {offset}")
        print(f"    Data: {matched_data}")
        print(f"    Length: {len(matched_data)} bytes")

        # Handle different data types
        if matched_data.isascii():
            print(f"    ASCII: {matched_data.decode('ascii', errors='ignore')}")
        else:
            print(f"    Hex: {matched_data.hex()}")
```
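
The tuple layout above is the one this section documents. Newer yara-python releases (4.3.0 and later, as far as we are aware) instead expose each entry as an object with an `identifier` and a list of `instances`, where every instance carries `offset` and `matched_data`. The sketch below normalizes both shapes into the documented `(offset, identifier, data)` tuples. `iter_string_hits` is a hypothetical helper name, and the object shape is an assumption to verify against the installed version.

```python
def iter_string_hits(match):
    """Yield (offset, identifier, data) for each string hit on a match.

    Handles two shapes of `match.strings`:
    - tuples of (offset, identifier, data), as documented in this section
    - objects with `.identifier` and `.instances`, where each instance has
      `.offset` and `.matched_data` (assumed shape for newer releases)
    """
    for entry in match.strings:
        if isinstance(entry, tuple):
            yield entry
        else:
            for instance in entry.instances:
                yield instance.offset, entry.identifier, instance.matched_data


# Possible usage:
# for offset, identifier, data in iter_string_hits(match):
#     print(f"{identifier} at {offset}: {data!r}")
```

Code that iterates through this helper keeps working if the library is upgraded, at the cost of ignoring any extra fields the newer objects may provide.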

### External Variables in Scanning

External variables must be defined when the rules are compiled; values passed to `match()` override them for that scan, which allows dynamic rule behavior.

**Runtime external variables:**

```python
# 'threshold' needs a compile-time definition, otherwise compilation
# fails with an undefined identifier error
rules = yara.compile(source='''
rule SizeCheck {
    condition:
        filesize > threshold
}
''', externals={'threshold': 0})

# Override the external variable at scan time
matches = rules.match(
    filepath="/path/to/file.bin",
    externals={'threshold': 1024}
)
```
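
Because the `externals` dictionary is often assembled from configuration or user input, it can help to check it before scanning. The following is a minimal sketch of such a check. `check_externals` is a hypothetical helper, and the set of accepted value types (`bool`, `int`, `float`, `str`) is an assumption about what the scanner accepts rather than a statement of the library's contract.

```python
def check_externals(externals):
    """Return externals unchanged, or raise TypeError on an unsupported entry.

    The allowed value types below are an assumption; adjust the tuple if
    the installed yara-python version accepts a different set.
    """
    allowed = (bool, int, float, str)
    for name, value in externals.items():
        if not isinstance(name, str):
            raise TypeError(f"external name must be str, got {type(name).__name__}")
        if not isinstance(value, allowed):
            raise TypeError(
                f"external {name!r} has unsupported type {type(value).__name__}"
            )
    return externals


# Possible usage:
# matches = rules.match(filepath="/path/to/file.bin",
#                       externals=check_externals({'threshold': 1024}))
```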

### Callback-Based Scanning

Use callbacks to process matches as they occur, enabling streaming analysis and early termination.

```python { .api }
def callback(data):
    """
    Callback function called for each rule evaluation.

    Parameters:
    - data (dict): Contains rule evaluation information with keys:
      - 'matches' (bool): Whether the rule matched
      - 'rule' (str): Rule identifier/name
      - 'namespace' (str): Rule namespace
      - 'tags' (list): List of rule tags
      - 'meta' (dict): Rule metadata dictionary
      - 'strings' (list): List of (offset, identifier, data) tuples for matches

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """

def modules_callback(module_data):
    """
    Callback function for accessing module-specific data.

    Parameters:
    - module_data (dict): Module-specific data structures, may contain:
      - 'constants' (dict): Module constants
      - 'pe' (dict): PE module data (if PE file)
      - 'elf' (dict): ELF module data (if ELF file)
      - Other module-specific data based on YARA modules enabled

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """
```

**Basic callback example:**

```python
def match_callback(data):
    rule_name = data['rule']
    namespace = data['namespace']

    if data['matches']:
        print(f"✓ MATCH: {namespace}:{rule_name}")
        print(f"  Tags: {data['tags']}")
        print(f"  Metadata: {data['meta']}")

        # Show string matches
        for offset, identifier, matched_data in data['strings']:
            print(f"  String {identifier} at offset {offset}: {matched_data}")
    else:
        print(f"○ No match: {namespace}:{rule_name}")

    # Keep scanning in both cases
    return yara.CALLBACK_CONTINUE

matches = rules.match(
    data="test data with malicious content",
    callback=match_callback,
    which_callbacks=yara.CALLBACK_ALL  # Get callbacks for all rules
)
```

**Callback control with which_callbacks:**

```python
# Only callback for matching rules
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_MATCHES)

# Only callback for non-matching rules
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_NON_MATCHES)

# Callback for all rules (matches and non-matches)
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_ALL)
```
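
Early termination, mentioned at the start of this section, comes down to returning the abort code once enough has been seen. The sketch below illustrates this with a small callable class that tallies matching and non-matching rules and stops after a set number of hits. `MatchTally` and its parameter names `continue_code`, `abort_code`, and `max_matches` are hypothetical; the two codes are meant to receive `yara.CALLBACK_CONTINUE` and `yara.CALLBACK_ABORT`, and are passed in so the class itself does not depend on the library.

```python
class MatchTally:
    """Callback that counts rule evaluations and stops after max_matches hits.

    continue_code / abort_code are hypothetical parameter names; pass
    yara.CALLBACK_CONTINUE and yara.CALLBACK_ABORT when using this with
    rules.match().
    """

    def __init__(self, continue_code, abort_code, max_matches=1):
        self.continue_code = continue_code
        self.abort_code = abort_code
        self.max_matches = max_matches
        self.matched = []    # names of rules that matched
        self.unmatched = []  # names of rules that did not match

    def __call__(self, data):
        if data['matches']:
            self.matched.append(data['rule'])
            if len(self.matched) >= self.max_matches:
                return self.abort_code  # enough hits, stop the scan
        else:
            self.unmatched.append(data['rule'])
        return self.continue_code


# Possible usage:
# tally = MatchTally(yara.CALLBACK_CONTINUE, yara.CALLBACK_ABORT, max_matches=1)
# rules.match(data="test", callback=tally, which_callbacks=yara.CALLBACK_ALL)
# print(tally.matched, tally.unmatched)
```

The `unmatched` list only fills up when `which_callbacks` includes non-matching rules; with `yara.CALLBACK_MATCHES` it stays empty.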

### Module Data and Callbacks

Provide additional data to YARA modules and handle module-specific processing.

```python { .api }
def modules_callback(module_data):
    """
    Callback for accessing module-specific data.

    Parameters:
    - module_data (dict): Module-specific data structures

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """
```

**Module callback example:**

```python
def module_callback(module_data):
    # Access PE module data if available
    if 'pe' in module_data:
        pe_data = module_data['pe']
        print(f"PE sections: {pe_data.get('sections', [])}")

    # Access other module data
    constants = module_data.get('constants', {})
    print(f"Available constants: {list(constants.keys())}")

    return yara.CALLBACK_CONTINUE

# The callback is expected to fire only for modules the rules import
# (for example, rules that contain: import "pe")
matches = rules.match(
    filepath="/path/to/executable.exe",
    modules_callback=module_callback
)
```

### Advanced Scanning Options

Control scanning behavior with timeouts, fast mode, and other performance options.

**Timeout control:**

```python
try:
    # Set 30-second timeout
    matches = rules.match(filepath="/large/file.bin", timeout=30)
except yara.TimeoutError:
    print("Scan timed out after 30 seconds")
```

**Fast scanning mode:**

```python
# Enable fast mode for performance (may miss some matches)
matches = rules.match(data="large data buffer", fast=True)
```

### Comprehensive Scanning Example

A complete example demonstrating advanced scanning features:

```python
import yara

# Compile rules with external variables
rules = yara.compile(source='''
rule AdvancedDetection {
    meta:
        description = "Advanced malware detection"
        author = "Security Team"
    strings:
        $sig1 = "suspicious_function"
        $sig2 = { 48 8B 05 [4] 48 8B 00 }
    condition:
        (filesize > min_size) and ($sig1 or $sig2)
}
''', externals={'min_size': 1024})

def comprehensive_callback(data):
    rule_name = data.get('rule', 'Unknown')
    # Test the value of 'matches', not just the presence of the key
    if data.get('matches'):
        print(f"✓ MATCH: {rule_name}")
    else:
        print(f"○ No match: {rule_name}")
    return yara.CALLBACK_CONTINUE

def module_processor(module_data):
    if 'pe' in module_data:
        print("Analyzing PE structure...")
    if 'hash' in module_data:
        print(f"Hash data available: {list(module_data['hash'].keys())}")
    return yara.CALLBACK_CONTINUE

try:
    matches = rules.match(
        filepath="/path/to/sample.exe",
        callback=comprehensive_callback,
        modules_callback=module_processor,
        which_callbacks=yara.CALLBACK_ALL,
        timeout=120,
        externals={'min_size': 2048}  # Override compile-time external
    )

    print(f"\nFinal Results: {len(matches)} matches found")
    for match in matches:
        print(f"Rule: {match.rule}")
        print(f"Tags: {', '.join(match.tags)}")
        for offset, name, data in match.strings:
            print(f"  {name} at {offset}: {data[:50]}...")

except yara.TimeoutError:
    print("Scan exceeded timeout limit")
except Exception as e:
    print(f"Scan error: {e}")
```

### Performance Profiling

Access performance profiling information if YARA was compiled with profiling support.

```python { .api }
class Rules:
    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.

        Returns:
            dict: Profiling data with performance metrics, or empty dict if profiling not enabled

        Note:
            Only available if the underlying YARA library was compiled with profiling support.
        """
```

**Profiling information usage:**

```python
# Compile rules (profiling info only available if YARA built with profiling)
rules = yara.compile(source='''
rule TestRule {
    strings:
        $pattern = "test"
    condition:
        $pattern
}
''')

# Perform scanning
matches = rules.match(data="test data")

# Get profiling information
profile_data = rules.profiling_info()
if profile_data:
    print("Profiling data available:")
    print(f"Performance metrics: {profile_data}")
else:
    print("No profiling data (YARA not compiled with profiling support)")
```
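
When profiling data is available, the usual next step is to find the most expensive rules. The sketch below ranks entries by cost. `slowest_rules` is a hypothetical helper, and it assumes the returned dictionary maps a rule name to a numeric cost; this section does not specify that layout, so inspect the real return value before relying on it.

```python
def slowest_rules(profile_data, limit=5):
    """Return up to `limit` (name, cost) pairs, most expensive first.

    Assumes profile_data maps a rule name to a numeric cost; that layout
    is an assumption, not something this section guarantees.
    """
    ranked = sorted(profile_data.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Possible usage:
# for name, cost in slowest_rules(rules.profiling_info()):
#     print(f"{name}: {cost}")
```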