# Pattern Matching and Scanning

Pattern matching applies compiled YARA rules to various data sources including files, memory buffers, and running processes. The matching engine supports callbacks, timeouts, external variables, and detailed result reporting.

## Capabilities

### Basic Data Matching

Scan data buffers, strings, and binary content with compiled rules.

```python { .api }
class Rules:
    def match(self, filepath=None, pid=None, data=None, externals=None, callback=None,
              fast=False, timeout=60, modules_data=None, modules_callback=None,
              which_callbacks=None):
        """
        Scan targets with compiled YARA rules.

        Parameters:
        - filepath (str, optional): Path to file to scan
        - pid (int, optional): Process ID to scan memory
        - data (bytes/str, optional): Data buffer to scan
        - externals (dict, optional): External variables for this scan
        - callback (callable, optional): Callback function for results
        - fast (bool): Enable fast matching mode (default: False)
        - timeout (int): Scan timeout in seconds (default: 60)
        - modules_data (dict, optional): Data for YARA modules
        - modules_callback (callable, optional): Module data callback
        - which_callbacks (int, optional): Callback type flags

        Returns:
            list: List of Match objects for matching rules

        Raises:
            yara.TimeoutError: If scan exceeds timeout limit
        """

    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.

        Returns:
            dict: Profiling data with performance metrics, or empty dict if profiling not enabled

        Note:
            Only available if the underlying YARA library was compiled with profiling support.
        """
```

**Basic data scanning:**

```python
import yara

rules = yara.compile(source='''
rule SuspiciousPattern {
    strings:
        $text = "malicious"
        $hex = { 4D 5A }
    condition:
        $text or $hex
}
''')

# Scan string data
matches = rules.match(data="This contains malicious content")

# Scan binary data
binary_data = b"\x4D\x5A\x90\x00"  # MZ header + data
matches = rules.match(data=binary_data)
```

### File Scanning

Scan files on disk by path, with automatic file handling and memory management.

**File path scanning:**

```python
# Scan a single file
matches = rules.match(filepath="/path/to/suspicious_file.exe")

# Process results
for match in matches:
    print(f"File matched rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
```
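
Scanning more than one file usually means walking a directory and calling `match` once per path. The sketch below is one way to do that, not part of the library: `scan_tree` and `fake_scan` are hypothetical helpers introduced for illustration, and the scanner is passed in as a callable so the example runs without yara-python installed.

```python
import os
import tempfile


def scan_tree(root, scan):
    """Walk `root` and return {path: matches} for files with at least one match.

    `scan` is any callable that accepts a `filepath` keyword argument,
    for example the bound method `rules.match` of a compiled rule set.
    """
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                matches = scan(filepath=path)
            except Exception as exc:  # unreadable file, scan error, timeout, ...
                print(f"skipped {path}: {exc}")
                continue
            if matches:
                hits[path] = matches
    return hits


# Stand-in scanner (an assumption for this demo only);
# in real use pass `rules.match` instead.
def fake_scan(filepath):
    with open(filepath, "rb") as fh:
        return ["SuspiciousPattern"] if b"malicious" in fh.read() else []


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "bad.txt"), "wb") as fh:
        fh.write(b"malicious payload")
    with open(os.path.join(tmp, "ok.txt"), "wb") as fh:
        fh.write(b"harmless")
    demo_hits = scan_tree(tmp, fake_scan)
    demo_names = sorted(os.path.basename(p) for p in demo_hits)
    print(demo_names)
```

With a compiled rule set, the call would be `scan_tree("/path/to/dir", rules.match)`. Catching the broad `Exception` keeps one bad file from ending the walk; a stricter script might catch only the scan-specific errors.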

### Process Memory Scanning

Scan the memory space of running processes by process ID (platform-dependent feature).

**Process scanning:**

```python
# Scan process memory (requires appropriate permissions)
try:
    matches = rules.match(pid=1234)  # Process ID
    for match in matches:
        print(f"Process memory matched: {match.rule}")
except (PermissionError, yara.Error) as exc:
    # Access failures may surface as yara.Error rather than PermissionError
    print(f"Insufficient permissions to scan process memory: {exc}")
```

### Match Results

Match objects provide detailed information about rule matches and string locations.

```python { .api }
class Match:
    """Represents a rule match result."""
    rule: str       # Name of the matching rule
    namespace: str  # Namespace of the matching rule
    tags: list      # Tags associated with the rule
    meta: dict      # Metadata dictionary from the rule
    strings: list   # List of (offset, identifier, data) tuples
                    # - offset (int): Byte offset where string was found
                    # - identifier (str): String variable name (e.g., '$pattern')
                    # - data (bytes): Actual matched bytes
```

**Processing match results:**

```python
# Example data with patterns to match
test_data = b"Test data with malicious patterns and \x4D\x5A header"

matches = rules.match(data=test_data)

for match in matches:
    print(f"Matched Rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
    print(f"Metadata: {match.meta}")

    # Examine string matches in detail
    print(f"String matches: {len(match.strings)}")
    for offset, identifier, matched_data in match.strings:
        # offset: int - byte position in data where match was found
        # identifier: str - string variable name from rule (e.g., '$text', '$hex')
        # matched_data: bytes - actual bytes that matched the pattern

        print(f"  String {identifier}:")
        print(f"    Offset: {offset}")
        print(f"    Data: {matched_data}")
        print(f"    Length: {len(matched_data)} bytes")

        # Handle different data types
        if matched_data.isascii():
            print(f"    ASCII: {matched_data.decode('ascii', errors='ignore')}")
        else:
            print(f"    Hex: {matched_data.hex()}")
```
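
When a rule matches many strings, it can help to group the hits by identifier before reporting. The following is a minimal sketch, assuming the `(offset, identifier, data)` tuple layout documented above; `group_strings` is a hypothetical helper and the sample list is hand-written rather than real scan output.

```python
from collections import defaultdict


def group_strings(strings):
    """Group (offset, identifier, data) tuples by identifier, sorted by offset."""
    grouped = defaultdict(list)
    for offset, identifier, matched in strings:
        grouped[identifier].append((offset, matched))
    return {ident: sorted(hits) for ident, hits in grouped.items()}


# Hand-written sample in the documented tuple layout (not real scan output)
sample = [(38, "$hex", b"MZ"), (15, "$text", b"malicious"), (80, "$hex", b"MZ")]

summary = group_strings(sample)
for ident, hits in summary.items():
    offsets = [off for off, _ in hits]
    print(f"{ident}: {len(hits)} hit(s) at offsets {offsets}")
```

In a real scan the input would be `match.strings` for each `match` returned by `rules.match(...)`.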

### External Variables in Scanning

Override or provide external variables at scan time for dynamic rule behavior.

**Runtime external variables:**

```python
# External variables must be declared at compile time, even if only with a
# placeholder value; otherwise 'threshold' is an undefined identifier.
rules = yara.compile(source='''
rule SizeCheck {
    condition:
        filesize > threshold
}
''', externals={'threshold': 0})

# Provide (override) the external variable at scan time
matches = rules.match(
    filepath="/path/to/file.bin",
    externals={'threshold': 1024}
)
```

### Callback-Based Scanning

Use callbacks to process matches as they occur, enabling streaming analysis and early termination.

```python { .api }
def callback(data):
    """
    Callback function called for each rule evaluation.

    Parameters:
    - data (dict): Contains rule evaluation information with keys:
        - 'matches' (bool): Whether the rule matched
        - 'rule' (str): Rule identifier/name
        - 'namespace' (str): Rule namespace
        - 'tags' (list): List of rule tags
        - 'meta' (dict): Rule metadata dictionary
        - 'strings' (list): List of (offset, identifier, data) tuples for matches

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """

def modules_callback(module_data):
    """
    Callback function for accessing module-specific data.

    Parameters:
    - module_data (dict): Module-specific data structures, may contain:
        - 'constants' (dict): Module constants
        - 'pe' (dict): PE module data (if PE file)
        - 'elf' (dict): ELF module data (if ELF file)
        - Other module-specific data based on YARA modules enabled

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """
```

**Basic callback example:**

```python
def match_callback(data):
    rule_name = data['rule']
    namespace = data['namespace']

    if data['matches']:
        print(f"✓ MATCH: {namespace}:{rule_name}")
        print(f"  Tags: {data['tags']}")
        print(f"  Metadata: {data['meta']}")

        # Show string matches
        for offset, identifier, matched_data in data['strings']:
            print(f"  String {identifier} at offset {offset}: {matched_data}")

        return yara.CALLBACK_CONTINUE
    else:
        print(f"○ No match: {namespace}:{rule_name}")
        return yara.CALLBACK_CONTINUE

matches = rules.match(
    data="test data with malicious content",
    callback=match_callback,
    which_callbacks=yara.CALLBACK_ALL  # Get callbacks for all rules
)
```

**Callback control with which_callbacks:**

```python
# `match_callback` is the callback function defined in the basic example

# Only callback for matching rules
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_MATCHES)

# Only callback for non-matching rules
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_NON_MATCHES)

# Callback for all rules (matches and non-matches)
rules.match(data="test", callback=match_callback, which_callbacks=yara.CALLBACK_ALL)
```
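
The examples above always return `CALLBACK_CONTINUE`. Early termination, mentioned at the start of this section, can be sketched as a callback that returns `CALLBACK_ABORT` on the first match. This is an illustrative sketch only: `make_first_match_callback` is a hypothetical helper, the callback is driven with hand-built dictionaries instead of a real scan, and the numeric fallback values are assumptions used only when yara-python is not importable.

```python
try:
    import yara
    CONTINUE, ABORT = yara.CALLBACK_CONTINUE, yara.CALLBACK_ABORT
except (ImportError, AttributeError):
    # Fallback so the sketch can be exercised without yara-python;
    # these numeric values are assumptions, not taken from the library.
    CONTINUE, ABORT = 0, 1


def make_first_match_callback(found):
    """Return a callback that records the first matching rule, then aborts."""
    def on_rule(data):
        if data["matches"]:
            found.append(data["rule"])
            return ABORT  # stop the scan: one match is enough
        return CONTINUE   # keep evaluating the remaining rules
    return on_rule


# Drive the callback by hand with dictionaries shaped like the documented
# callback argument (only the keys this callback reads are included).
found = []
on_rule = make_first_match_callback(found)
results = [
    on_rule({"matches": False, "rule": "Benign"}),
    on_rule({"matches": True, "rule": "SuspiciousPattern"}),
]
print(found, results)
```

In a real scan the callback would be passed as `rules.match(data=..., callback=on_rule, which_callbacks=yara.CALLBACK_MATCHES)`, so it is invoked only for matching rules and the scan stops after the first one.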

### Module Data and Callbacks

Provide additional data to YARA modules and handle module-specific processing.

```python { .api }
def modules_callback(module_data):
    """
    Callback for accessing module-specific data.

    Parameters:
    - module_data (dict): Module-specific data structures

    Returns:
        int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """
```

**Module callback example:**

```python
def module_callback(module_data):
    # Access PE module data if available
    if 'pe' in module_data:
        pe_data = module_data['pe']
        print(f"PE sections: {pe_data.get('sections', [])}")

    # Access other module data
    constants = module_data.get('constants', {})
    print(f"Available constants: {list(constants.keys())}")

    # Keep scanning, as described in the callback API
    return yara.CALLBACK_CONTINUE

matches = rules.match(
    filepath="/path/to/executable.exe",
    modules_callback=module_callback
)
```

### Advanced Scanning Options

Control scanning behavior with timeouts, fast mode, and other performance options.

**Timeout control:**

```python
try:
    # Set 30-second timeout
    matches = rules.match(filepath="/large/file.bin", timeout=30)
except yara.TimeoutError:
    print("Scan timed out after 30 seconds")
```

**Fast scanning mode:**

```python
# Enable fast mode for performance (may not report every occurrence of a matched string)
matches = rules.match(data="large data buffer", fast=True)
```

### Comprehensive Scanning Example

A complete example demonstrating advanced scanning features:

```python
import yara

# Compile rules with external variables
rules = yara.compile(source='''
rule AdvancedDetection {
    meta:
        description = "Advanced malware detection"
        author = "Security Team"
    strings:
        $sig1 = "suspicious_function"
        $sig2 = { 48 8B 05 [4] 48 8B 00 }
    condition:
        (filesize > min_size) and ($sig1 or $sig2)
}
''', externals={'min_size': 1024})

def comprehensive_callback(data):
    rule_name = data.get('rule', 'Unknown')
    # Test the value of 'matches', not just the presence of the key
    if data.get('matches'):
        print(f"✓ MATCH: {rule_name}")
    else:
        print(f"○ No match: {rule_name}")
    return yara.CALLBACK_CONTINUE

def module_processor(module_data):
    if 'pe' in module_data:
        print("Analyzing PE structure...")
    if 'hash' in module_data:
        print(f"Hash data available: {list(module_data['hash'].keys())}")
    return yara.CALLBACK_CONTINUE

try:
    matches = rules.match(
        filepath="/path/to/sample.exe",
        callback=comprehensive_callback,
        modules_callback=module_processor,
        which_callbacks=yara.CALLBACK_ALL,
        timeout=120,
        externals={'min_size': 2048}  # Override compile-time external
    )

    print(f"\nFinal Results: {len(matches)} matches found")
    for match in matches:
        print(f"Rule: {match.rule}")
        print(f"Tags: {', '.join(match.tags)}")
        for offset, name, matched_data in match.strings:
            print(f"  {name} at {offset}: {matched_data[:50]}...")

except yara.TimeoutError:
    print("Scan exceeded timeout limit")
except Exception as e:
    print(f"Scan error: {e}")
```

### Performance Profiling

Access performance profiling information if YARA was compiled with profiling support.

```python { .api }
class Rules:
    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.

        Returns:
            dict: Profiling data with performance metrics, or empty dict if profiling not enabled

        Note:
            Only available if the underlying YARA library was compiled with profiling support.
        """
```

**Profiling information usage:**

```python
# Compile rules (profiling info only available if YARA built with profiling)
rules = yara.compile(source='''
rule TestRule {
    strings:
        $pattern = "test"
    condition:
        $pattern
}
''')

# Perform scanning
matches = rules.match(data="test data")

# Get profiling information
profile_data = rules.profiling_info()
if profile_data:
    print("Profiling data available:")
    print(f"Performance metrics: {profile_data}")
else:
    print("No profiling data (YARA not compiled with profiling support)")
```