
tessl/pypi-yara-python

Python interface for YARA, a powerful malware identification and classification tool


Pattern Matching and Scanning

Pattern matching applies compiled YARA rules to various data sources including files, memory buffers, and running processes. The matching engine supports callbacks, timeouts, external variables, and detailed result reporting.

Capabilities

Basic Data Matching

Scan data buffers, strings, and binary content with compiled rules.

class Rules:
    def match(self, filepath=None, pid=None, data=None, externals=None, callback=None,
              fast=False, timeout=60, modules_data=None, modules_callback=None, 
              which_callbacks=None):
        """
        Scan targets with compiled YARA rules.
        
        Parameters:
        - filepath (str, optional): Path to file to scan
        - pid (int, optional): Process ID to scan memory
        - data (bytes/str, optional): Data buffer to scan  
        - externals (dict, optional): External variables for this scan
        - callback (callable, optional): Callback function for results
        - fast (bool): Enable fast matching mode (default: False)
        - timeout (int): Scan timeout in seconds (default: 60)
        - modules_data (dict, optional): Data for YARA modules
        - modules_callback (callable, optional): Module data callback
        - which_callbacks (int, optional): Callback type flags
        
        Returns:
        list: List of Match objects for matching rules
        
        Raises:
        yara.TimeoutError: If the scan exceeds the timeout limit
        """
    
    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.
        
        Returns:
        dict: Profiling data with performance metrics, or empty dict if profiling not enabled
        
        Note:
        Only available if the underlying YARA library was compiled with profiling support.
        """

Basic data scanning:

import yara

rules = yara.compile(source='''
rule SuspiciousPattern {
    strings:
        $text = "malicious"
        $hex = { 4D 5A }
    condition:
        $text or $hex
}
''')

# Scan string data
matches = rules.match(data="This contains malicious content")

# Scan binary data
binary_data = b"\x4D\x5A\x90\x00"  # MZ header + data
matches = rules.match(data=binary_data)
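
Since `data` accepts both text and bytes, it can help to normalize input to bytes before scanning, so that reported offsets refer to a known encoding. The following is a minimal sketch: `to_scan_bytes` and `scan_buffer` are hypothetical helpers, not part of yara-python, and UTF-8 is an assumed default encoding.

```python
def to_scan_bytes(data, encoding="utf-8"):
    """Return `data` as bytes so match offsets refer to a known encoding."""
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    if isinstance(data, str):
        return data.encode(encoding)
    raise TypeError(f"cannot scan object of type {type(data).__name__}")


def scan_buffer(rules, data, **match_kwargs):
    """Scan `data` with any object exposing a yara-style match(data=...) method."""
    return rules.match(data=to_scan_bytes(data), **match_kwargs)
```

With compiled rules, `scan_buffer(rules, "This contains malicious content")` would scan the UTF-8 bytes of the string.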

File Scanning

Scan files on disk by path, with automatic file handling and memory management.

File path scanning:

# Scan a single file
matches = rules.match(filepath="/path/to/suspicious_file.exe")

# Process results
for match in matches:
    print(f"File matched rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
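
Scanning a single path extends naturally to a directory tree. The sketch below assumes a hypothetical helper, `scan_tree`, that walks a directory and calls `match(filepath=...)` on each file; the `scan_errors` parameter is an illustrative way to choose which exceptions are recorded rather than raised.

```python
import os


def scan_tree(rules, root, scan_errors=(Exception,)):
    """Scan every file under `root`; return ({path: matches}, {path: error})."""
    hits, failures = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                matches = rules.match(filepath=path)
            except scan_errors as exc:
                # An unreadable or vanished file should not stop the whole walk.
                failures[path] = str(exc)
                continue
            if matches:
                hits[path] = matches
    return hits, failures
```

Passing `scan_errors=(yara.Error,)` would narrow the handler to errors reported by yara-python and let anything else propagate.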

Process Memory Scanning

Scan the memory space of running processes by process ID (platform-dependent feature).

Process scanning:

# Scan process memory (requires appropriate permissions)
try:
    matches = rules.match(pid=1234)  # Process ID
    for match in matches:
        print(f"Process memory matched: {match.rule}")
except yara.Error as exc:
    # yara-python reports scan failures, including access-denied errors, as yara.Error
    print(f"Could not scan process memory: {exc}")

Match Results

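Match objects provide detailed information about rule matches and string locations. The tuple layout of `strings` described here applies to yara-python releases before 4.3.0; from 4.3.0 on, `strings` holds `StringMatch` objects whose `instances` carry each offset and the matched bytes.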

class Match:
    """Represents a rule match result."""
    rule: str        # Name of the matching rule
    namespace: str   # Namespace of the matching rule
    tags: list       # Tags associated with the rule
    meta: dict       # Metadata dictionary from the rule
    strings: list    # List of (offset, identifier, data) tuples
                     # - offset (int): Byte offset where string was found
                     # - identifier (str): String variable name (e.g., '$pattern')
                     # - data (bytes): Actual matched bytes

Processing match results:

# Example data with patterns to match
test_data = b"Test data with malicious patterns and \x4D\x5A header"

matches = rules.match(data=test_data)

for match in matches:
    print(f"Matched Rule: {match.rule}")
    print(f"Namespace: {match.namespace}")
    print(f"Tags: {match.tags}")
    print(f"Metadata: {match.meta}")
    
    # Examine string matches in detail
    print(f"String matches: {len(match.strings)}")
    for offset, identifier, matched_data in match.strings:
        # offset: int - byte position in data where match was found
        # identifier: str - string variable name from rule (e.g., '$malicious', '$hex_pattern')
        # matched_data: bytes - actual bytes that matched the pattern
        
        print(f"  String {identifier}:")
        print(f"    Offset: {offset}")
        print(f"    Data: {matched_data}")
        print(f"    Length: {len(matched_data)} bytes")
        
        # Handle different data types
        if matched_data.isascii():
            print(f"    ASCII: {matched_data.decode('ascii', errors='ignore')}")
        else:
            print(f"    Hex: {matched_data.hex()}")
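
Code that has to run against both layouts of `strings` (tuples in older releases, `StringMatch` objects with an `instances` list from yara-python 4.3.0) can normalize them first. This is a sketch under that assumption; `iter_string_hits` is a hypothetical helper and relies only on the `identifier`, `instances`, `offset`, and `matched_data` attributes.

```python
def iter_string_hits(match):
    """Yield (offset, identifier, data) for either layout of `match.strings`."""
    for entry in match.strings:
        if isinstance(entry, tuple):
            # Tuple layout: (offset, identifier, data).
            offset, identifier, data = entry
            yield offset, identifier, data
        else:
            # Object layout: one entry per string, each with a list of instances.
            for instance in entry.instances:
                yield instance.offset, entry.identifier, instance.matched_data
```

A loop such as `for offset, identifier, matched_data in iter_string_hits(match):` then works unchanged across versions.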

External Variables in Scanning

Override or provide external variables at scan time for dynamic rule behavior.

Runtime external variables:

rules = yara.compile(source='''
rule SizeCheck {
    condition:
        filesize > threshold
}
''', externals={'threshold': 0})  # declare the external at compile time, or compilation fails

# Override the external's value at scan time
matches = rules.match(
    filepath="/path/to/file.bin",
    externals={'threshold': 1024}
)
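
A common use of scan-time externals is to expose per-file facts, such as the file name, to rule conditions. The sketch below assumes a hypothetical helper, `externals_for`, and hypothetical external names `filename` and `extension`; rules that reference them would still need to declare both at compile time.

```python
import os


def externals_for(path):
    """Build per-file external variables from a path (names are illustrative)."""
    name = os.path.basename(path)
    return {
        "filename": name,
        # Lower-cased so conditions can compare against ".exe", ".dll", etc.
        "extension": os.path.splitext(name)[1].lower(),
    }
```

It could be used as `rules.match(filepath=path, externals=externals_for(path))`.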

Callback-Based Scanning

Use callbacks to process matches as they occur, enabling streaming analysis and early termination.

def callback(data):
    """
    Callback function called for each rule evaluation.
    
    Parameters:
    - data (dict): Contains rule evaluation information with keys:
        - 'matches' (bool): Whether the rule matched
        - 'rule' (str): Rule identifier/name
        - 'namespace' (str): Rule namespace
        - 'tags' (list): List of rule tags
        - 'meta' (dict): Rule metadata dictionary
        - 'strings' (list): List of (offset, identifier, data) tuples for matches
    
    Returns:
    int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """

def modules_callback(module_data):
    """
    Callback function for accessing module-specific data.
    
    Parameters:
    - module_data (dict): Data exported by one imported module. The callback is
      invoked once per module imported by the rules; the module's name is
      available under the 'module' key, alongside that module's own fields
      (for example, PE fields when the module is 'pe').
    
    Returns:
    int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """

Basic callback example:

def match_callback(data):
    rule_name = data['rule']
    namespace = data['namespace']
    
    if data['matches']:
        print(f"✓ MATCH: {namespace}:{rule_name}")
        print(f"  Tags: {data['tags']}")
        print(f"  Metadata: {data['meta']}")
        
        # Show string matches
        for offset, identifier, matched_data in data['strings']:
            print(f"  String {identifier} at offset {offset}: {matched_data}")
        
        return yara.CALLBACK_CONTINUE
    else:
        print(f"○ No match: {namespace}:{rule_name}")
        return yara.CALLBACK_CONTINUE

matches = rules.match(
    data="test data with malicious content",
    callback=match_callback,
    which_callbacks=yara.CALLBACK_ALL  # Get callbacks for all rules
)

Callback control with which_callbacks:

# Only callback for matching rules
rules.match(data="test", callback=callback, which_callbacks=yara.CALLBACK_MATCHES)

# Only callback for non-matching rules  
rules.match(data="test", callback=callback, which_callbacks=yara.CALLBACK_NON_MATCHES)

# Callback for all rules (matches and non-matches)
rules.match(data="test", callback=callback, which_callbacks=yara.CALLBACK_ALL)
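
Early termination follows from the callback's return value. As a sketch, the hypothetical factory below builds a callback that records matching rules and asks the scan to stop once a limit is reached; the continue and abort codes are passed in so the helper does not depend on importing `yara`.

```python
def make_limited_collector(limit, continue_code, abort_code):
    """Build a callback that records matching rules and aborts after `limit` hits."""
    collected = []

    def on_rule(data):
        if data["matches"]:
            collected.append((data["namespace"], data["rule"]))
            if len(collected) >= limit:
                # Enough evidence gathered: ask the engine to stop scanning.
                return abort_code
        return continue_code

    return on_rule, collected
```

In practice it would be created with `make_limited_collector(1, yara.CALLBACK_CONTINUE, yara.CALLBACK_ABORT)` and the returned function passed as `callback=`.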

Module Data and Callbacks

Provide additional data to YARA modules and handle module-specific processing.

def modules_callback(module_data):
    """
    Callback for accessing module-specific data.
    
    Parameters:
    - module_data (dict): Data exported by one imported module
    
    Returns:
    int: CALLBACK_CONTINUE to continue, CALLBACK_ABORT to stop
    """

Module callback example:

def module_callback(module_data):
    # Each call carries one module's data; its name is under the 'module' key
    module_name = module_data.get('module')
    if module_name == 'pe':
        print(f"PE sections: {module_data.get('sections', [])}")
    else:
        print(f"Module {module_name}: fields {sorted(module_data.keys())}")
    return yara.CALLBACK_CONTINUE

matches = rules.match(
    filepath="/path/to/executable.exe",
    modules_callback=module_callback
)
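
To keep module data for later inspection instead of printing it, a callback can store each dictionary it receives. This sketch assumes the module name is delivered under a `'module'` key; `make_module_collector` is a hypothetical helper, and the continue code is passed in rather than imported.

```python
def make_module_collector(continue_code):
    """Build a modules_callback that stores each module's data by module name."""
    modules = {}

    def on_module(module_data):
        # Assumption: the module's name is stored under the 'module' key.
        name = module_data.get("module", "<unknown>")
        modules[name] = {k: v for k, v in module_data.items() if k != "module"}
        return continue_code

    return on_module, modules
```

After a scan with `modules_callback=on_module`, `modules` would hold one entry per module imported by the rules.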

Advanced Scanning Options

Control scanning behavior with timeouts, fast mode, and other performance options.

Timeout control:

try:
    # Set 30-second timeout
    matches = rules.match(filepath="/large/file.bin", timeout=30)
except yara.TimeoutError:
    print("Scan timed out after 30 seconds")

Fast scanning mode:

# Enable fast mode for performance (may miss some matches)
matches = rules.match(data="large data buffer", fast=True)
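
One way to combine these options is to try a short timeout first and retry with longer ones only when needed. The helper below is a hypothetical sketch; the exception type is passed in (in practice `yara.TimeoutError`) so the example runs without the library installed.

```python
def scan_with_escalating_timeout(rules, path, timeouts, timeout_error):
    """Try each timeout in order; return matches, or None if every attempt times out."""
    for seconds in timeouts:
        try:
            return rules.match(filepath=path, timeout=seconds)
        except timeout_error:
            # This attempt ran out of time; retry with the next, longer budget.
            continue
    return None
```

A call such as `scan_with_escalating_timeout(rules, "/large/file.bin", [5, 30, 120], yara.TimeoutError)` keeps quick files quick while still giving slow ones a chance to finish.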

Comprehensive Scanning Example

A complete example demonstrating advanced scanning features:

import yara

# Compile rules with external variables
rules = yara.compile(source='''
rule AdvancedDetection {
    meta:
        description = "Advanced malware detection"
        author = "Security Team"
    strings:
        $sig1 = "suspicious_function"
        $sig2 = { 48 8B 05 [4] 48 8B 00 }
    condition:
        (filesize > min_size) and ($sig1 or $sig2)
}
''', externals={'min_size': 1024})

def comprehensive_callback(data):
    rule_name = data.get('rule', 'Unknown')
    # 'matches' is always present, so test its value rather than its presence
    if data.get('matches'):
        print(f"✓ MATCH: {rule_name}")
    else:
        print(f"○ No match: {rule_name}")
    return yara.CALLBACK_CONTINUE

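def module_processor(module_data):
    # Called once per imported module; the module's name is under 'module'
    module_name = module_data.get('module')
    if module_name == 'pe':
        print("Analyzing PE structure...")
    elif module_name == 'hash':
        print(f"Hash data available: {sorted(module_data.keys())}")
    return yara.CALLBACK_CONTINUE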

try:
    matches = rules.match(
        filepath="/path/to/sample.exe",
        callback=comprehensive_callback,
        modules_callback=module_processor,
        which_callbacks=yara.CALLBACK_ALL,
        timeout=120,
        externals={'min_size': 2048}  # Override compile-time external
    )
    
    print(f"\nFinal Results: {len(matches)} matches found")
    for match in matches:
        print(f"Rule: {match.rule}")
        print(f"Tags: {', '.join(match.tags)}")
        for offset, name, data in match.strings:
            print(f"  {name} at {offset}: {data[:50]}...")
            
except yara.TimeoutError:
    print("Scan exceeded timeout limit")
except Exception as e:
    print(f"Scan error: {e}")

Performance Profiling

Access performance profiling information if YARA was compiled with profiling support.

class Rules:
    def profiling_info(self):
        """
        Returns profiling information if enabled during compilation.
        
        Returns:
        dict: Profiling data with performance metrics, or empty dict if profiling not enabled
        
        Note:
        Only available if the underlying YARA library was compiled with profiling support.
        """

Profiling information usage:

# Compile rules (profiling info only available if YARA built with profiling)
rules = yara.compile(source='''
rule TestRule {
    strings:
        $pattern = "test"
    condition:
        $pattern
}
''')

# Perform scanning
matches = rules.match(data="test data")

# Get profiling information
profile_data = rules.profiling_info()
if profile_data:
    print("Profiling data available:")
    print(f"Performance metrics: {profile_data}")
else:
    print("No profiling data (YARA not compiled with profiling support)")

Install with Tessl CLI

npx tessl i tessl/pypi-yara-python
