Performing YARA Rule Development for Detection

Overview

YARA is the pattern matching swiss knife for malware researchers, enabling identification and classification of malware based on textual or binary patterns. Effective YARA rules combine unique string patterns, byte sequences, PE header characteristics, import table analysis, and conditional logic to detect malware families while avoiding false positives. Modern YARA-X (rewritten in Rust, stable since June 2025) brings improved performance and new modules. Rules should target unpacked malware artifacts like hardcoded stack strings, C2 URLs, mutex names, encryption constants, and unique code sequences rather than packer signatures.

Prerequisites

Python 3.9+ with yara-python library
YARA 4.5+ or YARA-X 0.10+
PE analysis tools (pefile, pestudio)
Hex editor for identifying unique byte patterns
Access to malware samples (VirusTotal, MalwareBazaar)
Understanding of PE file format, strings, and import tables

Key Concepts

Rule Structure

Every YARA rule consists of three sections: meta (optional descriptive metadata), strings (pattern definitions), and condition (matching logic). String types include text strings (ASCII/wide/nocase), hex patterns with wildcards and jumps, and regular expressions. Conditions combine string matches with file properties using boolean operators.

String Selection Strategy

Effective rules target patterns that are unique to the malware family and survive recompilation. Hardcoded stack strings are excellent choices because compilers embed them consistently. C2 domain patterns, custom encryption routines, unique error messages, and specific API call sequences provide stable detection anchors. Avoid compiler-generated boilerplate and common library strings.

Performance Optimization

YARA evaluates conditions short-circuit style. Place the most discriminating and cheapest-to-evaluate conditions first. Use filesize limits to skip irrelevant files quickly. Minimize regex usage in favor of hex patterns. Use private rules as building blocks for complex detection logic without generating standalone matches.

Practical Steps

Step 1: Analyze Sample for Unique Patterns

#!/usr/bin/env python3
"""Extract candidate strings and byte patterns for YARA rule creation."""
import pefile
import re
import sys
from collections import Counter


def extract_strings(filepath, min_length=6):
    """Extract ASCII and wide strings from binary."""
    with open(filepath, 'rb') as f:
        data = f.read()

    # ASCII strings
    ascii_strings = re.findall(
        rb'[\x20-\x7e]{' + str(min_length).encode() + rb',}', data
    )

    # Wide (UTF-16LE) strings
    wide_strings = re.findall(
        rb'(?:[\x20-\x7e]\x00){' + str(min_length).encode() + rb',}', data
    )

    return {
        'ascii': [s.decode('ascii') for s in ascii_strings],
        'wide': [s.decode('utf-16-le') for s in wide_strings],
    }


def analyze_pe_imports(filepath):
    """Extract import table for API-based detection."""
    try:
        pe = pefile.PE(filepath)
    except pefile.PEFormatError:
        return []

    imports = []
    if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in pe.DIRECTORY_ENTRY_IMPORT:
            dll_name = entry.dll.decode('utf-8', errors='replace')
            for imp in entry.imports:
                if imp.name:
                    func_name = imp.name.decode('utf-8', errors='replace')
                    imports.append(f"{dll_name}!{func_name}")
    return imports


def find_unique_byte_patterns(filepath, pattern_length=16):
    """Find unique byte sequences suitable for YARA hex patterns."""
    with open(filepath, 'rb') as f:
        data = f.read()

    try:
        pe = pefile.PE(filepath)
        # Focus on code section
        for section in pe.sections:
            if section.Characteristics & 0x20000000:  # IMAGE_SCN_MEM_EXECUTE
                code_start = section.PointerToRawData
                code_end = code_start + section.SizeOfRawData
                code_data = data[code_start:code_end]
                break
        else:
            code_data = data
    except Exception:
        code_data = data

    # Find byte patterns that appear exactly once
    patterns = []
    for i in range(0, len(code_data) - pattern_length, 4):
        pattern = code_data[i:i+pattern_length]
        if pattern.count(b'\x00') < pattern_length // 3:  # Skip null-heavy
            hex_pattern = ' '.join(f'{b:02X}' for b in pattern)
            patterns.append(hex_pattern)

    # Count frequency and return unique ones
    freq = Counter(patterns)
    unique = [p for p, count in freq.items() if count == 1]

    return unique[:20]  # Top 20 candidates


def suggest_rule_strings(filepath):
    """Suggest strings and patterns for YARA rule."""
    print(f"[+] Analyzing: {filepath}")

    # Extract strings
    strings = extract_strings(filepath)

    # Filter for suspicious/unique strings
    suspicious_keywords = [
        'http', 'https', 'cmd', 'powershell', 'mutex', 'pipe',
        'password', 'credential', 'inject', 'hook', 'debug',
        'sandbox', 'virtual', 'vmware', 'vbox',
    ]

    print("\n[+] Suspicious ASCII strings:")
    for s in strings['ascii']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{s}\" ascii")

    print("\n[+] Suspicious wide strings:")
    for s in strings['wide']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{s}\" wide")

    # Import analysis
    imports = analyze_pe_imports(filepath)
    suspicious_apis = [
        'VirtualAlloc', 'VirtualProtect', 'WriteProcessMemory',
        'CreateRemoteThread', 'NtUnmapViewOfSection', 'RtlMoveMemory',
        'OpenProcess', 'CreateToolhelp32Snapshot',
        'InternetOpenA', 'HttpSendRequestA',
        'CryptEncrypt', 'CryptDecrypt',
    ]

    print("\n[+] Suspicious imports:")
    for imp in imports:
        func = imp.split('!')[-1]
        if func in suspicious_apis:
            print(f"  {imp}")

    # Byte patterns
    print("\n[+] Candidate hex patterns:")
    patterns = find_unique_byte_patterns(filepath)
    for p in patterns[:5]:
        print(f"  $hex = {{ {p} }}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <sample_path>")
        sys.exit(1)
    suggest_rule_strings(sys.argv[1])

Step 2: Write and Test YARA Rules

import yara
import os

def create_yara_rule(rule_name, meta, strings, condition):
    """Generate a YARA rule from components."""
    meta_str = "\n".join(f'        {k} = "{v}"' for k, v in meta.items())
    strings_str = "\n".join(f"        {s}" for s in strings)

    rule = f"""rule {rule_name} {{
    meta:
{meta_str}

    strings:
{strings_str}

    condition:
        {condition}
}}"""
    return rule


def test_yara_rule(rule_text, test_dir):
    """Compile and test YARA rule against sample directory."""
    try:
        rules = yara.compile(source=rule_text)
    except yara.SyntaxError as e:
        print(f"[-] YARA syntax error: {e}")
        return None

    results = {"matches": [], "no_match": []}

    for filename in os.listdir(test_dir):
        filepath = os.path.join(test_dir, filename)
        if not os.path.isfile(filepath):
            continue

        matches = rules.match(filepath)
        if matches:
            results["matches"].append({
                "file": filename,
                "rules": [m.rule for m in matches],
            })
        else:
            results["no_match"].append(filename)

    print(f"[+] Matches: {len(results['matches'])}")
    print(f"[-] No match: {len(results['no_match'])}")
    return results


# Example: Create a rule for a hypothetical malware family
example_rule = create_yara_rule(
    rule_name="MalwareFamily_Variant_A",
    meta={
        "description": "Detects MalwareFamily Variant A",
        "author": "Malware Analysis Team",
        "date": "2025-01-01",
        "hash": "abc123...",
        "tlp": "WHITE",
    },
    strings=[
        '$mutex = "Global\\\\UniqueM4lwareMutex" ascii wide',
        '$c2_pattern = /https?:\\/\\/[a-z]{5,10}\\.(xyz|top|buzz)\\/gate\\.php/',
        '$api1 = "VirtualAllocEx" ascii',
        '$api2 = "WriteProcessMemory" ascii',
        '$api3 = "CreateRemoteThread" ascii',
        '$hex_decrypt = { 8B 45 ?? 33 C1 89 45 ?? 83 C1 04 }',
        '$pdb = "C:\\\\Users\\\\" ascii',
    ],
    condition=(
        'uint16(0) == 0x5A4D and filesize < 2MB and '
        '($mutex or $c2_pattern) and '
        '2 of ($api*) and '
        '$hex_decrypt'
    ),
)

print(example_rule)

Step 3: Performance Testing and Optimization

import time

def benchmark_rule(rule_text, scan_directory, iterations=3):
    """Benchmark YARA rule scan performance."""
    rules = yara.compile(source=rule_text)

    files = []
    for root, _, filenames in os.walk(scan_directory):
        for f in filenames:
            files.append(os.path.join(root, f))

    print(f"[+] Benchmarking against {len(files)} files "
          f"({iterations} iterations)")

    times = []
    for i in range(iterations):
        start = time.perf_counter()
        matches = 0
        for filepath in files:
            try:
                result = rules.match(filepath)
                if result:
                    matches += 1
            except Exception:
                pass
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        print(f"  Iteration {i+1}: {elapsed:.3f}s ({matches} matches)")

    avg_time = sum(times) / len(times)
    files_per_sec = len(files) / avg_time
    print(f"\n[+] Average: {avg_time:.3f}s ({files_per_sec:.0f} files/sec)")
    return avg_time

Validation Criteria

YARA rules compile without syntax errors
Rules detect target malware family samples with zero false negatives
False positive rate below 0.1% when scanned against clean file corpus
Rule performance allows scanning 1000+ files per second
Rules survive minor malware modifications (recompilation, string changes)
Metadata includes hash, author, date, description, and TLP marking

Compliance Framework Mapping

This skill supports compliance evidence collection across multiple frameworks:

SOC 2: CC7.2 (Anomaly Detection), CC7.4 (Incident Response)
ISO 27001: A.12.2 (Malware Protection), A.16.1 (Security Incident Management)
NIST 800-53: SI-3 (Malicious Code Protection), IR-4 (Incident Handling)
NIST CSF: DE.CM (Continuous Monitoring), RS.AN (Analysis)

Claw GRC Tip: When this skill is executed by a registered agent, compliance evidence is automatically captured and mapped to the relevant controls in your active frameworks.

Deploying This Skill with Claw GRC

Agent Execution

# Install via CLI
npx claw-grc skills add performing-yara-rule-development-for-detection

# Or load dynamically via MCP
grc.load_skill("performing-yara-rule-development-for-detection")

Audit Trail Integration

When executed through Claw GRC, every step of this skill generates tamper-evident audit records:

SHA-256 chain hashing ensures no step can be modified after execution
Evidence artifacts (configs, scan results, logs) are automatically attached to relevant controls
Trust score impact — successful execution increases your agent's trust score

Continuous Compliance

Schedule this skill for recurring execution to maintain continuous compliance posture. Claw GRC monitors for drift and alerts when re-execution is needed.

References

Performing YARA Rule Development for Detection

Overview

Prerequisites

Python 3.9+ with yara-python library
YARA 4.5+ or YARA-X 0.10+
PE analysis tools (pefile, pestudio)
Hex editor for identifying unique byte patterns
Access to malware samples (VirusTotal, MalwareBazaar)
Understanding of PE file format, strings, and import tables

Key Concepts

Rule Structure

String Selection Strategy

Performance Optimization

Practical Steps

Step 1: Analyze Sample for Unique Patterns

#!/usr/bin/env python3
"""Extract candidate strings and byte patterns for YARA rule creation."""
import pefile
import re
import sys
from collections import Counter


def extract_strings(filepath, min_length=6):
    """Extract ASCII and wide strings from binary."""
    with open(filepath, 'rb') as f:
        data = f.read()

    # ASCII strings
    ascii_strings = re.findall(
        rb'[\x20-\x7e]{' + str(min_length).encode() + rb',}', data
    )

    # Wide (UTF-16LE) strings
    wide_strings = re.findall(
        rb'(?:[\x20-\x7e]\x00){' + str(min_length).encode() + rb',}', data
    )

    return {
        'ascii': [s.decode('ascii') for s in ascii_strings],
        'wide': [s.decode('utf-16-le') for s in wide_strings],
    }


def analyze_pe_imports(filepath):
    """Extract import table for API-based detection."""
    try:
        pe = pefile.PE(filepath)
    except pefile.PEFormatError:
        return []

    imports = []
    if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in pe.DIRECTORY_ENTRY_IMPORT:
            dll_name = entry.dll.decode('utf-8', errors='replace')
            for imp in entry.imports:
                if imp.name:
                    func_name = imp.name.decode('utf-8', errors='replace')
                    imports.append(f"{dll_name}!{func_name}")
    return imports


def find_unique_byte_patterns(filepath, pattern_length=16):
    """Find unique byte sequences suitable for YARA hex patterns."""
    with open(filepath, 'rb') as f:
        data = f.read()

    try:
        pe = pefile.PE(filepath)
        # Focus on code section
        for section in pe.sections:
            if section.Characteristics & 0x20000000:  # IMAGE_SCN_MEM_EXECUTE
                code_start = section.PointerToRawData
                code_end = code_start + section.SizeOfRawData
                code_data = data[code_start:code_end]
                break
        else:
            code_data = data
    except Exception:
        code_data = data

    # Find byte patterns that appear exactly once
    patterns = []
    for i in range(0, len(code_data) - pattern_length, 4):
        pattern = code_data[i:i+pattern_length]
        if pattern.count(b'\x00') < pattern_length // 3:  # Skip null-heavy
            hex_pattern = ' '.join(f'{b:02X}' for b in pattern)
            patterns.append(hex_pattern)

    # Count frequency and return unique ones
    freq = Counter(patterns)
    unique = [p for p, count in freq.items() if count == 1]

    return unique[:20]  # Top 20 candidates


def suggest_rule_strings(filepath):
    """Suggest strings and patterns for YARA rule."""
    print(f"[+] Analyzing: {filepath}")

    # Extract strings
    strings = extract_strings(filepath)

    # Filter for suspicious/unique strings
    suspicious_keywords = [
        'http', 'https', 'cmd', 'powershell', 'mutex', 'pipe',
        'password', 'credential', 'inject', 'hook', 'debug',
        'sandbox', 'virtual', 'vmware', 'vbox',
    ]

    print("\n[+] Suspicious ASCII strings:")
    for s in strings['ascii']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{s}\" ascii")

    print("\n[+] Suspicious wide strings:")
    for s in strings['wide']:
        if any(kw in s.lower() for kw in suspicious_keywords):
            print(f"  $ = \"{s}\" wide")

    # Import analysis
    imports = analyze_pe_imports(filepath)
    suspicious_apis = [
        'VirtualAlloc', 'VirtualProtect', 'WriteProcessMemory',
        'CreateRemoteThread', 'NtUnmapViewOfSection', 'RtlMoveMemory',
        'OpenProcess', 'CreateToolhelp32Snapshot',
        'InternetOpenA', 'HttpSendRequestA',
        'CryptEncrypt', 'CryptDecrypt',
    ]

    print("\n[+] Suspicious imports:")
    for imp in imports:
        func = imp.split('!')[-1]
        if func in suspicious_apis:
            print(f"  {imp}")

    # Byte patterns
    print("\n[+] Candidate hex patterns:")
    patterns = find_unique_byte_patterns(filepath)
    for p in patterns[:5]:
        print(f"  $hex = {{ {p} }}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <sample_path>")
        sys.exit(1)
    suggest_rule_strings(sys.argv[1])

Step 2: Write and Test YARA Rules

import yara
import os

def create_yara_rule(rule_name, meta, strings, condition):
    """Generate a YARA rule from components."""
    meta_str = "\n".join(f'        {k} = "{v}"' for k, v in meta.items())
    strings_str = "\n".join(f"        {s}" for s in strings)

    rule = f"""rule {rule_name} {{
    meta:
{meta_str}

    strings:
{strings_str}

    condition:
        {condition}
}}"""
    return rule


def test_yara_rule(rule_text, test_dir):
    """Compile and test YARA rule against sample directory."""
    try:
        rules = yara.compile(source=rule_text)
    except yara.SyntaxError as e:
        print(f"[-] YARA syntax error: {e}")
        return None

    results = {"matches": [], "no_match": []}

    for filename in os.listdir(test_dir):
        filepath = os.path.join(test_dir, filename)
        if not os.path.isfile(filepath):
            continue

        matches = rules.match(filepath)
        if matches:
            results["matches"].append({
                "file": filename,
                "rules": [m.rule for m in matches],
            })
        else:
            results["no_match"].append(filename)

    print(f"[+] Matches: {len(results['matches'])}")
    print(f"[-] No match: {len(results['no_match'])}")
    return results


# Example: Create a rule for a hypothetical malware family
example_rule = create_yara_rule(
    rule_name="MalwareFamily_Variant_A",
    meta={
        "description": "Detects MalwareFamily Variant A",
        "author": "Malware Analysis Team",
        "date": "2025-01-01",
        "hash": "abc123...",
        "tlp": "WHITE",
    },
    strings=[
        '$mutex = "Global\\\\UniqueM4lwareMutex" ascii wide',
        '$c2_pattern = /https?:\\/\\/[a-z]{5,10}\\.(xyz|top|buzz)\\/gate\\.php/',
        '$api1 = "VirtualAllocEx" ascii',
        '$api2 = "WriteProcessMemory" ascii',
        '$api3 = "CreateRemoteThread" ascii',
        '$hex_decrypt = { 8B 45 ?? 33 C1 89 45 ?? 83 C1 04 }',
        '$pdb = "C:\\\\Users\\\\" ascii',
    ],
    condition=(
        'uint16(0) == 0x5A4D and filesize < 2MB and '
        '($mutex or $c2_pattern) and '
        '2 of ($api*) and '
        '$hex_decrypt'
    ),
)

print(example_rule)

Step 3: Performance Testing and Optimization

import time

def benchmark_rule(rule_text, scan_directory, iterations=3):
    """Benchmark YARA rule scan performance."""
    rules = yara.compile(source=rule_text)

    files = []
    for root, _, filenames in os.walk(scan_directory):
        for f in filenames:
            files.append(os.path.join(root, f))

    print(f"[+] Benchmarking against {len(files)} files "
          f"({iterations} iterations)")

    times = []
    for i in range(iterations):
        start = time.perf_counter()
        matches = 0
        for filepath in files:
            try:
                result = rules.match(filepath)
                if result:
                    matches += 1
            except Exception:
                pass
        elapsed = time.perf_counter() - start
        times.append(elapsed)
        print(f"  Iteration {i+1}: {elapsed:.3f}s ({matches} matches)")

    avg_time = sum(times) / len(times)
    files_per_sec = len(files) / avg_time
    print(f"\n[+] Average: {avg_time:.3f}s ({files_per_sec:.0f} files/sec)")
    return avg_time

Validation Criteria

YARA rules compile without syntax errors
Rules detect target malware family samples with zero false negatives
False positive rate below 0.1% when scanned against clean file corpus
Rule performance allows scanning 1000+ files per second
Rules survive minor malware modifications (recompilation, string changes)
Metadata includes hash, author, date, description, and TLP marking

Compliance Framework Mapping

This skill supports compliance evidence collection across multiple frameworks:

SOC 2: CC7.2 (Anomaly Detection), CC7.4 (Incident Response)
ISO 27001: A.12.2 (Malware Protection), A.16.1 (Security Incident Management)
NIST 800-53: SI-3 (Malicious Code Protection), IR-4 (Incident Handling)
NIST CSF: DE.CM (Continuous Monitoring), RS.AN (Analysis)

Claw GRC Tip: When this skill is executed by a registered agent, compliance evidence is automatically captured and mapped to the relevant controls in your active frameworks.

Deploying This Skill with Claw GRC

Agent Execution

# Install via CLI
npx claw-grc skills add performing-yara-rule-development-for-detection

# Or load dynamically via MCP
grc.load_skill("performing-yara-rule-development-for-detection")

Audit Trail Integration

When executed through Claw GRC, every step of this skill generates tamper-evident audit records:

SHA-256 chain hashing ensures no step can be modified after execution
Evidence artifacts (configs, scan results, logs) are automatically attached to relevant controls
Trust score impact — successful execution increases your agent's trust score

Continuous Compliance

Schedule this skill for recurring execution to maintain continuous compliance posture. Claw GRC monitors for drift and alerts when re-execution is needed.

Prerequisites

Performing YARA Rule Development for Detection

Overview

Prerequisites

Key Concepts

Rule Structure

String Selection Strategy

Performance Optimization

Practical Steps

Step 1: Analyze Sample for Unique Patterns

Step 2: Write and Test YARA Rules

Step 3: Performance Testing and Optimization

Validation Criteria

Compliance Framework Mapping

Deploying This Skill with Claw GRC

Agent Execution

Audit Trail Integration

Continuous Compliance

References

Use with Claw GRC Agents

Tags

Related Skills

Performing Threat Hunting with YARA Rules

Analyzing Cobalt Strike Malleable C2 Profiles

Performing Malware Triage with YARA

Analyzing Cobalt Strike Beacon Configuration

Analyzing Android Malware with Apktool

Analyzing Macro Malware in Office Documents

Prerequisites

Performing YARA Rule Development for Detection

Overview

Prerequisites

Key Concepts

Rule Structure

String Selection Strategy

Performance Optimization

Practical Steps

Step 1: Analyze Sample for Unique Patterns

Step 2: Write and Test YARA Rules

Step 3: Performance Testing and Optimization

Validation Criteria

Compliance Framework Mapping

Deploying This Skill with Claw GRC

Agent Execution

Audit Trail Integration

Continuous Compliance

References

Use with Claw GRC Agents

Tags

Related Skills

Performing Threat Hunting with YARA Rules

Analyzing Cobalt Strike Malleable C2 Profiles

Performing Malware Triage with YARA

Analyzing Cobalt Strike Beacon Configuration

Analyzing Android Malware with Apktool

Analyzing Macro Malware in Office Documents