CG
SkillsPerforming Malware IOC Extraction
Start Free
Back to Skills Library
Threat Intelligence🟡 Intermediate

Performing Malware IOC Extraction

Leverage Malware IOC extraction — process of analyzing malicious software to identify actionable indicators of compromise including file hashes, network indicators (C2 domains, IP addresses, URLs), regist.

5 min read4 code examples

Prerequisites

  • Python 3.9+ with `pefile`, `yara-python`, `oletools`, `stix2` libraries
  • Access to malware analysis sandbox (Cuckoo, CAPE, Any.Run, Joe Sandbox)
  • VirusTotal API key for enrichment
  • Isolated analysis environment (VM or container)
  • Understanding of PE file format, common packing techniques
  • Familiarity with YARA rule syntax

Performing Malware IOC Extraction

Overview

Malware IOC extraction is the process of analyzing malicious software to identify actionable indicators of compromise including file hashes, network indicators (C2 domains, IP addresses, URLs), registry modifications, mutex names, embedded strings, and behavioral artifacts. This guide covers static analysis with PE parsing and string extraction, dynamic analysis with sandbox detonation, automated IOC extraction using tools like YARA, and formatting results as STIX 2.1 indicators for sharing.

Prerequisites

  • Python 3.9+ with pefile, yara-python, oletools, stix2 libraries
  • Access to malware analysis sandbox (Cuckoo, CAPE, Any.Run, Joe Sandbox)
  • VirusTotal API key for enrichment
  • Isolated analysis environment (VM or container)
  • Understanding of PE file format, common packing techniques
  • Familiarity with YARA rule syntax

Key Concepts

Static Analysis IOCs

  • File Hashes: MD5, SHA-1, SHA-256 of the sample and any dropped files
  • Import Hash (imphash): Hash of imported function table, groups malware families
  • Rich Header Hash: PE rich header hash for compiler fingerprinting
  • Strings: Embedded URLs, IP addresses, domain names, registry paths, mutex names
  • PE Metadata: Compilation timestamp, section names, resources, digital signatures
  • Embedded Artifacts: PDB paths, version info, certificate details

Dynamic Analysis IOCs

  • Network Activity: DNS queries, HTTP requests, TCP/UDP connections, SSL certificates
  • File System: Created/modified/deleted files and directories
  • Registry: Created/modified registry keys and values
  • Process: Spawned processes, injected processes, service creation
  • Behavioral: API calls, mutex creation, scheduled tasks, persistence mechanisms

YARA Rules

YARA is a pattern-matching tool for identifying and classifying malware. Rules consist of strings (text, hex, regex) and conditions that define matching logic. Rules can detect malware families, packers, exploit kits, and specific campaign tools.

Practical Steps

Step 1: Static Analysis - PE Parsing and Hash Generation

import pefile
import hashlib
import os

def analyze_pe(filepath):
    """Extract IOCs from a PE file through static analysis."""
    iocs = {"hashes": {}, "pe_info": {}, "strings": [], "imports": []}

    # Calculate file hashes
    with open(filepath, "rb") as f:
        data = f.read()
    iocs["hashes"]["md5"] = hashlib.md5(data).hexdigest()
    iocs["hashes"]["sha1"] = hashlib.sha1(data).hexdigest()
    iocs["hashes"]["sha256"] = hashlib.sha256(data).hexdigest()
    iocs["hashes"]["file_size"] = len(data)

    # Parse PE headers
    try:
        pe = pefile.PE(filepath)
        iocs["hashes"]["imphash"] = pe.get_imphash()
        iocs["pe_info"]["compilation_time"] = str(pe.FILE_HEADER.TimeDateStamp)
        iocs["pe_info"]["machine_type"] = hex(pe.FILE_HEADER.Machine)
        iocs["pe_info"]["subsystem"] = pe.OPTIONAL_HEADER.Subsystem

        # Extract sections
        iocs["pe_info"]["sections"] = []
        for section in pe.sections:
            iocs["pe_info"]["sections"].append({
                "name": section.Name.decode("utf-8", errors="ignore").strip("\x00"),
                "virtual_size": section.Misc_VirtualSize,
                "raw_size": section.SizeOfRawData,
                "entropy": section.get_entropy(),
                "md5": section.get_hash_md5(),
            })

        # Extract imports
        if hasattr(pe, "DIRECTORY_ENTRY_IMPORT"):
            for entry in pe.DIRECTORY_ENTRY_IMPORT:
                dll_name = entry.dll.decode("utf-8", errors="ignore")
                functions = [
                    imp.name.decode("utf-8", errors="ignore")
                    for imp in entry.imports
                    if imp.name
                ]
                iocs["imports"].append({"dll": dll_name, "functions": functions})

        # Check for suspicious characteristics
        iocs["pe_info"]["is_dll"] = pe.is_dll()
        iocs["pe_info"]["is_driver"] = pe.is_driver()
        iocs["pe_info"]["is_exe"] = pe.is_exe()

        # Version info
        if hasattr(pe, "VS_VERSIONINFO"):
            for entry in pe.FileInfo:
                for st in entry:
                    for item in st.entries.items():
                        key = item[0].decode("utf-8", errors="ignore")
                        val = item[1].decode("utf-8", errors="ignore")
                        iocs["pe_info"][f"version_{key}"] = val

        pe.close()

    except pefile.PEFormatError as e:
        iocs["pe_info"]["error"] = str(e)

    return iocs

Step 2: String Extraction and IOC Pattern Matching

import re

def extract_ioc_strings(filepath):
    """Extract IOC-relevant strings from binary file."""
    patterns = {
        "ipv4": re.compile(
            r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
            r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b"
        ),
        "domain": re.compile(
            r"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
            r"(?:com|net|org|io|ru|cn|tk|xyz|top|info|biz|cc|ws|pw)\b"
        ),
        "url": re.compile(
            r"https?://[^\s\"'<>]{5,200}"
        ),
        "email": re.compile(
            r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"
        ),
        "registry": re.compile(
            r"(?:HKEY_[A-Z_]+|HKLM|HKCU|HKU|HKCR|HKCC)"
            r"\\[\\a-zA-Z0-9_ .{}-]+"
        ),
        "filepath_windows": re.compile(
            r"[A-Z]:\\(?:[^\\/:*?\"<>|\r\n]+\\)*[^\\/:*?\"<>|\r\n]+"
        ),
        "mutex": re.compile(
            r"(?:Global\\|Local\\)[a-zA-Z0-9_\-{}.]{4,}"
        ),
        "useragent": re.compile(
            r"Mozilla/[45]\.0[^\"']{10,200}"
        ),
        "bitcoin": re.compile(
            r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"
        ),
        "pdb_path": re.compile(
            r"[A-Z]:\\[^\"]{5,200}\.pdb"
        ),
    }

    with open(filepath, "rb") as f:
        data = f.read()

    # Extract ASCII strings (min length 4)
    ascii_strings = re.findall(rb"[\x20-\x7e]{4,}", data)
    # Extract Unicode strings
    unicode_strings = re.findall(
        rb"(?:[\x20-\x7e]\x00){4,}", data
    )

    all_strings = [s.decode("ascii", errors="ignore") for s in ascii_strings]
    all_strings += [
        s.decode("utf-16-le", errors="ignore") for s in unicode_strings
    ]

    extracted = {category: set() for category in patterns}

    for string in all_strings:
        for category, pattern in patterns.items():
            matches = pattern.findall(string)
            for match in matches:
                extracted[category].add(match)

    # Convert sets to sorted lists
    return {k: sorted(v) for k, v in extracted.items() if v}

Step 3: YARA Rule Scanning

import yara

def scan_with_yara(filepath, rules_path):
    """Scan file with YARA rules for malware classification."""
    rules = yara.compile(filepath=rules_path)
    matches = rules.match(filepath)

    results = []
    for match in matches:
        result = {
            "rule": match.rule,
            "namespace": match.namespace,
            "tags": match.tags,
            "meta": match.meta,
            "strings": [],
        }
        for offset, identifier, data in match.strings:
            result["strings"].append({
                "offset": hex(offset),
                "identifier": identifier,
                "data": data.hex() if len(data) < 100 else data[:100].hex() + "...",
            })
        results.append(result)

    return results


# Example YARA rule for common malware indicators
SAMPLE_YARA_RULE = """
rule Suspicious_Network_Indicators {
    meta:
        description = "Detects suspicious network-related strings"
        author = "CTI Analyst"
        severity = "medium"
    strings:
        $ua1 = "Mozilla/5.0" ascii
        $cmd1 = "cmd.exe /c" ascii nocase
        $ps1 = "powershell" ascii nocase
        $wget = "wget" ascii nocase
        $curl = "curl" ascii nocase
        $b64 = "base64" ascii nocase
        $reg1 = "HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run" ascii nocase
    condition:
        uint16(0) == 0x5A4D and
        (2 of ($ua1, $cmd1, $ps1, $wget, $curl, $b64)) or $reg1
}

rule Packed_Binary {
    meta:
        description = "Detects potentially packed binary"
        author = "CTI Analyst"
    condition:
        uint16(0) == 0x5A4D and
        for any section in pe.sections : (
            section.entropy >= 7.0
        )
}
"""

Step 4: Generate STIX 2.1 Indicators

from stix2 import (
    Bundle, Indicator, Malware, Relationship,
    File as STIXFile, DomainName, IPv4Address,
    ObservedData,
)
from datetime import datetime

def create_stix_bundle(pe_iocs, string_iocs, yara_results, sample_name):
    """Create STIX 2.1 bundle from extracted IOCs."""
    objects = []

    # Create Malware SDO
    malware = Malware(
        name=sample_name,
        is_family=False,
        malware_types=["unknown"],
        description=f"Malware sample analyzed: {pe_iocs['hashes']['sha256']}",
        allow_custom=True,
    )
    objects.append(malware)

    # File hash indicator
    sha256 = pe_iocs["hashes"]["sha256"]
    hash_indicator = Indicator(
        name=f"Malware hash: {sha256[:16]}...",
        pattern=f"[file:hashes.'SHA-256' = '{sha256}']",
        pattern_type="stix",
        valid_from=datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ"),
        indicator_types=["malicious-activity"],
        allow_custom=True,
    )
    objects.append(hash_indicator)
    objects.append(Relationship(
        relationship_type="indicates",
        source_ref=hash_indicator.id,
        target_ref=malware.id,
    ))

    # Network indicators from strings
    for ip in string_iocs.get("ipv4", []):
        if not ip.startswith(("10.", "172.", "192.168.", "127.")):
            ip_indicator = Indicator(
                name=f"C2 IP: {ip}",
                pattern=f"[ipv4-addr:value = '{ip}']",
                pattern_type="stix",
                valid_from=datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ"),
                indicator_types=["malicious-activity"],
                allow_custom=True,
            )
            objects.append(ip_indicator)
            objects.append(Relationship(
                relationship_type="indicates",
                source_ref=ip_indicator.id,
                target_ref=malware.id,
            ))

    for domain in string_iocs.get("domain", []):
        domain_indicator = Indicator(
            name=f"C2 Domain: {domain}",
            pattern=f"[domain-name:value = '{domain}']",
            pattern_type="stix",
            valid_from=datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ"),
            indicator_types=["malicious-activity"],
            allow_custom=True,
        )
        objects.append(domain_indicator)
        objects.append(Relationship(
            relationship_type="indicates",
            source_ref=domain_indicator.id,
            target_ref=malware.id,
        ))

    bundle = Bundle(objects=objects, allow_custom=True)
    return bundle

Validation Criteria

  • PE file parsed successfully with hashes, imports, and section analysis
  • String extraction identifies network IOCs (IPs, domains, URLs)
  • YARA rules match against known malware characteristics
  • STIX 2.1 bundle contains valid Indicator and Malware objects
  • Private IP ranges and benign strings filtered from IOC output
  • IOCs are actionable for blocking and detection rule creation

Compliance Framework Mapping

This skill supports compliance evidence collection across multiple frameworks:

  • SOC 2: CC7.1 (Monitoring), CC7.2 (Anomaly Detection)
  • ISO 27001: A.6.1 (Threat Intelligence), A.16.1 (Security Incident Management)
  • NIST 800-53: PM-16 (Threat Awareness), RA-3 (Risk Assessment), SI-5 (Security Alerts)
  • NIST CSF: ID.RA (Risk Assessment), DE.AE (Anomalies & Events)

Claw GRC Tip: When this skill is executed by a registered agent, compliance evidence is automatically captured and mapped to the relevant controls in your active frameworks.

Deploying This Skill with Claw GRC

Agent Execution

Register this skill with your Claw GRC agent for automated execution:

# Install via CLI
npx claw-grc skills add performing-malware-ioc-extraction

# Or load dynamically via MCP
grc.load_skill("performing-malware-ioc-extraction")

Audit Trail Integration

When executed through Claw GRC, every step of this skill generates tamper-evident audit records:

  • SHA-256 chain hashing ensures no step can be modified after execution
  • Evidence artifacts (configs, scan results, logs) are automatically attached to relevant controls
  • Trust score impact — successful execution increases your agent's trust score

Continuous Compliance

Schedule this skill for recurring execution to maintain continuous compliance posture. Claw GRC monitors for drift and alerts when re-execution is needed.

References

  • pefile Documentation
  • YARA Documentation
  • MITRE ATT&CK Software
  • VirusTotal API
  • CAPE Sandbox
  • MalwareBazaar

Use with Claw GRC Agents

This skill is fully compatible with Claw GRC's autonomous agent system. Deploy it to any registered agent via MCP, and every execution will be logged in the tamper-evident audit trail.

// Load this skill in your agent
npx claw-grc skills add performing-malware-ioc-extraction
// Or via MCP
grc.load_skill("performing-malware-ioc-extraction")

Tags

threat-intelligencectiiocmitre-attackstixmalware-analysisyarareverse-engineering

Related Skills

Threat Intelligence

Analyzing Campaign Attribution Evidence

3m·intermediate
Threat Intelligence

Analyzing Threat Actor TTPS with MITRE ATT&CK

4m·intermediate
Threat Intelligence

Building IOC Enrichment Pipeline with Opencti

3m·intermediate
Threat Intelligence

Building Threat Intelligence Platform

4m·intermediate
Threat Intelligence

Collecting Threat Intelligence with Misp

3m·intermediate
Threat Intelligence

Implementing STIX Taxii Feed Integration

4m·intermediate

Skill Details

Domain
Threat Intelligence
Difficulty
intermediate
Read Time
5 min
Code Examples
4

On This Page

OverviewPrerequisitesKey ConceptsPractical StepsValidation CriteriaReferencesCompliance Framework MappingDeploying This Skill with Claw GRC

Deploy This Skill

Add this skill to your Claw GRC agent and start automating.

Get Started Free →