Performing Malware Hash Enrichment with VirusTotal

Enrich malware file hashes using the VirusTotal API to retrieve detection rates, behavioral analysis, YARA matches, and contextual threat intelligence for incident triage and IOC validation.

Prerequisites

  • Python 3.9+ with `vt-py` (official VirusTotal Python client) or `requests`
  • VirusTotal API key (free tier: 4 requests/minute, 500/day; premium for higher limits)
  • Understanding of file hash types: MD5, SHA-1, SHA-256
  • Familiarity with AV detection naming conventions
  • STIX 2.1 knowledge for IOC representation
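
Because the three hash types above differ only in digest length, a small validation step before any API call can catch truncated or malformed values early. The following is a minimal sketch; the helper names `identify_hash_type` and `normalize_hash` are illustrative and not part of any VirusTotal library.

```python
import re

# Hex digest lengths for the three hash types named above.
_HASH_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}
_HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def identify_hash_type(value):
    """Return 'MD5', 'SHA-1', 'SHA-256', or None if value is not a valid digest."""
    candidate = value.strip()
    if not _HEX_RE.match(candidate):
        return None
    return _HASH_LENGTHS.get(len(candidate))


def normalize_hash(value):
    """Strip and lowercase a digest; raise ValueError for anything else."""
    if identify_hash_type(value) is None:
        raise ValueError(f"not an MD5, SHA-1, or SHA-256 hex digest: {value!r}")
    return value.strip().lower()
```

Normalizing to lowercase before querying also makes it easier to de-duplicate hashes that arrive from different tools in different cases.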

Overview

VirusTotal is the world's largest crowdsourced malware corpus, scanning files with 70+ antivirus engines and providing behavioral analysis, YARA rule matches, network indicators, and community intelligence. This guide covers using the VirusTotal API v3 to enrich file hashes (MD5, SHA-1, SHA-256) with detection verdicts, sandbox reports, related indicators, and contextual intelligence for SOC triage, incident response, and threat intelligence enrichment workflows.

Key Concepts

VirusTotal API v3

The API provides RESTful endpoints for file reports (/files/{hash}), URL scanning, domain reports, IP address intelligence, and advanced hunting with VirusTotal Intelligence (VTI). Each file report includes detection results from 70+ AV engines, behavioral analysis from sandboxes, YARA rule matches, sigma rule matches, file metadata (PE headers, imports, sections), network indicators (contacted IPs, domains, URLs), and community votes and comments.
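
For readers who prefer not to depend on `vt-py`, the file report endpoint can also be called directly. The sketch below uses only the standard library and assumes the public base URL `https://www.virustotal.com/api/v3` and the `x-apikey` request header; the function names are illustrative.

```python
import json
import urllib.request

VT_API_BASE = "https://www.virustotal.com/api/v3"


def build_file_report_request(api_key, file_hash):
    """Build (but do not send) the GET request for a file report."""
    return urllib.request.Request(
        f"{VT_API_BASE}/files/{file_hash}",
        headers={"x-apikey": api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_file_attributes(api_key, file_hash, timeout=30):
    """Send the request and return the report's attributes dictionary."""
    request = build_file_report_request(api_key, file_hash)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # File reports wrap their fields as {"data": {"attributes": {...}}}.
    return payload["data"]["attributes"]
```

Separating request construction from sending keeps the URL and header logic easy to inspect without spending API quota.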

Hash Enrichment Workflow

The typical enrichment flow is: receive a hash from an alert or EDR -> query the VT API -> parse the detection ratio -> extract behavioral indicators -> correlate with existing intelligence -> make a triage decision. The API returns a last_analysis_stats object with malicious, suspicious, undetected, and harmless counts, alongside counters for engines that produced no verdict (timeout, confirmed-timeout, failure, and type-unsupported).
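
The "parse detection ratio" step can be sketched as a pure function over the stats dictionary. This version reuses the 0.7 / 0.4 / 0.1 thresholds from Step 1 but, as an assumption of this sketch, counts only engines that returned a verdict in the denominator. The function name and the sample numbers are illustrative.

```python
def summarize_detection_stats(stats):
    """Turn a last_analysis_stats dict into a ratio and a coarse triage label."""
    malicious = stats.get("malicious", 0)
    # Assumption: only engines that returned a verdict count toward the total;
    # timeouts, failures, and unsupported file types are left out.
    verdicts = (
        malicious
        + stats.get("suspicious", 0)
        + stats.get("undetected", 0)
        + stats.get("harmless", 0)
    )
    share = malicious / verdicts if verdicts else 0.0
    if share > 0.7:
        level = "critical"
    elif share > 0.4:
        level = "high"
    elif share > 0.1:
        level = "medium"
    elif malicious > 0:
        level = "low"
    else:
        level = "clean"
    return {
        "detection_ratio": f"{malicious}/{verdicts}",
        "malicious_share": round(share, 3),
        "threat_level": level,
    }


# Made-up sample values for illustration only.
sample = {"malicious": 45, "suspicious": 2, "undetected": 20,
          "harmless": 0, "timeout": 3, "type-unsupported": 5}
print(summarize_detection_stats(sample))
```

A label of "clean" here only means no engine flagged the file at the last analysis; it is a triage hint, not proof that the file is benign.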

Pivoting from Hashes

VirusTotal enables pivoting from a single hash to related intelligence: similar files (ITW/in-the-wild samples), contacted domains and IPs (C2 infrastructure), dropped files, embedded URLs, YARA rule matches, and threat actor attribution through crowdsourced intelligence.
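
Each pivot corresponds to a relationship under the file object, addressed as `/files/{hash}/{relationship}`. The sketch below builds those paths for a handful of relationships; the list is an assumption based on the relationships named above, and which of them are readable depends on the API tier, so verify each against the API documentation before relying on it.

```python
# Relationship names assumed for this sketch; availability varies by API tier.
PIVOT_RELATIONSHIPS = (
    "contacted_ips",
    "contacted_domains",
    "contacted_urls",
    "dropped_files",
    "embedded_urls",
    "similar_files",
)


def pivot_paths(file_hash, relationships=PIVOT_RELATIONSHIPS):
    """Map each relationship name to its API path for the given hash."""
    return {name: f"/files/{file_hash}/{name}" for name in relationships}


for name, path in pivot_paths("<sha256>").items():
    print(f"{name:20s} {path}")
```

Each path can then be passed to a collection iterator such as `vt.Client.iterator`, as Step 3 does for the contacted_* relationships.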

Practical Steps

Step 1: Query VirusTotal for Hash Report

import vt
import json
import hashlib
from datetime import datetime

class VTEnricher:
    def __init__(self, api_key):
        self.client = vt.Client(api_key)

    def enrich_hash(self, file_hash):
        """Enrich a file hash with VirusTotal intelligence."""
        try:
            file_obj = self.client.get_object(f"/files/{file_hash}")
            stats = file_obj.last_analysis_stats
            report = {
                "hash": file_hash,
                "sha256": file_obj.sha256,
                "sha1": file_obj.sha1,
                "md5": file_obj.md5,
                "file_type": getattr(file_obj, "type_description", "Unknown"),
                "file_size": getattr(file_obj, "size", 0),
                "first_submission": str(getattr(file_obj, "first_submission_date", "")),
                "last_analysis_date": str(getattr(file_obj, "last_analysis_date", "")),
                "detection_stats": {
                    "malicious": stats.get("malicious", 0),
                    "suspicious": stats.get("suspicious", 0),
                    "undetected": stats.get("undetected", 0),
                    "harmless": stats.get("harmless", 0),
                },
                "detection_ratio": f"{stats.get('malicious', 0)}/{sum(stats.values())}",
                "popular_threat_names": getattr(file_obj, "popular_threat_classification", {}),
                "tags": getattr(file_obj, "tags", []),
                "names": getattr(file_obj, "names", []),
            }
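            # Note: this total also counts engines that timed out, failed, or do
            # not support the file type, so the ratio can read lower than a
            # ratio computed over returned verdicts alone.
            total_engines = sum(stats.values())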
            mal_count = stats.get("malicious", 0)
            report["threat_level"] = (
                "critical" if mal_count > total_engines * 0.7
                else "high" if mal_count > total_engines * 0.4
                else "medium" if mal_count > total_engines * 0.1
                else "low" if mal_count > 0
                else "clean"
            )
            print(f"[+] {file_hash[:16]}... -> {report['detection_ratio']} "
                  f"({report['threat_level'].upper()})")
            return report
        except vt.error.APIError as e:
            print(f"[-] VT API error for {file_hash}: {e}")
            return None

    def get_behavior_report(self, file_hash):
        """Get sandbox behavioral analysis for a file."""
        behavior_data = {
            "processes_created": [],
            "files_written": [],
            "registry_keys_set": [],
            "dns_lookups": [],
            "http_conversations": [],
            "mutexes_created": [],
            "commands_executed": [],
        }

        def _field(items, key):
            # Entries are plain strings for some fields and dicts for others,
            # so accept both forms.
            return [i.get(key, "") if isinstance(i, dict) else str(i) for i in items]

        try:
            # /behaviours is a collection with one report per sandbox, so
            # iterate over it instead of fetching it as a single object.
            for sandbox in self.client.iterator(f"/files/{file_hash}/behaviours"):
                behavior_data["processes_created"].extend(
                    sandbox.get("processes_created", []))
                behavior_data["files_written"].extend(
                    _field(sandbox.get("files_written", []), "path"))
                behavior_data["registry_keys_set"].extend(
                    _field(sandbox.get("registry_keys_set", []), "key"))
                behavior_data["dns_lookups"].extend(
                    _field(sandbox.get("dns_lookups", []), "hostname"))
                behavior_data["http_conversations"].extend(
                    _field(sandbox.get("http_conversations", []), "url"))
                behavior_data["mutexes_created"].extend(
                    sandbox.get("mutexes_created", []))
                behavior_data["commands_executed"].extend(
                    sandbox.get("command_executions", []))
            return behavior_data
        except Exception as e:
            print(f"[-] Behavior report error: {e}")
            return {}

    def close(self):
        self.client.close()

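# Usage (the hash below is the SHA-256 of the EICAR antivirus test file)
enricher = VTEnricher("YOUR_VT_API_KEY")
try:
    report = enricher.enrich_hash("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")
    print(json.dumps(report, indent=2, default=str))
finally:
    # Close the client even if the lookup raises.
    enricher.close()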
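
`enrich_hash` returns None for every API error, which hides an important distinction: a hash that VirusTotal has never seen is not the same as a failed lookup, and neither means the file is clean. A minimal sketch of separating these cases by error code follows. The action labels are hypothetical, and the sketch assumes the caught `vt.error.APIError` exposes the API's error code as `e.code`.

```python
# Hypothetical mapping from VirusTotal API error codes to pipeline actions.
ERROR_ACTIONS = {
    "NotFoundError": "unknown_to_vt",          # never submitted; not a clean verdict
    "QuotaExceededError": "retry_later",
    "TooManyRequestsError": "retry_later",
    "WrongCredentialsError": "fix_credentials",
    "AuthenticationRequiredError": "fix_credentials",
    "ForbiddenError": "needs_higher_tier",
}


def action_for_error(code):
    """Return the pipeline action for an API error code."""
    return ERROR_ACTIONS.get(code, "investigate")


print(action_for_error("NotFoundError"))
print(action_for_error("SomethingElse"))
```

Inside the `except vt.error.APIError as e:` branch, the result of `action_for_error(e.code)` could be stored in the report so that downstream triage can tell "unknown to VirusTotal" apart from "lookup failed".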

Step 2: Batch Hash Enrichment with Rate Limiting

import time
import csv

def batch_enrich(api_key, hash_file, output_file, rate_limit=4):
    """Enrich a list of hashes from a file with rate limiting."""
    enricher = VTEnricher(api_key)
    results = []

    with open(hash_file, "r") as f:
        hashes = [line.strip() for line in f if line.strip()]

    print(f"[*] Enriching {len(hashes)} hashes (rate: {rate_limit}/min)")
    for i, file_hash in enumerate(hashes):
        report = enricher.enrich_hash(file_hash)
        if report:
            results.append(report)
        # Pause after each full batch, but not after the final hash.
        if (i + 1) % rate_limit == 0 and (i + 1) < len(hashes):
            print(f"  [{i+1}/{len(hashes)}] Rate limit pause (60s)...")
            time.sleep(60)

    # Export to CSV
    with open(output_file, "w", newline="") as f:
        if results:
            writer = csv.DictWriter(f, fieldnames=results[0].keys())
            writer.writeheader()
            for r in results:
                flat = {k: str(v) for k, v in r.items()}
                writer.writerow(flat)

    print(f"[+] Enrichment complete: {len(results)}/{len(hashes)} hashes")
    print(f"[+] Results saved to {output_file}")
    enricher.close()
    return results

batch_enrich("YOUR_API_KEY", "hashes.txt", "enrichment_results.csv")
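
An alternative to pausing 60 seconds after every fourth request is to space requests evenly, which keeps results flowing and avoids bursts. The sketch below is one way to do that; the class name `MinIntervalLimiter` is illustrative, and the clock and sleep functions are injectable so the behavior can be checked without waiting. It handles the per-minute limit only, not the daily quota.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=4, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

In `batch_enrich`, calling `limiter.wait()` immediately before each `enricher.enrich_hash(...)` call would replace the modulo-based pause.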

Step 3: Extract Network Indicators for Pivoting

def extract_network_iocs(api_key, file_hash):
    """Extract network-based IOCs from VT for C2 identification."""
    client = vt.Client(api_key)
    network_iocs = {
        "contacted_ips": [],
        "contacted_domains": [],
        "contacted_urls": [],
        "embedded_urls": [],
    }

    try:
        # Get contacted IPs
        it = client.iterator(f"/files/{file_hash}/contacted_ips")
        for ip_obj in it:
            network_iocs["contacted_ips"].append({
                "ip": ip_obj.id,
                "country": getattr(ip_obj, "country", ""),
                "asn": getattr(ip_obj, "asn", 0),
                "as_owner": getattr(ip_obj, "as_owner", ""),
            })

        # Get contacted domains
        it = client.iterator(f"/files/{file_hash}/contacted_domains")
        for domain_obj in it:
            network_iocs["contacted_domains"].append({
                "domain": domain_obj.id,
                "registrar": getattr(domain_obj, "registrar", ""),
                "creation_date": str(getattr(domain_obj, "creation_date", "")),
            })

        # Get contacted URLs
        it = client.iterator(f"/files/{file_hash}/contacted_urls")
        for url_obj in it:
            network_iocs["contacted_urls"].append({
                "url": url_obj.url,
                # Read the status code attribute, not the content length.
                "last_http_response_code": getattr(url_obj, "last_http_response_code", 0),
            })

    except Exception as e:
        print(f"[-] Error extracting network IOCs: {e}")
    finally:
        client.close()

    print(f"[+] Network IOCs: {len(network_iocs['contacted_ips'])} IPs, "
          f"{len(network_iocs['contacted_domains'])} domains, "
          f"{len(network_iocs['contacted_urls'])} URLs")
    return network_iocs
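
Before handing these results to a blocklist or a correlation step, it often helps to reduce them to unique indicator values. The following sketch assumes the dictionary layout returned by `extract_network_iocs` above; the function name and the sample values (documentation-range IPs and example domains) are illustrative.

```python
def flatten_network_iocs(network_iocs):
    """Return sorted, de-duplicated indicator values keyed by indicator type."""
    key_by_list = {
        "contacted_ips": "ip",
        "contacted_domains": "domain",
        "contacted_urls": "url",
    }
    flat = {}
    for list_name, value_key in key_by_list.items():
        values = {entry.get(value_key, "") for entry in network_iocs.get(list_name, [])}
        values.discard("")  # drop entries that lacked the expected key
        flat[value_key] = sorted(values)
    return flat


sample = {
    "contacted_ips": [{"ip": "203.0.113.5"}, {"ip": "198.51.100.7"}, {"ip": "203.0.113.5"}],
    "contacted_domains": [{"domain": "example.com"}],
    "contacted_urls": [],
}
print(flatten_network_iocs(sample))
```

Contacted hosts are C2 candidates, not confirmed C2: sandboxes also record traffic to update servers, CDNs, and connectivity checks, so review the list before blocking.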

Step 4: YARA Rule Matching and Threat Classification

def get_yara_matches(api_key, file_hash):
    """Retrieve YARA rule matches for threat classification."""
    client = vt.Client(api_key)
    try:
        file_obj = client.get_object(f"/files/{file_hash}")
        crowdsourced_yara = getattr(file_obj, "crowdsourced_yara_results", [])

        matches = []
        for rule in crowdsourced_yara:
            matches.append({
                "rule_name": rule.get("rule_name", ""),
                "ruleset_name": rule.get("ruleset_name", ""),
                "author": rule.get("author", ""),
                "description": rule.get("description", ""),
                "source": rule.get("source", ""),
            })

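        # Classify based on YARA matches. These are coarse substring checks:
        # short keywords such as "rat" or "apt" also match unrelated rule
        # names (for example "generator" or "adapter"), so treat the labels
        # as hints to review rather than as attribution.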
        classifications = set()
        for m in matches:
            rule_lower = m["rule_name"].lower()
            if any(k in rule_lower for k in ["apt", "nation", "state"]):
                classifications.add("apt")
            if any(k in rule_lower for k in ["ransom", "crypto"]):
                classifications.add("ransomware")
            if any(k in rule_lower for k in ["trojan", "rat", "backdoor"]):
                classifications.add("trojan")
            if any(k in rule_lower for k in ["loader", "dropper"]):
                classifications.add("loader")

        print(f"[+] YARA: {len(matches)} rules matched")
        print(f"[+] Classifications: {classifications or {'unclassified'}}")
        return {"matches": matches, "classifications": list(classifications)}
    finally:
        client.close()

Step 5: Generate Enrichment Report

def generate_enrichment_report(hash_report, behavior, network, yara_data):
    """Generate comprehensive enrichment report."""
    report = {
        "metadata": {
            "generated": datetime.now().isoformat(),
            "hash": hash_report.get("sha256", ""),
        },
        "verdict": {
            "threat_level": hash_report.get("threat_level", "unknown"),
            "detection_ratio": hash_report.get("detection_ratio", "0/0"),
            "classifications": yara_data.get("classifications", []),
            "threat_names": hash_report.get("popular_threat_names", {}),
        },
        "behavioral_indicators": {
            "processes": behavior.get("processes_created", [])[:10],
            "dns_queries": behavior.get("dns_lookups", [])[:10],
            "commands": behavior.get("commands_executed", [])[:10],
        },
        "network_indicators": {
            "c2_candidates": network.get("contacted_ips", [])[:10],
            "domains": network.get("contacted_domains", [])[:10],
        },
        "yara_matches": yara_data.get("matches", [])[:10],
        "recommendation": (
            "BLOCK and investigate" if hash_report.get("threat_level") in ("critical", "high")
            else "Monitor and analyze" if hash_report.get("threat_level") == "medium"
            else "Low risk - continue monitoring"
        ),
    }

    with open(f"enrichment_{hash_report.get('sha256', 'unknown')[:16]}.json", "w") as f:
        json.dump(report, f, indent=2, default=str)
    return report
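
Since the prerequisites mention STIX 2.1 for IOC representation, the hash report can also be expressed as a STIX indicator for sharing. The sketch below builds a plain dictionary rather than using a STIX library; the function name, the `name` text, and the choice of `malicious-activity` as the indicator type are assumptions of this example.

```python
import uuid
from datetime import datetime, timezone


def hash_report_to_stix_indicator(hash_report):
    """Build a minimal STIX 2.1 indicator dict from an enrich_hash report."""
    sha256 = hash_report["sha256"]
    # STIX timestamps are UTC with a trailing Z; millisecond precision is used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": f"File hash flagged by VirusTotal ({hash_report.get('detection_ratio', 'n/a')})",
        "indicator_types": ["malicious-activity"],
        "pattern": f"[file:hashes.'SHA-256' = '{sha256}']",
        "pattern_type": "stix",
        "valid_from": now,
    }
```

It probably makes sense to emit an indicator only for reports whose threat level justifies sharing, since a low detection count alone is weak evidence.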

Validation Criteria

  • VT API v3 queried successfully with proper authentication
  • File hash enriched with detection stats, behavioral data, and network indicators
  • Batch enrichment handles rate limiting correctly
  • Network IOCs extracted for C2 identification
  • YARA matches retrieved and used for classification
  • Enrichment report generated with actionable verdict
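
Some of these criteria can be checked mechanically against the output of `generate_enrichment_report`. The sketch below is one possible check, assuming the report layout from Step 5; the function name and the specific checks are illustrative, and it does not verify authentication or rate limiting.

```python
REQUIRED_SECTIONS = (
    "metadata",
    "verdict",
    "behavioral_indicators",
    "network_indicators",
    "yara_matches",
    "recommendation",
)
KNOWN_LEVELS = {"critical", "high", "medium", "low", "clean"}


def check_enrichment_report(report):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS
                if name not in report]
    verdict = report.get("verdict", {})
    if verdict.get("threat_level") not in KNOWN_LEVELS:
        problems.append(f"unexpected threat_level: {verdict.get('threat_level')!r}")
    if "/" not in str(verdict.get("detection_ratio", "")):
        problems.append("detection_ratio is not in malicious/total form")
    return problems
```

A report whose threat level fell back to "unknown" is flagged here on purpose, since that usually means the hash lookup in Step 1 did not succeed.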

Compliance Framework Mapping

This skill supports compliance evidence collection across multiple frameworks:

  • SOC 2: CC7.1 (Monitoring), CC7.2 (Anomaly Detection)
  • ISO 27001: A.5.7 (Threat Intelligence, 2022 edition), A.16.1 (Security Incident Management, 2013 edition)
  • NIST 800-53: PM-16 (Threat Awareness), RA-3 (Risk Assessment), SI-5 (Security Alerts)
  • NIST CSF: ID.RA (Risk Assessment), DE.AE (Anomalies & Events)

Claw GRC Tip: When this skill is executed by a registered agent, compliance evidence is automatically captured and mapped to the relevant controls in your active frameworks.

Deploying This Skill with Claw GRC

Agent Execution

Register this skill with your Claw GRC agent for automated execution:

# Install via CLI
npx claw-grc skills add performing-malware-hash-enrichment-with-virustotal

# Or load dynamically via MCP
grc.load_skill("performing-malware-hash-enrichment-with-virustotal")

Audit Trail Integration

When executed through Claw GRC, every step of this skill generates tamper-evident audit records:

  • SHA-256 chain hashing ensures no step can be modified after execution
  • Evidence artifacts (configs, scan results, logs) are automatically attached to relevant controls
  • Trust score impact — successful execution increases your agent's trust score

Continuous Compliance

Schedule this skill for recurring execution to maintain continuous compliance posture. Claw GRC monitors for drift and alerts when re-execution is needed.
