Reverse Engineering Malware with Ghidra
When to Use
- Static and dynamic analysis have identified suspicious functionality that requires deeper code-level understanding
- You need to reverse engineer C2 communication protocols, encryption algorithms, or custom obfuscation
- Understanding the exact exploit mechanism or vulnerability targeted by a malware sample
- Extracting hardcoded configuration data (C2 addresses, encryption keys, campaign IDs) embedded in compiled code
- Developing precise YARA rules or detection signatures based on unique code patterns
Do not use for initial triage of unknown samples; perform static analysis with PEStudio and behavioral analysis with Cuckoo first.
Prerequisites
- Ghidra 11.x installed (download from https://ghidra-sre.org/) with JDK 17+
- Analysis VM isolated from production network (Windows or Linux host)
- Familiarity with x86/x64 assembly language and Windows API conventions
- PDB symbol files for Windows system DLLs to improve decompilation accuracy
- Ghidra scripts repository (ghidra_scripts) for automated analysis tasks
- Secondary reference: IDA Free or Binary Ninja for cross-validation of analysis results
Workflow
Step 1: Create Project and Import Binary
Set up a Ghidra project and import the malware sample:
1. Launch Ghidra: ghidraRun (Linux) or ghidraRun.bat (Windows)
2. File -> New Project -> Non-Shared Project -> Select directory
3. File -> Import File -> Select malware binary
4. Ghidra auto-detects format (PE, ELF, Mach-O) and architecture
5. Accept default import options (or specify base address if known)
6. Double-click imported file to open in CodeBrowser
7. When prompted, run Auto Analysis with default analyzers enabled
Headless analysis for automation:
# Run Ghidra headless analysis with decompiler
/opt/ghidra/support/analyzeHeadless /tmp/ghidra_project MalwareProject \
-import suspect.exe \
-postScript ExportDecompilation.py \
-scriptPath /opt/ghidra/scripts/ \
-deleteProject
Step 2: Identify Key Functions and Entry Points
Navigate the binary to locate critical code sections:
Navigation Strategy:
โโโโโโโโโโโโโโโโโโโ
1. Start at entry point (OEP) - follow execution from _start/WinMain
2. Check Symbol Tree for imported functions (Window -> Symbol Tree)
3. Search for cross-references to suspicious APIs:
- VirtualAlloc/VirtualAllocEx (memory allocation for injection)
- CreateRemoteThread (remote thread injection)
- CryptEncrypt/CryptDecrypt (encryption operations)
- InternetOpen/HttpSendRequest (C2 communication)
- RegSetValueEx (persistence via registry)
4. Use Search -> For Strings to find embedded URLs, IPs, and paths
5. Check the Functions window sorted by size (large functions often contain core logic)
Ghidra keyboard shortcuts for efficient navigation:
G - Go to address
Ctrl+E - Search for strings
X - Show cross-references to current location
Ctrl+Shift+F - Search memory for byte patterns
L - Rename label/function
; - Add comment
T - Retype variable
Ctrl+L - Retype return value
Step 3: Analyze Decompiled Code
Use Ghidra's decompiler to understand function logic:
// Example: Ghidra decompiler output for a decryption routine
// Analyst renames variables and adds types for clarity
void decrypt_config(BYTE *encrypted_data, int data_len, BYTE *key, int key_len) {
// XOR decryption with rolling key
for (int i = 0; i < data_len; i++) {
encrypted_data[i] = encrypted_data[i] ^ key[i % key_len];
}
return;
}
// Analyst actions in Ghidra:
// 1. Right-click parameters -> Retype to correct types (BYTE*, int)
// 2. Right-click variables -> Rename to meaningful names
// 3. Add comments explaining the algorithm
// 4. Set function signature to propagate types to callers
Step 4: Trace C2 Communication Logic
Follow the network communication code path:
Analysis Steps for C2 Protocol Reverse Engineering:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1. Find InternetOpenA/WinHttpOpen call -> trace to wrapper function
2. Follow data flow from encrypted config -> URL construction
3. Identify HTTP method (GET/POST), headers, and body format
4. Locate response parsing logic (JSON parsing, custom binary protocol)
5. Map the C2 command dispatcher (switch/case or jump table)
6. Document the command set (download, execute, exfiltrate, update, uninstall)
Ghidra Script for extracting C2 configuration:
# Ghidra Python script: extract_c2_config.py
# Run via Script Manager in Ghidra
from ghidra.program.model.data import StringDataType
from ghidra.program.model.symbol import SourceType
# Search for XOR decryption patterns
listing = currentProgram.getListing()
memory = currentProgram.getMemory()
# Find references to InternetOpenA
symbol_table = currentProgram.getSymbolTable()
for symbol in symbol_table.getExternalSymbols():
if "InternetOpen" in symbol.getName():
refs = getReferencesTo(symbol.getAddress())
for ref in refs:
print("C2 init at: {}".format(ref.getFromAddress()))
Step 5: Analyze Encryption and Obfuscation
Identify and document cryptographic routines:
Common Malware Encryption Patterns:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
XOR Cipher: Loop with XOR operation, often single-byte or rolling key
RC4: Two loops (KSA + PRGA), 256-byte S-box initialization
AES: Look for S-box constants (0x63, 0x7C, 0x77...) or calls to CryptEncrypt
Base64: Lookup table with A-Za-z0-9+/= characters
Custom: Combination of arithmetic operations (ADD, SUB, ROL, ROR with XOR)
Identification Tips:
- Search for constants: AES S-box, CRC32 table, MD5 init values
- Look for loop structures operating on byte arrays
- Check for Windows Crypto API usage (CryptAcquireContext -> CryptCreateHash -> CryptEncrypt)
- FindCrypt Ghidra plugin automatically identifies crypto constants
Step 6: Document Findings and Create Detection Signatures
Produce actionable intelligence from reverse engineering:
# Generate YARA rule from unique code patterns found in Ghidra
cat << 'EOF' > malware_family_x.yar
rule MalwareFamilyX_Decryptor {
meta:
description = "Detects MalwareX decryption routine"
author = "analyst"
date = "2025-09-15"
strings:
// XOR decryption loop with hardcoded key
$decrypt = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F3 }
// C2 URL pattern after decryption
$c2_pattern = "/gate.php?id=" ascii
condition:
uint16(0) == 0x5A4D and $decrypt and $c2_pattern
}
EOF
Key Concepts
| Term | Definition |
|---|---|
| Disassembly | Converting machine code bytes into human-readable assembly language instructions; Ghidra's Listing view shows disassembled code |
| Decompilation | Lifting assembly code to pseudo-C representation for easier analysis; Ghidra's Decompile window provides this view |
| Cross-Reference (XREF) | Reference showing where a function or data address is called from or used; essential for tracing code execution flow |
| Control Flow Graph (CFG) | Visual representation of all possible execution paths through a function; reveals branching logic and loops |
| Original Entry Point (OEP) | The actual start address of the malware code after unpacking; packers redirect execution through an unpacking stub first |
| Function Signature | The return type, name, and parameter types of a function; applying correct signatures improves decompiler output quality |
| Ghidra Script | Python or Java automation script executed within Ghidra to perform batch analysis, pattern searching, or data extraction |
Tools & Systems
- Ghidra: NSA's open-source software reverse engineering suite with disassembler, decompiler, and scripting support for multiple architectures
- IDA Pro/Free: Industry-standard interactive disassembler; IDA Free provides x86/x64 cloud-based decompilation
- Binary Ninja: Commercial reverse engineering platform with modern UI and extensive API for plugin development
- x64dbg: Open-source x64/x32 debugger for Windows used alongside Ghidra for dynamic debugging of malware
- FindCrypt (Ghidra Plugin): Plugin that identifies cryptographic constants and algorithms in binary code
Common Scenarios
Scenario: Reversing Custom C2 Protocol
Context: Behavioral analysis shows encrypted traffic to an external IP on a non-standard port. Network signatures cannot detect variants because the protocol is proprietary. Deep reverse engineering is needed to understand the protocol structure.
Approach:
- Import the unpacked sample into Ghidra and run full auto-analysis
- Locate socket/WinHTTP API calls and trace backwards to the calling function
- Identify the encryption routine called before data is sent (follow data flow from send/HttpSendRequest)
- Reverse the encryption (XOR key extraction, RC4 key derivation, AES key location)
- Map the command structure by analyzing the response parsing function (switch/case on command IDs)
- Document the protocol format (header structure, command bytes, encryption method)
- Create a protocol decoder script for network monitoring tools
Pitfalls:
- Not running the full auto-analysis before starting manual analysis (missing function boundaries and type propagation)
- Ignoring indirect calls through function pointers or vtables (use cross-references to data holding function addresses)
- Spending time on library code that Ghidra's Function ID (FID) or FLIRT signatures should have identified
- Not saving Ghidra project progress frequently (analysis state can be lost on crashes)
Output Format
REVERSE ENGINEERING ANALYSIS REPORT
=====================================
Sample: unpacked_payload.exe
SHA-256: abc123def456...
Architecture: x86 (32-bit PE)
Ghidra Project: MalwareX_Analysis
FUNCTION MAP
0x00401000 main() - Entry point, initializes config
0x00401200 decrypt_config() - XOR decryption with 16-byte key
0x00401400 init_c2() - WinHTTP initialization, URL construction
0x00401800 c2_beacon() - HTTP POST beacon with system info
0x00401C00 cmd_dispatcher() - Switch on 12 command codes
0x00402000 inject_process() - Process hollowing into svchost.exe
0x00402400 persist_registry() - HKCU Run key persistence
0x00402800 exfil_data() - File collection and encrypted upload
C2 PROTOCOL
Method: HTTPS POST to /gate.php
Encryption: RC4 with derived key (MD5 of bot_id + campaign_key)
Bot ID Format: MD5(hostname + username + volume_serial)
Beacon Interval: 60 seconds with 10% jitter
Command Set:
0x01 - Download and execute file
0x02 - Execute shell command
0x03 - Upload file to C2
0x04 - Update configuration
0x05 - Uninstall and remove traces
ENCRYPTION DETAILS
Algorithm: RC4
Key Derivation: MD5(bot_id + "campaign_2025_q3")
Hardcoded Seed: "campaign_2025_q3" at offset 0x00405A00
EXTRACTED IOCs
C2 URLs: hxxps://update.malicious[.]com/gate.php
hxxps://backup.evil[.]net/gate.php (failover)
Campaign ID: campaign_2025_q3
RC4 Key Material: [see encryption details above]
Verification Criteria
Confirm successful execution by validating:
- [ ] All prerequisite tools and access requirements are satisfied
- [ ] Each workflow step completed without errors
- [ ] Output matches expected format and contains expected data
- [ ] No security warnings or misconfigurations detected
- [ ] Results are documented and evidence is preserved for audit
Compliance Framework Mapping
This skill supports compliance evidence collection across multiple frameworks:
- SOC 2: CC7.2 (Anomaly Detection), CC7.4 (Incident Response)
- ISO 27001: A.12.2 (Malware Protection), A.16.1 (Security Incident Management)
- NIST 800-53: SI-3 (Malicious Code Protection), IR-4 (Incident Handling)
- NIST CSF: DE.CM (Continuous Monitoring), RS.AN (Analysis)
Claw GRC Tip: When this skill is executed by a registered agent, compliance evidence is automatically captured and mapped to the relevant controls in your active frameworks.
Deploying This Skill with Claw GRC
Agent Execution
Register this skill with your Claw GRC agent for automated execution:
# Install via CLI
npx claw-grc skills add reverse-engineering-malware-with-ghidra
# Or load dynamically via MCP
grc.load_skill("reverse-engineering-malware-with-ghidra")
Audit Trail Integration
When executed through Claw GRC, every step of this skill generates tamper-evident audit records:
- SHA-256 chain hashing ensures no step can be modified after execution
- Evidence artifacts (configs, scan results, logs) are automatically attached to relevant controls
- Trust score impact โ successful execution increases your agent's trust score
Continuous Compliance
Schedule this skill for recurring execution to maintain continuous compliance posture. Claw GRC monitors for drift and alerts when re-execution is needed.