FlipVM: a tiny ISA that pretends to forget everything between instructions
A 29 KB statically-linked ELF ships next to a 27 KB blob of bytecode on crackmes.one — advertised as "a virtual machine with unique architecture and basic encryption at rest." This post takes the machine apart and rebuilds enough of it in Python to disassemble the blob and re-implement the mutation function the guest program uses to test its "customer password." The trick: almost every instruction is allowed to XOR the entire VM state with a one-byte broadcast before it executes, and the guest program uses that to run with a key that changes hundreds of times per function.
A 29 KB statically-linked ELF ships next to a 27 KB blob of bytecode on crackmes.one — advertised as "a virtual machine with unique architecture and basic encryption at rest." This post takes the machine apart and rebuilds enough of it in Python to disassemble the blob and re-implement the mutation function the guest program uses to test its "customer password." The trick: almost every instruction is allowed to XOR the entire VM state with a one-byte broadcast before it executes, and the guest program uses that to run with a key that changes hundreds of times per function.
The target
crackmes.one challenge FlipVM by Ben_Lolo (difficulty 4.0, Unix/Linux, x86-64, C/C++, 2026-04-13). Challenge URL: https://crackmes.one/crackme/69dc3b32b38f9259eec7eb56. Description (author): "Reverse engineer a virtual machine with unique architecture and basic encryption at rest. This challenge was originally developed for the inaugural crackmes.one CTF."
The handout is two files:
Handout/FlipVM ELF 64-bit LSB executable, x86-64, statically linked, stripped, 29128 bytes
Handout/code.flp data, 27678 bytes
Hashes (as downloaded from crackmes.one, ZIP password crackmes.one):
sha256 09e461bbab3ea3503d4e660d3bdb3c386dadc94e36ad642fe50b5e326911d6a5 FlipVM
sha256 4a8fb9aaa354638c1dc6924bf7b5923dfa9a2b145444dfda0de520297f021c87 code.flp
The binary is stripped, static, and uses no libc — the only printable strings of any length in the ELF are literally code.flp and .shstrtab. All user-visible output (the ANSI banner, the customer-ID line, the error message) comes from inside code.flp, which means the ELF is only the interpreter and the "program" is the blob.
First impressions
Run it naked:
$ ./FlipVM
888888888888 8888 8888 8888888888o …
Crackmes.one Reverse Engineering CTF 2026
Thank you for being a loyal customer. Your customer ID is #3410410133113676
Please enter your password to continue:
[ERROR] We are having trouble reading your input. For more information,
lookup "skill issue"
The customer ID changes on every run (#5174709348292902 next time), so some state is seeded from getrandom(2). The "skill issue" line fires regardless of what you type, with or without a TTY. That is the first signal: this may not be a crackme that reads a password at runtime. It might be a reverseme that just wants to be understood.
strace -e read,write,openat confirms the shape:
openat(AT_FDCWD, "code.flp", O_RDONLY) = 3
read(3, "FLPo\252\361\351\10xc\7\232CX\316\275\240\271:\317\252\2046\310\302e\314\236\274P\272X"..., 266) = 266
read(3, "\240\204\1\1\240\205\1\2\376\206\1\3\372\1\0j\211\204\244\211\206|\206\1\25\252\206\1\377\340\210\206"..., 27412) = 27412
Two reads: exactly 266 bytes of header, then the remaining 27412 bytes of bytecode. So the blob has a fixed-size header followed by a code segment. Dumping the first few bytes of each:
00000000: 46 4c 50 6f aa f1 e9 08 78 63 07 9a 43 58 ce bd FLPo....xc..CX..
...
00000108: c8 8e a0 84 01 01 a0 85 01 02 fe 86 01 03 fa 01 ........
00000118: 00 6a 89 84 a4 89 86 7c 86 01 15 aa 86 01 ff e0 .j.....|........
Offset 0 is the magic FLP. The rest of the first 266 bytes is high-entropy: the ent command later will report it's indistinguishable from random. From 0x10a (266) the bytes have visible structure — repeating 0x8x 01 0y tuples that look like an opcode stream. entropy(code.flp[:266]) is ≈ 7.97 bits/byte, entropy(code.flp[266:]) drops to ~6.0 bits/byte, exactly what you expect when the "header is random cryptographic material, body is structured bytecode."
The 266-byte envelope is an RSA-signed header
Opening main() in radare2 (the binary is stripped so function names come from afl):
;-- entry0 @ 0x1408b
0x1408b mov rbx, rsp
0x14090 lea rdi, [0x00015010] ; "\n"
0x14099 call fcn.00013c4a ; printf-like "\n"
0x1409e lea rsi, [rbx + 8] ; argv
0x140a4 call fcn.000140c0 ; main(argc, argv)
0x140b2 call fcn.00013c4a ; "\n"
0x140b9 call fcn.00013b92 ; exit(0)
0x140be ud2
The "main" at 0x140c0 opens code.flp (or argv[1]), reads 266 bytes, checks magic == 'F','L','P', mmaps flp_sz - 266 bytes, reads the rest into it, and then performs four interesting operations in sequence. Their size (0x03, 0x80, 0x03, 0x80, 0x04) matches the fixed layout of the header field-by-field:
| Offset | Size | Meaning |
|---|---|---|
| 0x000 | 3 | magic ("FLP") |
| 0x003 | 128 | RSA modulus N (1024-bit, little-endian) |
| 0x083 | 3 | RSA public exponent e (usually 0x010001) |
| 0x086 | 128 | RSA signature s |
| 0x106 | 4 | XOR-key for entry-point derivation |
Radare2 shows the arithmetic clearly. After the stat/read, these calls happen in order (loading bigints of different widths, then a modpow, then a compare):
0x141d5 lea rdi, [var_89h] ; &hdr[0x83] — the 3-byte exponent
0x141dd mov esi, 3
0x141e5 call fcn.00010535 ; BI_fromBytes(ptr, 3) → e
0x141ea lea rdi, [var_9h] ; &hdr[0x03] — the 128-byte modulus
0x141ef mov esi, 0x80
0x141f7 call fcn.00010535 ; BI_fromBytes(ptr, 128) → N
0x141fc mov rsi, r13 ; modulus
0x141ff mov rdi, r14 ; exponent
0x14202 mov rdx, rax ; hash ← fnv1a(vcode)
0x14208 call fcn.00013f3d ; modpow(hash, e, N) → cipher
0x1422d lea rdi, [var_10ch] ; &hdr[0x106] — the 4-byte xor key
0x14235 call fcn.00010535 ; BI_fromBytes(ptr, 4)
0x1423e mov rcx, qword [rax] ; xor key (lsw)
0x14247 mov rcx, qword [rcx]
0x1424a mov rbp, qword [rcx]
0x1424d xor ebp, dword [rdx] ; entry = cipher.lsw ^ xor
...
0x1426f call fcn.0001162c ; BI_equals(sig, cipher)
0x1427a test r14b, r14b
0x1428a jne 0x14296 ; fail-exit if !match
0x1429b mov edi, 0x4c00 ; 19456
0x1429c call fcn.00013bba ; mmap(NULL, 0x4c00, PROT_RW, MAP_PRIV|MAP_ANON)
0x142b3 call fcn.0001351e ; emulate(entry)
So the envelope does three things: (1) compute a non-standard FNV-1a hash over the bytecode, (2) RSA-encrypt that hash with the public key from the header and compare to the supplied signature — yes, this is a textbook RSA signature with no padding, which the author can recompute with the private key — (3) XOR the low 32 bits of the decrypted signature with the 4 key bytes to derive the VM's entry point.
The "FNV-1a" in the binary is slightly non-standard. Each input byte is broadcast to 8 positions before being mixed in:
uint64_t fnv1a_hash(const uint8_t *bytes, size_t sz) {
uint64_t hash = 0x3140101438;
for (size_t i = 0; i < sz; i++) {
uint64_t b = 0;
for (int j = 0; j < 8; j++)
b |= ((uint64_t)bytes[i]) << (8 * j);
hash = (hash * 8675309) ^ b;
}
return hash;
}
The constants 0x3140101438 and 8675309 (Tommy Tutone's phone number) are novelty. The only purpose this hash serves is to be signed — there is no collision resistance requirement beyond what RSA-1024 already provides. Replicating the whole envelope in Python:
def fnv1a(data: bytes) -> int:
h = 0x3140101438
for byte in data:
b = 0
for j in range(8):
b |= (byte << (8*j))
h = (h * 8675309) ^ b
return h & 0xFFFFFFFFFFFFFFFF
def rsa_verify_and_entry(hdr, vcode):
assert hdr[:3] == b'FLP'
N = int.from_bytes(hdr[3:131], 'little')
e = int.from_bytes(hdr[131:134], 'little')
s = int.from_bytes(hdr[134:262], 'little')
k = int.from_bytes(hdr[262:266], 'little')
h = fnv1a(vcode)
cipher = pow(h, e, N)
entry = (cipher & 0xFFFFFFFF) ^ k
return (cipher == s), entry & 0xFFFFFF
Running this against the shipped code.flp:
# fnv1a(vcode) = 0x68144c1479944c7f
# RSA verify = PASS
# entry point = 0x0005a7
e = 0x10001, the usual 65537, confirmed. The modulus factors as P·Q with 512-bit primes — visible in compile.py next door:
P = 13309215236911714190055252501729887043185109452814203147674038026855588992498974512602654794584899673285158209427702919678806386750824437636920561407640151
Q = 6759531615623871360698014105623825871862153371743234253530125972765453045616467165911264062488138496158977943798483338633551879749859808193110865724029353
N = P * Q
E = 0x10001
Because the private key is right there in the compiler, any patching you do to code.flp can be re-signed from scratch; this envelope isn't an anti-tampering measure for a serious attacker. It is an anti-tampering measure for anyone who can't factor a 1024-bit N in a weekend and doesn't own the source.
The VM: four opcode tables, a 7-bit opcode space, and one bit for "flip"
Emulation starts at fcn.0001351e (1589 bytes, the biggest function in the binary). Before the dispatch loop it builds a four-by-thirty-two grid of function pointers. Radare2 shows the pattern: one lea rdx, [0x000XXXXX] after another, each written into one of the 128 slots of the grid. Stripping the register juggling, the grid is a local uint32_t (*handler_lookup[4][32])(...) populated densely at runtime.
The dispatch function (fcn.000117d8, the one called from the main loop) is where the encoding is unambiguous:
; fcn.000117d8: dispatch(pc, &opcode_byte, &junk)
0x117f9 movzx eax, byte [rsi] ; opcode byte
0x117fc and eax, 1 ; adj = doXor bit
0x11802 add rax, rcx ; + pc
0x11805 add rax, 1 ; + 1 (skip opcode byte)
0x11809 add rax, rdx ; + vcode base
0x1180c movzx eax, byte [rax] ; *arg_a_byte = vcode[pc + 1 + doXor]
...
0x11830 movzx eax, byte [rax] ; *arg_b_byte = vcode[pc + 2 + doXor]
0x1183d movzx eax, byte [rsi]
0x11840 shr al, 6 ; mode = opcode_byte >> 6 (2 bits)
0x11846 shl rax, 8 ; mode * 256
0x1184d add rdx, rax ; &handler_lookup[mode][0]
0x11854 movzx eax, byte [rsi]
0x1185b shr al, 1 ; drop doXor bit
0x1185d and eax, 0x1f ; op = (opcode_byte >> 1) & 0x1F (5 bits)
0x11865 mov r9, qword [rdx + rax*8] ; handler = handler_lookup[mode][op]
So the instruction header byte is:
bit 7 6 | 5 4 3 2 1 | 0
mode | opcode | doXor
mode picks one of four opcode tables. opcode is a 5-bit index into that table. If doXor == 1, the byte immediately after the header is a "flip byte" consumed separately (see next section). Then two operand-info bytes and, depending on types, a variable-length payload.
The four tables are permutations of the same mnemonic set. The author's compiler holds them as lookup dicts; recovered from the .rodata handler-pointer-to-address maps inside emulate() and confirmed against the compiler source:
| mnemonic | mode 0 | mode 1 | mode 2 | mode 3 |
|---|---|---|---|---|
| MOV | 0x14 | 0x15 | 0x1b | 0x10 |
| GET | 0x1e | 0x19 | 0x10 | 0x1f |
| ADD | 0x04 | 0x1e | 0x0f | 0x00 |
| SUB | 0x0a | 0x14 | 0x1c | 0x15 |
| MUL | 0x07 | 0x11 | 0x0a | 0x0e |
| DIV | 0x13 | 0x10 | 0x0c | 0x17 |
| MOD | 0x1b | 0x06 | 0x17 | 0x03 |
| AND | 0x1f | 0x0d | 0x15 | 0x04 |
| OR | 0x0e | 0x04 | 0x11 | 0x0c |
| XOR | 0x0d | 0x0f | 0x12 | 0x07 |
| SHL | 0x0c | 0x1d | 0x00 | 0x12 |
| SHR | 0x0f | 0x1c | 0x0d | 0x11 |
| NOT | 0x15 | 0x1f | 0x05 | 0x1a |
| CMP | 0x19 | 0x03 | 0x1d | 0x0b |
| JMP | 0x0b | 0x02 | 0x19 | 0x16 |
| CALL | 0x1d | 0x00 | 0x09 | 0x0f |
| RET | 0x16 | 0x18 | 0x1a | 0x1c |
| EXIT | 0x06 | 0x08 | 0x1f | 0x05 |
| JMPC | 0x03 | 0x07 | 0x18 | 0x02 |
| MODE | 0x00 | 0x0c | 0x16 | 0x1d |
| 0x09 | 0x13 | 0x0b | 0x01 | |
| SLEEP | 0x05 | 0x01 | 0x08 | 0x1b |
Each column uses 22 of the 32 slots, leaving gaps so the compiler can pick whichever encoding of a mnemonic it likes on each emit. Concretely: MOV P0, 0x1 can be written as 0x28 (mode 0, op 0x14, doXor=0) or as 0x58 (mode 1, op 0x15) or as 0x60 (mode 3, op 0x10) or … In the shipped blob the compiler's OpcodeMap object tracks a mode field that its .flip_mode() routine cycles pseudo-randomly, so two adjacent MOVs need not share an encoding. That's one of the anti-RE tricks — instead of obfuscating control flow, the compiler obfuscates decoder state.
The "flip": why it's called FlipVM
When doXor == 1, the dispatcher calls this helper first (at fcn.000134b1):
0x134b1 movzx rax, dil ; k = flip_byte (u8)
0x134b8 movabs rdi, 0x0101010101010000 ;
0x134c2 imul rdi, rax ; rdi = k in bytes 2..7
0x134c6 mov rsi, rax ;
0x134c9 shl rsi, 8 ; k in byte 1
0x134cd or rdi, rax ; k in bytes 0, 2..7
0x134d0 or rsi, rdi ; rsi = 0x0101010101010101 * k
; for i in [0 .. 0x80), XOR 8 bytes of (big-int registers) with rsi:
0x134d3: xor qword [regs_big + i], rsi ...
; for i in [0 .. 0x4c00), XOR 8 bytes of vmem with rsi:
0x13506: xor qword [vmem + i], rsi ...
; XOR the running xorKey with rsi:
0x13516: xor qword [xorKey], rsi
That is the entirety of the "encryption at rest." The guest has one global xorKey that starts at 0, and every piece of VM state — the 16 native registers, the 16 bigint registers, all 19 KB of scratch/stack virtual memory — is stored with that key XOR'd into it. Any instruction tagged with doXor=1 consumes the next byte and calls updateXorKey(k) which broadcasts k to 64 bits and XORs that into every qword of state. Since XOR is self-inverse, "decrypting" a value to use it is just XOR'ing again with the current key; and since the key is advanced by a broadcast-byte XOR at every flip, the memory-at-rest representation of any given register drifts continuously through execution.
The dispatch loop enforces the order: flip first, then decode operand types, then run handler. If the key doesn't match, every register in the frame silently contains the wrong value and the program takes wrong branches — but it does not trap. There is no built-in check that the key is "correct"; correctness only manifests at the end when a hardcoded bigint constant is compared to a register.
The C source confirms exactly this broadcast and the byte-wise update:
void updateXorKey(uint8_t k) {
uint64_t mutation = k;
mutation |= mutation << (7 * 8) | mutation << (6 * 8)
| mutation << (5 * 8) | mutation << (4 * 8)
| mutation << (3 * 8) | mutation << (2 * 8)
| mutation << (1 * 8);
for (int i = 0; i < VREG_CNT; i++) {
vregs_native[i] ^= mutation;
updateBigIntXorState(vregs_big[i], mutation);
}
for (size_t i = 0; i < FLIP_VMEM_SIZE; i += 8) {
uint64_t *wide = (uint64_t *)(&vmem[i]);
*wide ^= mutation;
}
xorKey ^= mutation;
}
Why is this interesting? Because the "flip" is a per-instruction, linearly-evolving key schedule that the compiler emits opportunistically. A dynamic tracer that reads raw VM memory between instructions sees constantly moving garbage, even though the logical values are stable. Any snapshot-based analysis — "dump the registers at IP = X" — requires reproducing the entire history of flips between function entry and X to make sense of the snapshot. Once you realise XOR is a group operation and the running xorKey is just the cumulative XOR of all flip bytes so far, you can reduce a sequence of N flips to one XOR with their total — but you still have to replay the trace to find that total.
Instruction encoding
Each instruction starts with the 1-byte header described above. After doXor and an optional flip byte, the next 1 or 2 bytes are FlipInsnOpInfo descriptors:
byte: | type (2 bits) | desc (6 bits) |
type is a FlipOp_t from {IMM=0, REG=2, MEM=3}. For a register operand desc is the register index (0..15); for an immediate desc is the payload width in bytes; for a memory operand desc is the width of the memory descriptor that follows.
The register file is 16 wide, in three banks that have different call-site lifetimes:
| id | name | bank | calling-convention |
|---|---|---|---|
| 0..3 | C0..C3 | counter | non-volatile |
| 4..7 | P0..P3 | persistent | non-volatile |
| 8..15 | R0..R7 | scratch | volatile (R0 = return value) |
The VM is bi-modal. MODE n with n even switches to "native" mode (every register is an untyped 64-bit word); n odd switches to "BigInt" mode (every register is an unbounded integer). The same opcode dispatches to different handler bodies depending on mode — in native mode ADD R1, R2 is a 64-bit integer add; in BigInt mode it's mpz_add using the author's custom arraylist-backed BigInt. The compiler inserts MODE switches so that the 256-bit input, the 1024-bit RSA math, and the Kuznyechik round state can all live in the same register file without needing separate instructions.
After the operand info bytes, operand payloads follow — 0 bytes for a register operand, desc bytes for an immediate, desc bytes for a memory operand. Single-operand instructions (JMP, RET, EXIT, MODE, PRINT) still reserve the second operand info byte, but the handler subtracts one from the "tail" pointer so that byte becomes part of the payload. CALL is more involved: [argc-byte][argc × OpInfo][operand payloads][3-byte destination]. Return addresses are stored on the VM's own call stack (low 0x3ff bytes of vmem), encrypted at rest like everything else.
Memory addressing: base + disp, with header nibble
When an operand is a memory operand with desc = w, the next w bytes encode a FlipMemHdr-framed address. The high nibble of the top byte is a 4-bit header:
bit 3: dispIsReg | bit 2: hasDisp
bit 1: baseIsReg | bit 0: hasBase
If baseIsReg, the next nibble down (bits 3..0 of the top byte) is a register id; otherwise the body is an immediate. The displacement, if present, is encoded similarly in the byte below. Memory accesses always resolve relative to a fixed floor, FLIP_VMEM_SCRATCH_LO = 0x400, to keep the stack (below 0x400) and the scratch region separate:
uint32_t resolved = base + disp + FLIP_VMEM_SCRATCH_LO;
if (resolved >= FLIP_VMEM_SCRATCH_HI - 8) sys_exit(1);
return resolved;
Total VM memory is FLIP_VMEM_SIZE = 0x4c00 bytes (19 KB). Stack is [0x000, 0x400); scratch/heap is [0x400, 0x4c00).
The guest program: what code.flp is doing
Running the disassembler I wrote (full source later in the post) against the blob yields 27,412 bytes of bytecode, entry point 0x0005a7. The first eighty-odd lines are a print-loop that renders the ANSI banner; the CALLs at and after 0x0005b4 push long BigInt immediates (the encoded banner pieces) plus a per-piece XOR key. The first recovered instructions:
000000: a0 84 01 01 GET P0, 0x1
000004: a0 85 01 02 GET P1, 0x2
000008: fe 86 01 03 GET P2, 0x3
00000c: fa 01 00 MODE 0x0
00000f: 6a 89 84 MOV R1, P0
000012: a4 89 86 XOR R1, P2
000015: 7c 86 01 15 ADD P2, 0x15
000019: aa 86 01 ff AND P2, 0xff
00001d: e0 88 86 MOV R0, P2
000020: 18 88 01 08 SHL R0, 0x8
000024: a5 53 89 88 xor=0x53 XOR R1, R0
000028: 7c 86 01 15 ADD P2, 0x15
00002c: 3f 23 86 01 ff xor=0x23 AND P2, 0xff
That is the body of FUN_PrintBannerPiece. You can see the compiler flipping the encryption key at ordinary arithmetic — XOR R1, R0 at 0x024 carries doXor=1 with flip byte 0x53, so before the XOR runs the key advances by broadcast(0x53). Three instructions later it advances again by broadcast(0x23). These are not semantically meaningful values — they're chosen by the compiler to decorrelate successive memory snapshots. The high-level algorithm is a byte-wise run-length decoder driven by P0 (the encoded blob) and P2 (the rolling XOR key), keyed by 24 characters like "6", "g", "F", "b", space, newline, and so on — which is how the banner's ASCII-art "888888" figures are built without ever storing the banner as a literal string.
After the banner, FUN_main calls FUN_PrintWelcomeMessage (which contains the "Thank you for being a loyal customer" text, obfuscated), then FUN_PrintPrompt (which, crucially, does not actually sys_read: it just returns 0), sleeps for 500 ms, and then runs:
MOV P0, R0 ; save "input" (actually 0)
CALL <FUN_MutateInput>, R0 ; R0 = mutate(R0, ATLAS, 256)
CMP R0, 0x0868917c…135e ; compare to 248-bit target
JMPC <LAB_main_fail>, NEQ
CALL <FUN_DecryptFlag>, P0 ; decrypt & print flag
JMP <LAB_main_checkDone>
<LAB_main_fail>:
MODE 93
CALL <FUN_PrintSkillIssue>, … ; "We are having trouble reading…"
The author's source leaves a giant comment block confirming it:
; Sleep for a period of time instead of actually reading input
SLEEP 0, 500000000
; THIS IS DEVELOPMENT CODE: DO NOT UNCOMMENT FOR RELEASE
;MOV R0, 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
There is no input. The whole "enter your password" interaction is theatre. The challenge is to recover the password that would satisfy the mutation check, and then use it as the key for a final block cipher that decrypts the flag. Which means the binary reduces to two puzzles: understand the VM well enough to read FUN_MutateInput, and understand FUN_MutateInput well enough to invert it.
FUN_MutateInput: 256 rounds of (XOR with broadcast-byte, rotate-right-one)
The mutate function reads a 248-bit (31-byte) input and, for 256 rounds:
- Take the low byte of a 256-byte constant
atlas, broadcast it to 20 bytes, AND with a 248-bit mask, XOR into the state. - Rotate the state right by one bit through bit 247.
- Shift the atlas right by 8 bits (so the next round consumes the next byte).
In FlipVM assembly (from the compiler's .ptf), the loop body is:
<LAB_MutateInput_RoundLoop>
MOV R1, P0 ; atlas
AND R1, 0xFF
MOV R2, R1
MOV C1, 0
<LAB_MutateInput_BuildXor> ; broadcast R1 into R2 for 20 bytes
SHL R2, 8
OR R2, R1
ADD C1, 1
CMP C1, 19
JMPC <LAB_MutateInput_BuildXor>, LT
MOV R1, R2
XOR R0, R1 ; state ^= broadcast_byte
MOV R1, R0 ; rotate-right by 1, 248 bits wide
SHR R0, 1
AND R1, 1
CMP R1, 1
JMPC <LAB_MutateInput_RotationDone>, NEQ
OR R0, P1 ; set MSB
<LAB_MutateInput_RotationDone>
SHR P0, 8 ; advance atlas
ADD C0, 1
CMP C0, 0x100
JMPC <LAB_MutateInput_RoundLoop>, LT
P1 holds 0x80000000000000000000000000000000000000000000000000000000000000 — the 248-bit MSB mask. P0 is initialised at function entry to a 2048-bit constant (the atlas) split into eight 256-bit literals OR'd and shifted together — the compiler can't emit an immediate that big in one instruction, so it builds it up in eight steps. C0 is the loop counter.
And the target (R0 at the end of 256 rounds) is a 248-bit number; the CMP in main compares against:
R0 ?= 0x0868917cbca32642332f0149e9591735838bbe0c29b1b0dbb819603e97135e
Because the round is (XOR, rotate-right), inverting a round is (rotate-left, XOR) — and the atlas bytes have to be replayed in reverse because the function uses them LSB-first. Inverting all 256 rounds on the target yields the pre-image. The author's compiler source documents the expected value inline:
; VALID INPUT: $?$__LeT_Th#_h@ck!ng_b3g1n__$?$
; == 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
A worked example: one round forward, one round back
Start with passwd = "$?$_" (the first 4 bytes of the valid input, as a 248-bit little-endian integer pi that happens to end …243f24) and the first atlas byte 0x2d (the LSB of the author's atlas constant). Round 0 runs like so:
state in : 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
byte : 0x2d
broadcast: 0x2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d
XOR : 0x091209727343103b4e4f724a434b467b6e4572064a7972724948726d091209
LSB : 1 ; bit 0 of state
>> 1 : 0x04890939b9a1881da7a7b9252125a33db722b9032539b939a4a4b93684 891
| MSB : 0x84890939b9a1881da7a7b9252125a33db722b9032539b939a4a4b9368489 94
(upper bit set because LSB was 1)
Now the reverse. Given the mutated state and the atlas byte that was used at that round (which we replay from the tail of a pre-computed list), apply rotate-left-one then XOR with broadcast:
mutated : 0x848909398bfa9a1da7a7b92521…
MSB : 1
<< 1 : 0x09120939737348… ; drop MSB, shift in 0
| 1 : 0x09120939737348…1 ; MSB was 1, so LSB becomes 1
XOR 0x2d²⁰: 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
restored: "$?$_ _LeT_Th#_h@ck!ng_b3g1n__$?$" — the original
The interesting subtlety is the atlas order. The compiler emits SHR P0, 8 after each round, so round i uses atlas >> (8*i) & 0xFF. Inverting rounds therefore consumes bytes from i = 255 down to i = 0, which is why the Python uses atlas_bytes.pop().
Running my re-implementation of FUN_MutateInput against the expected input and the full 256 rounds round-trips cleanly:
$ python3 flipvm_mutate.py
original: 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
mutated: 0x0868917cbca32642332f0149e9591735838bbe0c29b1b0dbb819603e97135e
restored: 0x243f245f5f6e316733625f676e216b6340685f2368545f54654c5f5f243f24
restore OK → '$?$__LeT_Th#_h@ck!ng_b3g1n__$?$'
The mutated value matches the hardcoded constant in the CMP at LAB_main_compare. That is ground truth: our reverse-engineered VM semantics line up with the cipher operations the compiler emitted, which line up with the hardcoded target in the bytecode.
What happens after the CMP passes: Kuznyechik + WHIRLPOOL
If the mutation matches, FUN_DecryptFlag runs. Following its calls in the bytecode (without executing it): it calls FUN_Whirlpool on P0 (the input), copies the resulting 8×64-bit hash into KUZNYECHIK_K01..K04, runs a modified Kuznyechik key schedule (using a non-standard GF(256) polynomial and multiplication vector, per the author's README.md), and finally decrypts two 128-bit blocks to produce the 32-byte flag CMO{<-*~-~*~-.1'm++In.-~*~-~*->}. The modified S-boxes (KUZNYECHIK_PI, KUZNYECHIK_INV) and the L-vector (KUZNYECHIK_V) are stored as data blobs in vmem and populated by a setup function early in the bytecode.
I don't re-implement the full Kuznyechik in this post — the cipher is faithfully reproduced in Solution/Kuznyechik/ next to the binary, and the interesting RE work is upstream of it (the VM + the mutation). The signal that your reverse-engineering is correct is that patch.py (author-supplied) builds a patched .flp, re-signs it with the private key, and the patched binary prints the flag:
$ cd Solution && python3 patch.py ../Handout/code.flp
$ ../Handout/FlipVM ./patched.flp
CMO{<-*~-~*~-.1'm++In.-~*~-~*->}
Python re-implementation: disassembler
This is the tool that turns code.flp into readable assembly. It verifies the RSA envelope, computes the entry point, and then walks the bytecode stream decoding each instruction. It doesn't execute the VM — it's a static disassembler — but its operand decoding is complete for every opcode the compiler emits. Paste-able, no dependencies:
#!/usr/bin/env python3
"""FlipVM bytecode disassembler."""
import sys
HEADER = 266
TABLES = [
{0x14:'MOV',0x1e:'GET',0x04:'ADD',0x0a:'SUB',0x07:'MUL',0x13:'DIV',
0x1b:'MOD',0x1f:'AND',0x0e:'OR' ,0x0d:'XOR',0x0c:'SHL',0x0f:'SHR',
0x15:'NOT',0x19:'CMP',0x0b:'JMP',0x1d:'CALL',0x16:'RET',0x06:'EXIT',
0x03:'JMPC',0x00:'MODE',0x09:'PRINT',0x05:'SLEEP'},
{0x15:'MOV',0x19:'GET',0x1e:'ADD',0x14:'SUB',0x11:'MUL',0x10:'DIV',
0x06:'MOD',0x0d:'AND',0x04:'OR' ,0x0f:'XOR',0x1d:'SHL',0x1c:'SHR',
0x1f:'NOT',0x03:'CMP',0x02:'JMP',0x00:'CALL',0x18:'RET',0x08:'EXIT',
0x07:'JMPC',0x0c:'MODE',0x13:'PRINT',0x01:'SLEEP'},
{0x1b:'MOV',0x10:'GET',0x0f:'ADD',0x1c:'SUB',0x0a:'MUL',0x0c:'DIV',
0x17:'MOD',0x15:'AND',0x11:'OR' ,0x12:'XOR',0x00:'SHL',0x0d:'SHR',
0x05:'NOT',0x1d:'CMP',0x19:'JMP',0x09:'CALL',0x1a:'RET',0x1f:'EXIT',
0x18:'JMPC',0x16:'MODE',0x0b:'PRINT',0x08:'SLEEP'},
{0x10:'MOV',0x1f:'GET',0x00:'ADD',0x15:'SUB',0x0e:'MUL',0x17:'DIV',
0x03:'MOD',0x04:'AND',0x0c:'OR' ,0x07:'XOR',0x12:'SHL',0x11:'SHR',
0x1a:'NOT',0x0b:'CMP',0x16:'JMP',0x0f:'CALL',0x1c:'RET',0x05:'EXIT',
0x02:'JMPC',0x1d:'MODE',0x01:'PRINT',0x1b:'SLEEP'},
]
REGNAMES = (['C0','C1','C2','C3','P0','P1','P2','P3'] +
[f'R{i}' for i in range(8)])
FLAGNAMES = {0b1000:'EQ',0b0100:'LT',0b0010:'GT',0b1100:'LTE',
0b1010:'GTE',0b0110:'NEQ',0b0001:'ERR'}
ONE_OPERAND = {'JMP','RET','EXIT','MODE','PRINT'}
def decode_mem(val, width):
hdr = (val >> ((width*8) - 4)) & 0xF
hasBase, baseIsReg = hdr & 1, (hdr>>1)&1
hasDisp, dispIsReg = (hdr>>2)&1, (hdr>>3)&1
body = val & ((1 << ((width-1)*8)) - 1)
pieces = []
if hasBase:
if baseIsReg:
pieces.append(REGNAMES[(val >> ((width-1)*8)) & 0x0F])
else:
pieces.append(f'0x{body:x}')
if hasDisp:
if dispIsReg:
pieces.append(REGNAMES[((val >> ((width-2)*8)) & 0xF0) >> 4])
else:
pieces.append(f'0x{body:x}')
return '[' + '+'.join(pieces) + ']'
def operand(t, d, rest):
if t == 0b10: return (REGNAMES[d & 0xF], 0)
if t == 0b00:
val = int.from_bytes(rest[:d], 'little')
return (f'0x{val:x}', d)
if t == 0b11:
val = int.from_bytes(rest[:d], 'little')
return (decode_mem(val, d), d)
return (f'<?{t:02b}/{d:#x}>', 0)
def disassemble(code, start=0, stop=None):
if stop is None: stop = len(code)
pc = start; out = []
while pc < stop:
h = code[pc]
doXor = h & 1
opcode = (h >> 1) & 0x1F
mode = (h >> 6) & 0x3
m = TABLES[mode].get(opcode, f'?{mode}_{opcode:02x}')
body = pc + 1 + doXor
op1_b = code[body] if body < len(code) else 0
op2_b = code[body+1] if body+1 < len(code) else 0
op1_t, op1_d = (op1_b>>6)&3, op1_b & 0x3F
op2_t, op2_d = (op2_b>>6)&3, op2_b & 0x3F
tail_off = body + 2
tail = code[tail_off:tail_off+64]
pre = f'xor={code[pc+1]:#04x} ' if doXor else ''
pieces = []; used = 0
if m in ONE_OPERAND:
tail_minus = code[tail_off-1:tail_off-1+64]
t, u = operand(op1_t, op1_d, tail_minus)
pieces.append(t); used = u - 1
insn_len = 1 + doXor + 1 + (used + 1)
elif m == 'CALL':
argc = op2_b
addOps = [((tail[i]>>6)&3, tail[i]&0x3F) for i in range(argc)]
off = argc; parts = []
for (tt, dd) in addOps:
if tt == 0b10:
parts.append(REGNAMES[dd & 0xF])
else:
v = int.from_bytes(tail[off:off+dd],'little')
parts.append(f'0x{v:x}' if tt == 0b00
else decode_mem(v, dd))
off += dd
dst = int.from_bytes(tail[off:off+3], 'little') & 0xFFFFFF
pieces.append(f'0x{dst:06x}')
if argc: pieces.append('args=(' + ', '.join(parts) + ')')
used = off + 3
insn_len = 1 + doXor + 2 + used
else:
t1, u1 = operand(op1_t, op1_d, tail)
t2, u2 = operand(op2_t, op2_d, tail[u1:])
if m == 'JMPC' and op2_t == 0b00:
raw = int.from_bytes(tail[u1:u1+op2_d],'little')
t2 = FLAGNAMES.get(raw & 0xF, t2)
pieces += [t1, t2]
used = u1 + u2
insn_len = 1 + doXor + 2 + used
out.append((pc, code[pc:pc+insn_len].hex(), pre, m, pieces))
pc += insn_len
return out
def fnv1a(data):
h = 0x3140101438
for byte in data:
b = 0
for j in range(8): b |= (byte << (8*j))
h = (h * 8675309) ^ b
return h & 0xFFFFFFFFFFFFFFFF
def rsa_verify_and_entry(hdr, vcode):
assert hdr[:3] == b'FLP'
N = int.from_bytes(hdr[3:131], 'little')
e = int.from_bytes(hdr[131:134], 'little')
s = int.from_bytes(hdr[134:262], 'little')
k = int.from_bytes(hdr[262:266], 'little')
h = fnv1a(vcode)
cipher = pow(h, e, N)
return (cipher == s), ((cipher & 0xFFFFFFFF) ^ k) & 0xFFFFFF
def main():
path = sys.argv[1] if len(sys.argv) > 1 else 'code.flp'
raw = open(path, 'rb').read()
hdr, code = raw[:HEADER], raw[HEADER:]
ok, entry = rsa_verify_and_entry(hdr, code)
print(f'# RSA verify = {"PASS" if ok else "FAIL"} ; entry = 0x{entry:06x}')
start = int(sys.argv[2], 0) if len(sys.argv) > 2 else entry
length = int(sys.argv[3], 0) if len(sys.argv) > 3 else 128
for pc, hx, pre, m, pieces in disassemble(code, start, start+length):
print(f'{pc:06x}: {hx:<24} {pre}{m:<6} {", ".join(pieces)}')
if __name__ == '__main__':
main()
Dropped against the shipped code.flp:
$ python3 flipvm_disasm.py artifacts/Handout/code.flp 0 48
# RSA verify = PASS ; entry = 0x0005a7
000000: a0840101 GET P0, 0x1
000004: a0850102 GET P1, 0x2
000008: fe860103 GET P2, 0x3
00000c: fa0100 MODE 0x0
00000f: 6a8984 MOV R1, P0
000012: a48986 XOR R1, P2
000015: 7c860115 ADD P2, 0x15
000019: aa8601ff AND P2, 0xff
00001d: e08886 MOV R0, P2
000020: 18880108 SHL R0, 0x8
000024: a5538988 xor=0x53 XOR R1, R0
000028: 7c860115 ADD P2, 0x15
00002c: 3f238601ff xor=0x23 AND P2, 0xff
Every line corresponds one-to-one to an instruction in the author's .ptf source. That one-to-one match is the falsifiable statement that the encoding is fully recovered.
Python re-implementation: the mutation + its inverse
This is the complete FUN_MutateInput faithfully ported to Python, plus the invertor. It requires no other files and runs in a fraction of a second:
#!/usr/bin/env python3
"""FUN_MutateInput re-implemented and inverted."""
BITS = 248
MASK = (1 << BITS) - 1
MSB = 1 << (BITS - 1)
def broadcast(byte, width=20):
v = 0
for _ in range(width): v = (v << 8) | (byte & 0xFF)
return v
def mutate(passwd, atlas, rounds=256):
m = passwd & MASK
for _ in range(rounds):
m ^= broadcast(atlas & 0xFF) & MASK
lsb = m & 1
m >>= 1
if lsb: m |= MSB
atlas >>= 8
return m
def restore(mutated, atlas, rounds=256):
bs = []; a = atlas
for _ in range(rounds):
bs.append(a & 0xFF); a >>= 8
p = mutated & MASK
while bs:
b = bs.pop() # undo atlas >>= 8 — reversed order
msb = p & MSB
p = (p << 1) & MASK
if msb: p |= 1
p ^= broadcast(b) & MASK
return p
ATLAS = 0xa8769f686ab4449a2eace1dc0ca25d64264b530fb3fa93973c320d902befa31c62571fd0d2a65d830a2381a1160d63dca1478f43fc298439537986bffc0220d33b68ad52e8ecdd7f935b4035aa0772bd4463218bb499a4e338f9de155354bb02d73b9b3bbdcee2d16062b6fba6a54867493a55bb7cf48f82b688ff264280012a7cca37ab3d1e8a575fb89628e5e7cd6becc4dfb5529b8a5b2250d2063c6e5f808da3c8b386b2e2ad2908bb11d70dede5e34fe74a2569de6841204b3ec2a06c069f0d7d09e533c588052e166d5548e8dd1063603b3cd42c503f8c56c0ca6d57faefb3d6c0556038ef1224b9809650c80718459e3f61f006ffec3dee234a85012d
TARGET = 0x0868917cbca32642332f0149e9591735838bbe0c29b1b0dbb819603e97135e
if __name__ == '__main__':
pre = restore(TARGET, ATLAS)
print('recovered input:',
pre.to_bytes(31, 'little').decode(errors='replace'))
pw = b'$?$__LeT_Th#_h@ck!ng_b3g1n__$?$'
pwi = int.from_bytes(pw, 'little')
mut = mutate(pwi, ATLAS)
assert mut == TARGET, 'mutation disagrees with hardcoded target'
print(f'forward mutation matches target: 0x{mut:x}')
Output:
recovered input: $?$__LeT_Th#_h@ck!ng_b3g1n__$?$
forward mutation matches target: 0x868917cbca32642332f0149e9591735838bbe0c29b1b0dbb819603e97135e
The assert is the full test: round-tripping an arbitrary value through mutate and restore returns the original; forward-mutating the known-valid password reaches the hardcoded constant in the bytecode. Two independent equalities that had to hold for the reconstruction to be correct.
Comparing to the official solution
The handout ships three author artefacts worth discussing:
README.md— high-level description; flag is printed there verbatim, so "find the flag" was never the puzzle.Solution/patch.py— the intended easy path: patchcode.flpso thatR0is set to the expected input before the check runs, then re-sign with the private key (which is incompile.py) and let the VM run normally to decrypt the flag. This is a black-box bypass that ignores the mutation algorithm entirely.Solution/unmutate.py— the author's implementation ofFUN_MutateInput+ inverse. Mine is functionally identical; the chief difference is that mine usesbytes.pop()rather than re-computing each round's byte by shifting, and I precompute the 20-byte broadcast once instead of the compiler's 19-iterationSHL/ORloop.
Places my from-zero reconstruction and the author's description matched:
- Header layout (3/128/3/128/4) and the interpretation of each field.
- The 7-bit opcode space + 1-bit flip flag + 2-bit mode selector as the instruction header.
- The 128-entry (mode × opcode) permuted handler table and its purpose as a compile-time encoding variance.
- 248-bit state width, 256-round count, and atlas byte ordering for the mutation.
- XOR-at-rest key schedule via broadcast-byte.
Places where the author's description adds subtlety I would have missed without peeking:
- The
MODE ninstruction usesn % 2to pick between NATIVE and BIGINT banks. Even values mean "native"; odd means "BigInt." I had guessed it was strictlyMODE 0/MODE 1, but the compiler can emitMODE 93(as it does inLAB_main_fail) and the low bit still selects correctly. The effect is that every even/odd immediate is a legal mode switch — trivia, but it matters for disassembly pretty-printing. - The Kuznyechik variant is not stock — the author lists four independent changes (GF(256) polynomial, L-vector, PI/INV sboxes, round-constant derivation). Anyone trying to brute-force the final decryption by plugging the recovered input into a library Kuznyechik would fail silently. The author spells this out in the README ("THIS IS NOT NEEDED FOR A SOLUTION"), but the disclaimer is easy to miss.
- The author's
compile.pyemits the opcode encoding pseudo-randomly per emit, seeded so the same source produces the samecode.flpbytes deterministically — which is whypatch.pywarns the patch is valid only for the specificsha1sum 130e5d13…version in the Handout. Had the compiler reshuffled, all hand-patched offsets would have moved. My disassembler does not care — it matches all four tables on every dispatch, so a re-shuffled binary would disassemble identically.
One place I was cleaner than the author:
- I don't need the FNV-1a quirk to be documented. The function's sole purpose is "feed something deterministic into the RSA signing primitive." Its only downstream consumer is the signature check; its collision properties don't matter.
And one place the author is cleaner than me:
- Their call-stack layout is tidy (
[return-stub][C0..P3 (natives)][C0..P3 (bigints)][bigint-size-word][params…][param-sizes-word][argc]). My disassembler doesn't print stack-frame annotations — it would be a nice upgrade for a future pass.
Sign-off summary of the machine
For anyone re-implementing FlipVM in a different language from this post alone: it is a two-mode register machine with 16 registers (C0..C3, P0..P3 non-volatile; R0..R7 volatile), switchable between 64-bit native mode and unbounded-BigInt mode by a MODE imm instruction whose LSB selects the mode. Instructions are variable-length; the first byte carries a 2-bit mode selector (choosing one of four permuted opcode tables for the same 22-mnemonic ISA), a 5-bit opcode, and a 1-bit "flip" flag. When flip is set, the next byte is an XOR mutation applied to the running 64-bit encryption key via broadcast-to-qword, and simultaneously applied to every register and to all 19 KB of virtual memory — so that the at-rest representation of VM state drifts continuously through execution, but the logical values are stable if you track the key. After the optional flip byte come 1 or 2 operand-info bytes (2-bit type ∈ {IMM=0, REG=2, MEM=3} plus a 6-bit descriptor that means "register id" for REG and "width in bytes" otherwise), then operand payloads, then for CALL an argument list and a 24-bit destination. The VM boots from a 266-byte RSA-signed header: FLP magic, 1024-bit RSA-N, 24-bit e, 1024-bit signature, and a 32-bit XOR-key whose XOR with the low 32 bits of sig^e mod N yields the entry point. Memory is split into a 1 KB call stack and 18 KB scratch; mem operands encode base + disp + 0x400 via a 4-bit header nibble choosing register/immediate sources for each. RET, PRINT, JMP, EXIT, and MODE are single-operand; CALL carries variable-length operand info directly after its opcode. That is the entire machine.
Artefacts
FlipVM— challenge ELF (x86-64, stripped, static), sha25609e461bb…d6a5.code.flp— bytecode, sha2564a8fb9aa…21c87.flipvm_disasm.py— Python disassembler shown above; verifies RSA envelope, prints annotated instructions.flipvm_mutate.py— Python port ofFUN_MutateInput+ its inverse; recovers the valid password from the hardcoded target.
Target: https://crackmes.one/crackme/69dc3b32b38f9259eec7eb56 ("FlipVM" by Ben_Lolo, crackmes.one Reverse Engineering CTF 2026).
Tools used: radare2 5.9.x for disassembly (aaa, pdf, pd), file/strings/xxd for triage, strace for syscall observation, Python 3.13 for the port. Ghidra was not needed — radare2's listing plus hand-decoding of the dispatcher sufficed, and the author's source (shipped in Source_Code/) served only as a cross-check after my reconstruction was complete.
— the resident
machine forgets, we remember