The 38 op-codes of choose-your-own-adventure: a ptrace-as-bytecode VM in 18 KB
Eighteen kilobytes of stripped Linux ELF. One prompt. No symbols, no anti-debug fluff, no packer. The binary forks itself, ptraces the child, sets the child's instruction pointer for every byte of the password, and treats the resulting SIGILL parade as the fetch-decode-execute loop of a tiny stack-machine. This is the story of reverse-engineering all 38 op-codes from the disassembly alone, plus a Python re-implementation that agrees with the binary on every input I threw at it.
Eighteen kilobytes of stripped Linux ELF. One prompt. No symbols, no anti-debug fluff, no packer. The binary forks itself, ptraces the child, sets the child's instruction pointer for every byte of the password, and treats the resulting SIGILL parade as the fetch-decode-execute loop of a tiny stack-machine. This is the story of reverse-engineering all 38 op-codes from the disassembly alone, plus a Python re-implementation that agrees with the binary on every input I threw at it.
The target
$ file /labs-output/work/obfuscated
ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=dd428aa1...3504,
for GNU/Linux 3.2.0, stripped
$ sha256sum /labs-output/work/obfuscated
ee3a8ae44298957626fc5e0d5e19d96d5fc4f2e7a92fbf123c6ae166f6fdc371
$ wc -c /labs-output/work/obfuscated
18696
The challenge is wrenches's "choose-your-own-adventure" (≈18 KB) from crackmes.one/crackme/697f81c0e04ca145cd9d13c9, listed as Linux x86-64, difficulty 4.0/6. Author's description: "a little bit of this, a little bit of that…". Zero comments, zero writeups, which made it irresistible — I had no spoilers to dodge and no hints to fall back on. Built with GCC 15.2.0 / Debian (per the .comment string), default ASLR via PIE, no fortify/canary/anti-debug-by-process-name. The only interesting imports are ptrace, fork, waitpid, calloc, plus the usual printf / fgets.
The file is not packed — entropy of the .text section is well below 7.0, and readelf -S shows a vanilla layout. So whatever's hiding is hiding in the code itself.
$ xxd obfuscated | head -2
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 d010 0000 0000 0000 ..>.............
A standard ELF64 header (e_type = 0x0003 = ET_DYN, entry = 0x10d0).
First impressions
strings -n6 returns about twenty meaningful symbols and only three human-readable lines:
enter serial number >
correct serial number!
wrong serial number!
Pair that with the import set — ptrace, waitpid, fork, calloc, printf, fgets, exit — and we already have a hypothesis: the program forks itself, the parent ptraces the child, and the validation runs in the child while the parent oversees.
strace -f confirms it, though imperfectly (more on that in a moment):
clone(...) = 844
[pid 844] ptrace(PTRACE_TRACEME) = -1 EPERM
[pid 844] --- SIGILL {si_signo=SIGILL, ...} ---
[pid 844] +++ killed by SIGILL (core dumped) +++
ptrace(PTRACE_GETREGS, 844, ...) = -1 ESRCH
ptrace(PTRACE_SETREGS, 844, {rip=0x644a82628434, ...}) = -1 ESRCH
ptrace(PTRACE_CONT, 844, NULL, 0) = -1 ESRCH
write(1, "...wrong serial number!\n", 43)
strace's observation breaks the program — once strace is attached, PTRACE_TRACEME returns EPERM because a process can only have one tracer at a time, so the child can't trace itself, and the parent's GETREGS/SETREGS/CONT afterwards all fail with ESRCH because the child has already been killed by an uncaught SIGILL. But the intent is unambiguous: the parent's loop wants to getregs, set rip, continue for each character of the password. That is a debugger-as-interpreter pattern, with the child as a passive instruction stream and the parent as the dispatcher.
This is enough of a fingerprint to predict what I'll find in main: a fork, an array of code pointers, a per-character loop that picks one of them and slams it into the child's rip.
Static pass: the parent is a dispatcher
objdump -d --no-show-raw-insn obfuscated | sed -n '/^00000000000010d0 <.text>/,/^0000000000002aa0 <.fini>/p' is the only static tool I needed. Headed by the stripped entry point at 0x10d0, the only real function (other than PLT trampolines and _init/_fini) is main at 0x14ab, identified by the lea that the start-glue feeds to __libc_start_main:
10d0: xor %ebp,%ebp
10d2: mov %rdx,%r9
10d5: pop %rsi
10d6: mov %rsp,%rdx
10d9: and $0xfffffffffffffff0,%rsp
10dd: push %rax
10de: push %rsp
10df: xor %r8d,%r8d
10e2: xor %ecx,%ecx
10e4: lea 0x3c0(%rip),%rdi # 14ab ← main
10eb: call *0x3ecf(%rip) # __libc_start_main
Main's prologue and the user-facing prompt:
14ab: push %rbp
14ac: mov %rsp,%rbp
14af: sub $0x140,%rsp ; 0x140 bytes locals
14b6: lea 0x1b47(%rip),%rax # 3004 ; "enter serial number > "
14bd: mov %rax,%rdi
14c0: mov $0x0,%eax
14c5: call 1050 <printf@plt>
14ca: mov 0x3b8f(%rip),%rdx # 5060 ; stdin
14d1: lea -0x60(%rbp),%rax ; buf
14d5: mov $0x40,%esi ; 64 bytes
14da: mov %rax,%rdi
14dd: call 1060 <fgets@plt>
Then the choreography I half-expected: a 256-entry table is allocated with calloc(8, 0x100) (the ABI loads rdi=8 into the first arg and rsi=0x100 into the second; GCC swapped the conceptual order, but 8 × 256 = 2048 bytes either way), and main fills it in by 256 successive mov %rdx,(%rax) stores, each of which is preceded by lea -<imm>(%rip),%rdx # <addr>. The first three rows of the table-fill, raw from objdump:
14e2: mov $0x100,%esi
14e7: mov $0x8,%edi
14ec: call 1070 <calloc@plt>
14f1: mov %rax,-0x10(%rbp) ; tbl = calloc(256, 8)
14f5: lea -0x1b9(%rip),%rdx # 1343 ; tbl[0] = 0x1343
14fc: mov -0x10(%rbp),%rax
1500: mov %rdx,(%rax)
1503: mov -0x10(%rbp),%rax
1507: add $0x8,%rax
150b: lea -0x2f2(%rip),%rdx # 1220 ; tbl[1] = 0x1220
1512: mov %rdx,(%rax)
1515: mov -0x10(%rbp),%rax
1519: add $0x10,%rax
151d: lea -0x265(%rip),%rdx # 12bf ; tbl[2] = 0x12bf
1524: mov %rdx,(%rax)
Read 256 of those, all of them inline — the entire dispatch table is hard-coded. The pointers all fall in a narrow range [0x11b9 .. 0x14a4] inside .text. I parsed the whole sequence with a tiny objdump-scraper (parse_table.py, embedded later in this post) and discovered there are only 38 distinct target addresses for 256 byte values — i.e. the binary is a 256-way switch with 38 cases, indexed by the literal byte (effectively 128-way for non-negative bytes — the index is sign-extended via movsbq, so only entries 0x00..0x7F of the calloc'd table are ever reached for printable input; see the signed-index note below).
After the table fill comes the heart of the program. The fork, the child's ptrace-traceme + UD2, and the parent's per-character loop:
28cf: call 10b0 <fork@plt>
28d4: mov %eax,-0x14(%rbp)
28d7: cmpl $0x0,-0x14(%rbp)
28db: jne 291e ; parent branch
; ── child branch ──
28dd: mov $0x0,%ecx
...
28f1: mov $0x0,%eax
28f6: call 1080 <ptrace@plt> ; ptrace(PTRACE_TRACEME, 0, 0, 0, 0)
28fb: xor %eax,%eax ; clear scratch regs
28fd: xor %ebx,%ebx
28ff: xor %r8d,%r8d
...
2911: xor %r14d,%r14d
2914: xor %r15d,%r15d
2917: ud2 ; first SIGILL → wake parent
2919: jmp 2a8a ; (never taken — parent sets new rip)
; ── parent branch ──
291e: movl $0x0,-0x64(%rbp)
2925: lea -0x64(%rbp),%rcx
2929: mov -0x14(%rbp),%eax ; pid
292c: mov $0x0,%edx
2931: mov %rcx,%rsi
2934: mov %eax,%edi
2936: call 1090 <waitpid@plt> ; wait for the child's first SIGILL
293b: movl $0x0,-0x4(%rbp) ; i = 0
2942: jmp 2a78 ; tail-jump to the loop test
And the loop body itself — the place where the magic lives:
2947: lea -0x140(%rbp),%rdx ; ®s
294e: mov -0x14(%rbp),%eax ; pid
2951: mov %rdx,%rcx
2954: mov $0x0,%edx
2959: mov %eax,%esi
295b: mov $0xc,%edi ; PTRACE_GETREGS
2960: mov $0x0,%eax
2965: call 1080 <ptrace@plt> ; getregs into ®s
296a: mov -0x4(%rbp),%eax ; i
296d: cltq
296f: movzbl -0x60(%rbp,%rax,1),%eax ; buf[i]
2974: movsbq %al,%rax ; SIGN-extend to 64 bits!
2978: lea 0x0(,%rax,8),%rdx
2980: mov -0x10(%rbp),%rax ; tbl
2984: add %rdx,%rax
2987: mov (%rax),%rax ; tbl[buf[i]]
298a: mov %rax,-0xc0(%rbp) ; regs.rip = handler
2991: lea -0x140(%rbp),%rdx
2998: mov -0x14(%rbp),%eax
299b: mov %rdx,%rcx
299e: mov $0x0,%edx
29a3: mov %eax,%esi
29a5: mov $0xd,%edi ; PTRACE_SETREGS
29aa: mov $0x0,%eax
29af: call 1080 <ptrace@plt> ; write regs back to child
29b4: mov -0x14(%rbp),%eax ; PTRACE_CONT
29b7: mov $0x0,%ecx
29bc: mov $0x0,%edx
29c1: mov %eax,%esi
29c3: mov $0x7,%edi
29c8: mov $0x0,%eax
29cd: call 1080 <ptrace@plt>
29d2: lea -0x64(%rbp),%rcx
29d6: mov -0x14(%rbp),%eax
29d9: mov $0x0,%edx
29de: mov %rcx,%rsi
29e1: mov %eax,%edi
29e3: call 1090 <waitpid@plt> ; wait for next stop signal
29e8: mov -0x64(%rbp),%eax ; status
29eb: and $0xff00,%eax
29f0: cmp $0x500,%eax ; WSTOPSIG==SIGTRAP (5) → win
29f5: jne 2a2e
29f7: lea 0x61d(%rip),%rax # 301b ; "correct serial number!"
29fe: mov %rax,%rdi
2a01: call 1040 <puts@plt>
; PTRACE_KILL & exit(0)
2a2e: mov -0x64(%rbp),%eax
2a31: and $0xff00,%eax
2a36: cmp $0x400,%eax ; WSTOPSIG==SIGILL (4) → continue
2a3b: je 2a74
2a3d: lea 0x5ee(%rip),%rax # 3032 ; "wrong serial number!"
2a44: mov %rax,%rdi
2a47: call 1040 <puts@plt> ; any other signal → wrong
2a74: addl $0x1,-0x4(%rbp) ; i++
2a78: mov -0x4(%rbp),%eax
2a7b: cltq
2a7d: movzbl -0x60(%rbp,%rax,1),%eax
2a82: cmp $0xa,%al ; '\n' terminator?
2a84: jne 2947 ; if not, loop
2a8a: lea 0x5a1(%rip),%rax # 3032 ; reached '\n' → "wrong"
2a91: mov %rax,%rdi
2a94: call 1040 <puts@plt>
So in C-ish pseudocode (cited from addresses 0x2947–0x2a8a above), the per-character loop is:
for (int i = 0; buf[i] != '\n'; i++) {
struct user_regs_struct regs;
ptrace(PTRACE_GETREGS, pid, 0, ®s);
regs.rip = tbl[(int8_t)buf[i]]; // signed-byte index!
ptrace(PTRACE_SETREGS, pid, 0, ®s);
ptrace(PTRACE_CONT, pid, 0, 0);
waitpid(pid, &status, 0);
int sig = (status >> 8) & 0xff;
if (sig == SIGTRAP) { puts("correct"); exit(0); }
if (sig != SIGILL) { puts("wrong"); exit(0); }
}
puts("wrong");
Two details matter:
- The index is signed (
movsbq %al,%rax). For any byte < 0x80 the value is non-negative and we just hittbl[c]. If you ever feed it bytes ≥ 0x80 you'd hit a negative offset, but I never needed to and the binary still happily reads whatever 8-byte qword is there. - Success is signalled by SIGTRAP, not by main returning. That means a handler that wants to "win" must execute
int3(0xCC). A handler that wants to advance must produce SIGILL — there's exactly one obvious way:ud2(0F 0B). Anything else (segfault, FPE, BUS) is "wrong".
The child does not get its non-RIP registers reset between iterations. The parent only writes regs.rip before each PTRACE_CONT, so r12d, r13d, r14d, rsp, and the stack memory all persist. That's the entire state of the virtual machine.
The anti-disassembly trick
Looking at the first handler in raw bytes (objdump -d):
11b9: 48 39 c3 cmp %rax,%rbx
11bc: 74 03 je 11c1
11be: 75 01 jne 11c1
11c0: 0b 41 83 or -0x7d(%rcx),%eax ; ← garbage
11c3: c6 (bad)
11c4: 0e (bad)
11c5: 0f 0b ud2
Both je 11c1 and jne 11c1 go to the same address. So the code at 0x11c0 is dead — it's a single junk byte (0x0b) inserted between the always-taken branch and the real instruction at 0x11c1. Linear sweep disassemblers (objdump, anything ndisasm-shaped) start decoding at 0x11c0 and produce nonsense. The first real instruction is at 0x11c1, and starts with 41 83 c6 0e — add r14d, 0xe. Right after it: 0f 0b, the terminator ud2.
So every handler that uses this trick has the same 8-byte preamble — 48 39 c3 74 03 75 01 <junk> — and you skip past it before decoding. I wrote a small post-processor that detects the 7-byte signature 48 39 c3 74 03 75 01 and steps over the eighth (junk) byte before re-feeding the bytes to objdump -b binary. Of the 38 handlers, 10 use the prelude:
| Handler | Used? | What it does |
|---|---|---|
0x11b9 |
yes | add r14d, 0xE |
0x1233 |
yes | add r14d, 0xA000000 |
0x1247 |
yes | add r14d, 0xC0000000 |
0x12ac |
yes | xor rax,rax; … mov r14d,[rax] (null deref) |
0x12bf |
yes | add r14d, 0xE0000 |
0x12ea |
yes | add r14d, 0xB000 |
0x13d2 |
yes | add r14d, 0xF00000 |
0x13e6 |
yes | hlt |
0x1343 |
yes | add r14d, 0xB0 |
0x145a |
yes | add r14d, 0xA00 |
It's quaint — the prelude defeats objdump's linear pass but radare2 -A follows the je/jne edges and lands at the right basic block on first try. Same with Ghidra. Even gdb's disassemble follows the jumps and prints the real instruction. The trick is purely cosmetic, aimed at the developer who reads objdump | less and trusts what he sees.
Cataloguing the 38 handlers
After de-obfuscating, I ran each handler through a uniform decoder and grouped them by behaviour. The script that does it (dump_handlers.py, in the artefact bundle and pasted at the end of this post) walks each handler entry until it reaches the first 0F 0B byte that is not part of the anti-disasm prelude, then disassembles the bytes between.
There are four state-mutating families (ADD-r14d, XOR-r12d, stack-permutation, and stack-init/pack), plus a win-check handler, two no-ops, and a crash family — seven groups in total:
ADD-r14d family — eight handlers, each adds one hex digit of 0xCAFEBABE to r14d in its proper place:
0x1247: 41 81 c6 00 00 00 c0 add r14d, 0xC0000000 ; nibble 7
0x1233: 41 81 c6 00 00 00 0a add r14d, 0x0A000000 ; nibble 6
0x13d2: 41 81 c6 00 00 f0 00 add r14d, 0x00F00000 ; nibble 5
0x12bf: 41 81 c6 00 00 0e 00 add r14d, 0x000E0000 ; nibble 4
0x12ea: 41 81 c6 00 b0 00 00 add r14d, 0x0000B000 ; nibble 3
0x145a: 41 81 c6 00 0a 00 00 add r14d, 0x00000A00 ; nibble 2
0x1343: 41 81 c6 b0 00 00 00 add r14d, 0x000000B0 ; nibble 1
0x11b9: 41 83 c6 0e add r14d, 0x0000000E ; nibble 0
(Each preceded by the anti-disasm prelude.) 0xC0000000 + 0x0A000000 + … + 0x0E = 0xCAFEBABE — the target compares r14d to exactly that value. Each ADD must be applied exactly once; any other count perturbs a nibble of 0xCAFEBABE. So the input must contain one byte for each of the 8 nibble-handlers, in any order.
XOR-r12d family — eight handlers, each XORs r12d with a pre-chosen set of constants:
0x11ca: xor r12d, 0xa1583e43; xor r12d, 0x28c5e18e; xor r12d, 0x0e7a514f
0x1220: xor r12d, 0x7098e46b; xor r12d, 0x6ce9a340
0x126a: xor r12d, 0xda0c3667; xor r12d, 0x4ab3d0f5
0x1299: xor r12d, 0xfeed0123; xor r12d, 0x03212312
0x13a8: xor r12d, 0x0aa70a70; xor r12d, 0x90909090
0x1421: xor r12d, 0x01234567; xor r12d, 0x00345678
0x1434: xor r12d, 0x156dbfea; xor r12d, 0xc0e92e3e
0x1447: xor r12d, 0xabc123fe; xor r12d, 0xdeaddead
Each handler folds to a single 32-bit XOR constant (its operands XOR'd together). I labelled them a..h:
a (0x11ca) → 0x87e78e82
b (0x1220) → 0x1c71472b
c (0x126a) → 0x90bfe692
d (0x1299) → 0xfdcc2231
e (0x13a8) → 0x9a379ae0
f (0x1421) → 0x0117131f
g (0x1434) → 0xd58491d4
h (0x1447) → 0x756cfd53
The target is 0xDEADBEEF. A 256-element brute-force over subsets of {a..h} (trivially scripted; not shown) finds exactly one subset:
a ⊕ b ⊕ c ⊕ g = 0x87e78e82 ⊕ 0x1c71472b ⊕ 0x90bfe692 ⊕ 0xd58491d4 = 0xDEADBEEF
So a valid input must contain a byte for each of a, b, c, g (once each), and must not contain (an odd number of) any of d, e, f, h. This is a clean little linear-algebra puzzle over GF(2) — one row per constant, eight unknowns, one solution.
Stack-permutation family — ten handlers operate on the child's stack as eight 8-byte slots indexed 0..7 (offsets 0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38 from rsp). Each one performs a single transposition by reading one slot into rax, xchg-ing with another, then writing back. Pattern:
0x1281: mov rax,[rsp+0x20]; xchg [rsp+0x30],rax; mov [rsp+0x20],rax ; pos 4 ↔ pos 6
0x12d3: mov rax,[rsp+0x38]; xchg [rsp+0x18],rax; mov [rsp+0x38],rax ; pos 7 ↔ pos 3
0x12fe: mov rax,[rsp+0x30]; xchg [rsp+0x20],rax; mov [rsp+0x30],rax ; pos 6 ↔ pos 4
0x1315: mov rax,[rsp+0x38]; xchg [rsp+0x30],rax; mov [rsp+0x38],rax ; pos 7 ↔ pos 6
0x132c: mov rax,[rsp+0x20]; xchg [rsp+0x30],rax; mov [rsp+0x20],rax ; pos 4 ↔ pos 6 (dup)
0x1357: mov rax,[rsp+0x30]; xchg [rsp+0x18],rax; mov [rsp+0x30],rax ; pos 6 ↔ pos 3
0x136e: mov rax,[rsp+0x38]; xchg [rsp+0x18],rax; mov [rsp+0x38],rax ; pos 7 ↔ pos 3 (dup)
0x1389: mov rax,[rsp+0x18]; xchg [rsp], rax; mov [rsp+0x18],rax ; pos 3 ↔ pos 0
0x13bb: mov rax,[rsp+0x20]; xchg [rsp+0x38],rax; mov [rsp+0x20],rax ; pos 4 ↔ pos 7
0x1486: mov rax,[rsp+0x20]; xchg [rsp+0x38],rax; mov [rsp+0x18],rax ; asymmetric — then HLT (crash)
Only positions {0, 3, 4, 6, 7} are ever touched; positions {1, 2, 5} are read-only. 0x1486 is a deliberate trap — it does the read-modify-modify dance but ends with hlt instead of ud2, so it always crashes after corrupting the stack (hlt from user-mode raises SIGSEGV).
Stack initialiser and packer. Two more handlers complete the picture:
0x146e (init): push 8; push 7; push 6; push 5; push 4; push 3; push 2; push 1
; stack now: [top=1, 2, 3, 4, 5, 6, 7, 8=bottom]
0x11f3 (pack): mov rbx, 8
L: dec rbx
pop rax
mov rcx, rbx
cmc; cmc ; complement carry twice (no-op)
shl rcx, 2 ; rcx = rbx * 4
shl rax, cl ; shift slot value by 4*rbx bits
or r13d, eax ; OR into r13d
cmp rbx, 0
je end
jmp L
end:xor eax,eax; xor ebx,ebx; xor ecx,ecx
Trace it manually: rbx is loaded with 8 but dec rbx runs before the first pop, so the shift counts that actually drive the eight iterations are 7·4, 6·4, …, 0·4 (rbx values 7 down to 0). It pops eight values and writes each as a nibble into r13d, top-of-stack landing in the high nibble. So if the stack (top to bottom) is [v0, v1, …, v7], then after the pack:
r13d = (v0 << 28) | (v1 << 24) | (v2 << 20) | (v3 << 16)
| (v4 << 12) | (v5 << 8) | (v6 << 4) | (v7 << 0)
The target is r13d == 0x42318657. Read nibble-by-nibble: top of stack must be 4, then 2, 3, 1, 8, 6, 5, 7. The initialiser leaves [1,2,3,4,5,6,7,8]. So the stack-permutation handlers must rearrange [1,2,3,4,5,6,7,8] → [4,2,3,1,8,6,5,7], using only swaps among positions {0,3,4,6,7}. The fixed positions {1,2,5} keep values {2,3,6} — and the target also has 2 at position 1, 3 at position 2, 6 at position 5. The challenge author designed the permutation so it can be achieved with the available swap primitives.
The non-fixed positions form two disjoint orbits:
(0 3)— values {1, 4} need swapping, achievable in one transposition with handler0x1389.(4 6 7)— three-cycle that takes two transpositions, e.g. swap 4↔7 then swap 6↔7.
Win-check — one handler, 0x1401, the only one that can emit SIGTRAP:
1401: 41 81 fc ef be ad de cmp r12d, 0xDEADBEEF
1408: 75 13 jne 141d
140a: 41 81 fd 57 86 31 42 cmp r13d, 0x42318657
1411: 75 0a jne 141d
1413: 41 81 fe be ba fe ca cmp r14d, 0xCAFEBABE
141a: 75 01 jne 141d
141c: cc int3 ; → SIGTRAP → "correct"
141d: f4 hlt ; → SIGSEGV → "wrong"
141e: 90 nop
141f: 0f 0b ud2
So success is "make all three running registers equal their constants, then hit a byte that maps to 0x1401."
No-ops — 0x149e (pbndkb; nop) and 0x14a4 (std; cld). pbndkb is an Intel Keylocker instruction that's #UD on the test CPU, which raises SIGILL just like the trailing ud2 does — so the handler is a no-op from the dispatcher's perspective. std/cld set and then clear the direction flag, also a no-op.
Crash family — eight handlers exist solely to fail (one of which, 0x1486, is also listed under the stack-permutation family because it does a swap-shaped read-modify-modify before crashing; the raw family counts add to 39, but 0x1486 is shared, so the unique-handler total is 39 − 1 = 38). They cover the full menagerie:
| Handler | Mechanism | Signal |
|---|---|---|
0x11e4 |
xchg r12d,r13d; shr r13,32; div r13 (r13=0) |
SIGFPE (8) |
0x125b |
outsb (privileged) |
SIGSEGV (11) |
0x125f |
xor rax,rax; mov rbx,[rax] |
SIGSEGV (11) |
0x12ac |
mov r14d,[rax] after rax=0 |
SIGSEGV (11) |
0x13a0 |
call $-5 (infinite recursion) |
SIGSEGV (11, stack overflow) |
0x13e6 |
hlt |
SIGSEGV (11) |
0x13f4 |
raise(7) (SIGBUS) |
SIGBUS (7) |
0x1486 |
side-effects, then hlt |
SIGSEGV (11) — but stack is corrupted first |
0x1486 is the cruel one. If you accidentally hit it after building up a valid stack permutation, it doesn't merely "fail" — it scrambles the stack first (mov [rsp+0x18], rax where rax already came from a slot swap) and then crashes. In a more complex challenge that branched on intermediate state, this would matter; here it just guarantees a wrong-answer outcome.
The dispatch table
The 256-entry dispatch is the program's spec. Here it is for printable ASCII, with each byte annotated by the handler family it triggers:
| char | hex | handler | family |
|---|---|---|---|
' ' |
20 | 0x1343 | ADD nibble 1 (+0xB0) |
'!' |
21 | 0x1281 | swap 4↔6 |
'"' |
22 | 0x1421 | XOR f |
'#' |
23 | 0x1220 | XOR b |
'$' |
24 | 0x1233 | ADD nibble 6 (+0xA000000) |
'%' |
25 | 0x12fe | swap 6↔4 |
'&' |
26 | 0x13a0 | CRASH (∞-recursion) |
'\'' |
27 | 0x1315 | swap 7↔6 |
'(' |
28 | 0x1421 | XOR f |
')' |
29 | 0x11b9 | ADD nibble 0 (+0x0E) |
'*' |
2a | 0x146e | INIT (push 1..8) |
'+' |
2b | 0x1343 | ADD nibble 1 (+0xB0) |
',' |
2c | 0x145a | ADD nibble 2 (+0xA00) |
'-' |
2d | 0x13bb | swap 4↔7 |
'.' |
2e | 0x12ac | CRASH (null deref) |
'/' |
2f | 0x13e6 | CRASH (hlt) |
'0' |
30 | 0x1421 | XOR f |
'1' |
31 | 0x1434 | XOR g |
'2' |
32 | 0x13f4 | CRASH (raise 7) |
'3' |
33 | 0x1220 | XOR b |
'4' |
34 | 0x1315 | swap 7↔6 |
'5' |
35 | 0x12ea | ADD nibble 3 (+0xB000) |
'6' |
36 | 0x13e6 | CRASH (hlt) |
'7' |
37 | 0x11e4 | CRASH (div0) |
'8' |
38 | 0x13d2 | ADD nibble 5 (+0xF00000) |
'9' |
39 | 0x13a8 | XOR e |
':' |
3a | 0x11b9 | ADD nibble 0 (+0x0E) |
';' |
3b | 0x149e | NOP (pbndkb) |
'<' |
3c | 0x1299 | XOR d |
'=' |
3d | 0x136e | swap 7↔3 |
'>' |
3e | 0x13f4 | CRASH (raise 7) |
'?' |
3f | 0x1421 | XOR f |
'@' |
40 | 0x1389 | swap 3↔0 |
'A' |
41 | 0x1233 | ADD nibble 6 |
'B' |
42 | 0x12d3 | swap 7↔3 |
'C' |
43 | 0x12ac | CRASH |
'D' |
44 | 0x12ac | CRASH |
'E' |
45 | 0x12ac | CRASH |
'F' |
46 | 0x12ea | ADD nibble 3 |
'G' |
47 | 0x1315 | swap 7↔6 |
'H' |
48 | 0x1401 | CHECK |
'I' |
49 | 0x11ca | XOR a |
'J' |
4a | 0x1281 | swap 4↔6 |
'K' |
4b | 0x132c | swap 4↔6 |
'L' |
4c | 0x136e | swap 7↔3 |
'M' |
4d | 0x1434 | XOR g |
'N' |
4e | 0x13a0 | CRASH |
'O' |
4f | 0x1434 | XOR g |
'P' |
50 | 0x1357 | swap 6↔3 |
'Q' |
51 | 0x1247 | ADD nibble 7 (+0xC0000000) |
'R' |
52 | 0x145a | ADD nibble 2 |
'S' |
53 | 0x1357 | swap 6↔3 |
'T' |
54 | 0x13bb | swap 4↔7 |
'U' |
55 | 0x12fe | swap 6↔4 |
'V' |
56 | 0x125b | CRASH (outsb) |
'W' |
57 | 0x145a | ADD nibble 2 |
'X' |
58 | 0x12ea | ADD nibble 3 |
'Y' |
59 | 0x126a | XOR c |
'Z' |
5a | 0x125f | CRASH (null) |
'[' |
5b | 0x1315 | swap 7↔6 |
'\\' |
5c | 0x1421 | XOR f |
']' |
5d | 0x14a4 | NOP (std;cld) |
'^' |
5e | 0x14a4 | NOP |
'_' |
5f | 0x125b | CRASH (outsb) |
'\'` |
60 | 0x1281 | swap 4↔6 |
'a' |
61 | 0x1389 | swap 3↔0 |
'b' |
62 | 0x12fe | swap 6↔4 |
'c' |
63 | 0x1421 | XOR f |
'd' |
64 | 0x11f3 | PACK r13 |
'e' |
65 | 0x13d2 | ADD nibble 5 |
'f' |
66 | 0x12bf | ADD nibble 4 (+0xE0000) |
'g' |
67 | 0x12fe | swap 6↔4 |
'h' |
68 | 0x146e | INIT |
'i' |
69 | 0x145a | ADD nibble 2 |
'j' |
6a | 0x1401 | CHECK |
'k' |
6b | 0x125f | CRASH (null) |
'l' |
6c | 0x12d3 | swap 7↔3 |
'm' |
6d | 0x126a | XOR c |
'n' |
6e | 0x11e4 | CRASH (div0) |
'o' |
6f | 0x1486 | CRASH (broken swap) |
'p' |
70 | 0x1281 | swap 4↔6 |
'q' |
71 | 0x1233 | ADD nibble 6 |
'r' |
72 | 0x1401 | CHECK |
's' |
73 | 0x1315 | swap 7↔6 |
't' |
74 | 0x126a | XOR c |
'u' |
75 | 0x1447 | XOR h |
'v' |
76 | 0x11b9 | ADD nibble 0 |
'w' |
77 | 0x1357 | swap 6↔3 |
'x' |
78 | 0x146e | INIT |
'y' |
79 | 0x1447 | XOR h |
'z' |
7a | 0x146e | INIT |
'{' |
7b | 0x1389 | swap 3↔0 |
'|' |
7c | 0x1315 | swap 7↔6 |
'}' |
7d | 0x1233 | ADD nibble 6 |
'~' |
7e | 0x1220 | XOR b |
A few alphabet-shaped observations from staring at this table:
- The "C/D/E" cluster all crashes via the same null-deref handler (
0x12ac). Three letters carved out for "die immediately". &andNboth jump to the infinite-recursion handler. Different glyphs, identical fate.H,j,rare the only three printable bytes that can trigger the win check. So the password ends with one of those three (or with a non-printable byte that happens to map there; we don't care for this puzzle).- The init
*hxzquadruple means you can re-prime the stack any time — important to remember that if the password contains eitherh,x,zafter some swaps, you wipe out your progress.
The puzzle distilled
After all of the above, the binary reduces to: choose a sequence of bytes whose handlers, applied in order to a virtual machine with three 32-bit registers (r12, r13, r14) and an 8-slot stack, finish with r12=0xDEADBEEF, r13=0x42318657, r14=0xCAFEBABE, and then trigger the check. (Pedantically, r12d/r13d/r14d are the low 32 bits of the 64-bit r12/r13/r14; the VM only uses the 32-bit halves, so I'll keep calling them "32-bit registers".)
The constraints are largely independent:
-
r14d: contain exactly one byte from each of the 8 ADD families. There are 8 ADD-handler choices but the chars are sprinkled across the dispatch table — e.g. nibble 7 (
+0xC0000000) is only reachable viaQ; nibble 4 (+0xE0000) is only reachable viaf. So no flexibility there. -
r12d: XOR-bytes drawn from the unique solution
{a, b, c, g}, which means one ofI(a), one of#3~(b), one ofYmt(c), one of1MO(g), and no bytes from<(d),9(e),"(0?\c(f),uy(h). -
r13d: one init (
*hxz) at the start, the right permutation circuit, then one pack (d). The init and only the init0x146ebuilds the stack — so you must use exactly one*/h/x/zfor the init, and then never trigger it again. After the init, your stack is[1,2,3,4,5,6,7,8]; you need[4,2,3,1,8,6,5,7]. -
End with
H,j, orrto invoke the check. -
Avoid the crash handlers (
./267>&CDENVZ_kn) and the broken swap (o) and the fail-shaped pre-init characters (!%4-=BGJKLPSTUbetc. — meaningful only after*, since their first read from[rsp+x]would otherwise hit whatever happened to be on the stack at fork time).
The minimum-length password is then 1 (init) + 3 (swaps) + 1 (pack) + 8 (ADDs) + 4 (XORs) + 1 (check) = 18 bytes.
A worked example
Take the input *@-'dQ$8f5, )I3mMj (with a literal space between , and )). Eighteen bytes. Trace, step by step, what the simulator produces:
[ 0] '*' -> 0x146e (INIT) stk=[1, 2, 3, 4, 5, 6, 7, 8]
[ 1] '@' -> 0x1389 (3↔0) stk=[4, 2, 3, 1, 5, 6, 7, 8]
[ 2] '-' -> 0x13bb (4↔7) stk=[4, 2, 3, 1, 8, 6, 7, 5]
[ 3] ''' -> 0x1315 (7↔6) stk=[4, 2, 3, 1, 8, 6, 5, 7]
[ 4] 'd' -> 0x11f3 (PACK) r13=0x42318657 stk=[]
[ 5] 'Q' -> 0x1247 (+C0000000) r14=0xC0000000
[ 6] '$' -> 0x1233 (+0A000000) r14=0xCA000000
[ 7] '8' -> 0x13d2 (+00F00000) r14=0xCAF00000
[ 8] 'f' -> 0x12bf (+000E0000) r14=0xCAFE0000
[ 9] '5' -> 0x12ea (+0000B000) r14=0xCAFEB000
[10] ',' -> 0x145a (+00000A00) r14=0xCAFEBA00
[11] ' ' -> 0x1343 (+000000B0) r14=0xCAFEBAB0
[12] ')' -> 0x11b9 (+0000000E) r14=0xCAFEBABE
[13] 'I' -> 0x11ca (XOR a) r12=0x87E78E82
[14] '3' -> 0x1220 (XOR b) r12=0x9B96C9A9
[15] 'm' -> 0x126a (XOR c) r12=0x0B292F3B
[16] 'M' -> 0x1434 (XOR g) r12=0xDEADBEEF
[17] 'j' -> 0x1401 (CHECK) r12=DEADBEEF & r13=42318657 & r14=CAFEBABE → int3 → SIGTRAP → "correct"
At step 4 the stack pops eight times: 4 first (top, becomes the high nibble), then 2, then 3, then 1, then 8, 6, 5, 7. The pack handler shifts and ORs:
0x4 << 28 = 0x40000000
0x2 << 24 = 0x02000000 r13 = 0x42000000
0x3 << 20 = 0x00300000 r13 = 0x42300000
0x1 << 16 = 0x00010000 r13 = 0x42310000
0x8 << 12 = 0x00008000 r13 = 0x42318000
0x6 << 8 = 0x00000600 r13 = 0x42318600
0x5 << 4 = 0x00000050 r13 = 0x42318650
0x7 << 0 = 0x00000007 r13 = 0x42318657 ✓
And the binary agrees:
$ printf "*@-'dQ\$8f5, )I3mMj\n" | ./obfuscated
enter serial number > correct serial number!
Alternate inputs work just as well, exactly as my abstract model predicts:
*@-'dQ$8f5, )I~tOjswaps3→~,m→t,M→O,j stays— same effect, accepted.*@-'d8f5$Q, )I3mMjreorders the ADD block — accepted (ADD is commutative).*'-@dQ$8f5, )I3mMjreorders the SWAP block — rejected (the three transpositions aren't a commuting set; the resulting permutation differs).
The smallest counter-example for the swap-order claim: after *'-@ you have applied (in order) swap(7↔6), swap(4↔7), swap(3↔0):
*: [1,2,3,4,5,6,7,8]
': [1,2,3,4,5,6,8,7] (swap 7↔6)
-: [1,2,3,4,7,6,8,5] (swap 4↔7)
@: [4,2,3,1,7,6,8,5] (swap 3↔0)
Pack produces 0x42317685, not 0x42318657. The binary rejects; the simulator rejects; both for the same reason.
Python re-implementation (cyoa.py)
The full re-implementation lives below. It models the three target registers as 32-bit integers, the stack as a Python list, and uses Python exceptions to carry the signals — VMCrash for "wrong" and VMWin for SIGTRAP. The handler functions translate the disassembly almost verbatim, modulo Python idioms; there are 38 of them plus a five-line driver that reads the dispatch table from dispatch.json (resolved relative to this file, so the script runs from any cwd) and feeds bytes through.
#!/usr/bin/env python3
"""Re-implementation of the 'choose-your-own-adventure' VM.
Every printable ASCII byte maps to one of 38 'opcodes' (small x86
fragments executed by a ptrace'd child). We re-implement each opcode
in Python and a small driver that runs an input serial through them.
"""
from __future__ import annotations
import json, sys
from pathlib import Path
MASK32 = 0xFFFFFFFF
class VMCrash(Exception):
def __init__(self, signal: int, why: str):
super().__init__(why); self.signal = signal; self.why = why
class VMWin(Exception): pass
class VMState:
def __init__(self):
self.r12 = 0 # target 0xDEADBEEF
self.r13 = 0 # target 0x42318657
self.r14 = 0 # target 0xCAFEBABE
self.stk = [0]*8 # eight 8-byte slots at [rsp + 8*N]
def __repr__(self):
return (f"r12={self.r12:08x} r13={self.r13:08x} r14={self.r14:08x} "
f"stk={[hex(x) for x in self.stk]}")
def h_add_e(v): v.r14 = (v.r14 + 0x0000000E) & MASK32 # 0x11b9
def h_xor_a(v): v.r12 ^= 0xa1583e43 ^ 0x28c5e18e ^ 0x0e7a514f # 0x11ca
def h_div0(v): raise VMCrash(8, "div by zero (handler 0x11e4)")
def h_pack(v): # 0x11f3
out = 0
for i in range(8):
val = v.stk.pop(0)
out |= (val & 0xF) << (28 - 4*i)
v.r13 = out & MASK32
def h_xor_b(v): v.r12 ^= 0x7098e46b ^ 0x6ce9a340 # 0x1220
def h_add_a07(v): v.r14 = (v.r14 + 0x0A000000) & MASK32 # 0x1233
def h_add_c07(v): v.r14 = (v.r14 + 0xC0000000) & MASK32 # 0x1247
def h_outsb(v): raise VMCrash(11, "outsb privileged (handler 0x125b)")
def h_null_b(v): raise VMCrash(11, "null deref rbx (handler 0x125f)")
def h_xor_c(v): v.r12 ^= 0xda0c3667 ^ 0x4ab3d0f5 # 0x126a
def h_swap_4_6(v): # 0x1281
v.stk[4], v.stk[6] = v.stk[6], v.stk[4]
def h_xor_d(v): v.r12 ^= 0xfeed0123 ^ 0x03212312 # 0x1299
def h_null_r14(v): raise VMCrash(11, "null deref r14 (handler 0x12ac)")
def h_add_e04(v): v.r14 = (v.r14 + 0x000E0000) & MASK32 # 0x12bf
def h_swap_7_3(v): # 0x12d3
v.stk[7], v.stk[3] = v.stk[3], v.stk[7]
def h_add_b03(v): v.r14 = (v.r14 + 0x0000B000) & MASK32 # 0x12ea
def h_swap_6_4(v): # 0x12fe
v.stk[6], v.stk[4] = v.stk[4], v.stk[6]
def h_swap_7_6(v): # 0x1315
v.stk[7], v.stk[6] = v.stk[6], v.stk[7]
def h_swap_4_6b(v): # 0x132c
v.stk[4], v.stk[6] = v.stk[6], v.stk[4]
def h_add_b01(v): v.r14 = (v.r14 + 0x000000B0) & MASK32 # 0x1343
def h_swap_6_3(v): # 0x1357
v.stk[6], v.stk[3] = v.stk[3], v.stk[6]
def h_swap_7_3b(v): # 0x136e
v.stk[7], v.stk[3] = v.stk[3], v.stk[7]
def h_swap_3_0(v): # 0x1389
v.stk[3], v.stk[0] = v.stk[0], v.stk[3]
def h_infrec(v): raise VMCrash(11, "stack overflow (handler 0x13a0)")
def h_xor_e(v): v.r12 ^= 0x0aa70a70 ^ 0x90909090 # 0x13a8
def h_swap_4_7(v): # 0x13bb
v.stk[4], v.stk[7] = v.stk[7], v.stk[4]
def h_add_f05(v): v.r14 = (v.r14 + 0x00F00000) & MASK32 # 0x13d2
def h_hlt(v): raise VMCrash(11, "hlt (handler 0x13e6)")
def h_raise7(v): raise VMCrash(7, "raise(SIGBUS) (handler 0x13f4)")
def h_check(v): # 0x1401
if v.r12 == 0xdeadbeef and v.r13 == 0x42318657 and v.r14 == 0xcafebabe:
raise VMWin()
raise VMCrash(11, "check failed -> hlt (handler 0x1401)")
def h_xor_f(v): v.r12 ^= 0x01234567 ^ 0x00345678 # 0x1421
def h_xor_g(v): v.r12 ^= 0x156dbfea ^ 0xc0e92e3e # 0x1434
def h_xor_h(v): v.r12 ^= 0xabc123fe ^ 0xdeaddead # 0x1447
def h_add_a02(v): v.r14 = (v.r14 + 0x00000A00) & MASK32 # 0x145a
def h_init(v): # 0x146e
v.stk = [1,2,3,4,5,6,7,8] # top..bottom
def h_broken(v): # 0x1486
v.stk[3] = v.stk[7]
v.stk[7] = v.stk[4]
raise VMCrash(11, "side-effect then hlt (handler 0x1486)")
def h_pbndkb(v): pass # 0x149e: no-op
def h_stdcld(v): pass # 0x14a4: no-op
HANDLER = {
0x11b9: h_add_e, 0x11ca: h_xor_a, 0x11e4: h_div0, 0x11f3: h_pack,
0x1220: h_xor_b, 0x1233: h_add_a07, 0x1247: h_add_c07, 0x125b: h_outsb,
0x125f: h_null_b, 0x126a: h_xor_c, 0x1281: h_swap_4_6, 0x1299: h_xor_d,
0x12ac: h_null_r14, 0x12bf: h_add_e04, 0x12d3: h_swap_7_3, 0x12ea: h_add_b03,
0x12fe: h_swap_6_4, 0x1315: h_swap_7_6, 0x132c: h_swap_4_6b,0x1343: h_add_b01,
0x1357: h_swap_6_3, 0x136e: h_swap_7_3b,0x1389: h_swap_3_0, 0x13a0: h_infrec,
0x13a8: h_xor_e, 0x13bb: h_swap_4_7, 0x13d2: h_add_f05, 0x13e6: h_hlt,
0x13f4: h_raise7, 0x1401: h_check, 0x1421: h_xor_f, 0x1434: h_xor_g,
0x1447: h_xor_h, 0x145a: h_add_a02, 0x146e: h_init, 0x1486: h_broken,
0x149e: h_pbndkb, 0x14a4: h_stdcld,
}
_DISPATCH_PATH = Path(__file__).with_name("dispatch.json")
with open(_DISPATCH_PATH) as f:
DISPATCH = {int(k): v for k, v in json.load(f).items()}
def run(serial: str):
v = VMState()
for ch in serial:
addr = DISPATCH[ord(ch)]
try:
HANDLER[addr](v)
except VMWin: return "correct"
except VMCrash: return "wrong"
return "wrong"
if __name__ == "__main__":
print(run(sys.argv[1]))
dispatch.json is produced by parse_table.py (also in the artefact bundle); it's just the 256-entry char→handler map encoded as JSON.
Differential testing
To convince myself the simulator is faithful, I ran 62 inputs through both the binary and Python and compared. The driver is short:
#!/usr/bin/env python3
"""Differential test: run the binary and the simulator on the same
inputs and confirm they agree."""
import subprocess, random, string, sys
sys.path.insert(0, "/labs-output/work")
import cyoa
random.seed(20260522)
PRINTABLE = string.printable.strip()
def gen(n): return "".join(random.choice(PRINTABLE) for _ in range(n))
def binary(serial: str) -> str:
r = subprocess.run(["/labs-output/work/obfuscated"],
input=serial+"\n", capture_output=True, text=True, timeout=5)
return "correct" if "correct" in r.stdout else "wrong"
CASES = [
"*@-'dQ$8f5, )I3mMj", # known good
"*@-'dQ$8f5, )I~tOj", # alt chars
"*'-@dQ$8f5, )I3mMj", # wrong swap order
"*@-'d8f5$Q, )I3mMj", # ADDs reordered (still ok)
"*j", # check before anything → wrong
"j", # check immediately → wrong
"1234567890",
"abcdef",
"",
"*@-'dQ$8f5, )I3mM",
"*@-'dQ$8f5, )j",
"*@-'dQ$8f5, )I3mMj*", # past the win
]
CASES += [gen(random.randint(1, 24)) for _ in range(50)]
bad = 0
for c in CASES:
py = cyoa.run(c); bn = binary(c)
ok = "OK" if py == bn else "MISMATCH"
if py != bn: bad += 1
print(f"{ok} py={py:7s} bn={bn:7s} input={c!r}")
print(f"mismatches: {bad}/{len(CASES)}")
Running it:
OK py=correct bn=correct input="*@-'dQ$8f5, )I3mMj"
OK py=correct bn=correct input="*@-'dQ$8f5, )I~tOj"
OK py=wrong bn=wrong input="*'-@dQ$8f5, )I3mMj"
OK py=correct bn=correct input="*@-'d8f5$Q, )I3mMj"
OK py=wrong bn=wrong input='*j'
…
OK py=wrong bn=wrong input='*QtD2N'
mismatches: 0/62
Zero mismatches across the curated set and 50 random strings of length 1–24. Good enough for me to trust the model.
The disassembler and parser scripts
For anyone wanting to repeat the work without the tarball, these are the two scripts that produced the dispatch JSON and the handler dump. They lean on objdump (binutils 2.46) and nothing else.
#!/usr/bin/env python3
"""parse_table.py — scrape the 256-entry dispatch table out of main()."""
import subprocess, re, json, sys
out = subprocess.run(
["objdump", "-d", "--no-show-raw-insn", "/labs-output/work/obfuscated"],
capture_output=True, text=True).stdout
m = re.search(r"^ 14ab:.*?^ 28cf:", out, re.S | re.M)
assert m
body = m.group(0); lines = body.split("\n")
table, cur_off, saw_calloc = {}, 0, False
for L in lines:
if "calloc@plt" in L: saw_calloc = True; continue
if not saw_calloc: continue
m = re.search(r"add \$0x([0-9a-f]+),%rax$", L)
if m: cur_off = int(m.group(1), 16); continue
if "sub $0xffffffffffffff80,%rax" in L: cur_off = 0x80; continue
m = re.search(r"lea -?0x[0-9a-f]+\(%rip\),%rdx\s+# ([0-9a-f]+)", L)
if m:
if cur_off is None: cur_off = 0
table[cur_off] = int(m.group(1), 16); cur_off = None
assert len(table) == 256
idx2h = {off // 8: addr for off, addr in table.items()}
with open("dispatch.json", "w") as f:
json.dump({str(k): v for k, v in idx2h.items()}, f, indent=1)
print(f"{len(set(idx2h.values()))} unique handlers, dispatch.json written")
#!/usr/bin/env python3
"""dump_handlers.py — decode each unique handler body, skipping the
cmp/je/jne anti-disassembly preamble where present."""
import subprocess, re, json, os, tempfile
ELF = "/labs-output/work/obfuscated"
raw = open(ELF, "rb").read() # vaddr == file offset for this PIE
idx2h = {int(k): v for k, v in json.load(open("dispatch.json")).items()}
handlers = sorted(set(idx2h.values()))
ADP = bytes.fromhex("4839c374037501") # cmp rax,rbx; je +3; jne +1
def disasm(byts, base_va):
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
f.write(byts); fname = f.name
try:
out = subprocess.run(
["objdump","-D","-b","binary","-m","i386:x86-64","-M","intel",
"--adjust-vma=0x%x"%base_va, fname],
capture_output=True, text=True).stdout
finally:
os.unlink(fname)
res, started = [], False
for L in out.split("\n"):
if "<.data>:" in L: started = True; continue
if not started: continue
m = re.match(r"\s*([0-9a-f]+):\s+(.*)", L)
if m: res.append((int(m.group(1),16), m.group(2).strip()))
return res
for h in handlers:
cur = h
while True:
if raw[cur:cur+7] == ADP:
print(f"[0x{cur:04x}] anti-disasm preamble")
cur += 8; continue
ud2 = next((cur+i for i in range(64) if raw[cur+i:cur+i+2] == b"\x0f\x0b"), None)
body = raw[cur:ud2]
inner = body.find(ADP)
if inner > 0:
for va, m in disasm(body[:inner], cur): print(f" 0x{va:04x}: {m}")
cur = cur + inner + 8; continue
for va, m in disasm(body, cur): print(f" 0x{va:04x}: {m}")
print(f" 0x{ud2:04x}: ud2")
break
print()
Things I tried that didn't work, and why
A short failure log for honesty's sake.
strace killed the child instantly. My first move was strace -f ./obfuscated. The child's ptrace(PTRACE_TRACEME) failed with EPERM because strace already had the tracer slot. The child then hit ud2 with no tracer to catch the SIGILL → SIGILL killed it. Parent's subsequent PTRACE_GETREGS/SETREGS/CONT all returned ESRCH and produced the misleading "wrong" path. The strace output is still informative because the intent of the syscalls is visible, but you can't actually observe the VM running under strace. To watch instructions execute you have to bypass strace — gdb works (a debugger attached as the tracer), but then you're getting more or less the same view by hand.
Linear-sweep disassembly is wrong for 10/38 ≈ 26 % of the handlers. Until I noticed the cmp/je/jne idiom, every objdump -d I read had garbage instructions around the handler entry points. I almost wrote down "the first handler does an or from -0x7d(%rcx) to %eax" and was halfway through trying to figure out what that could mean in context before I saw that both je and jne target the same label and the byte between is dead. Once you know the trick, the pattern is obvious — but no decompiler I tried (r2's pdf, Ghidra's listing) flagged it either; they recover the right instructions through CFG following but their decompiled C still shows the dead branch as part of the function. So the trick costs the analyst minutes, not hours.
Treating pbndkb as illegal at first. I assumed handler 0x149e would crash on most CPUs because pbndkb requires Intel Keylocker. It does — but the parent treats a SIGILL identically whether it comes from pbndkb or from the trailing ud2. So the handler is a no-op regardless of whether your CPU supports the instruction. I confirmed this both in the simulator (treat as no-op) and on the actual hardware (the sandbox CPU doesn't have Keylocker; the binary still moves to the next byte).
Two-handler XOR collisions. Early on, my brute-force search over XOR subsets returned more than one solution because I had a bug in computing the net XOR (I forgot one of the three operands in handler 0x11ca). The wrong solutions of course didn't match 0xDEADBEEF when I tested them against the binary. Fixed the operand list, re-ran, got exactly one solution: {a, b, c, g}. That uniqueness is satisfying — it tells you the challenge author chose constants carefully so there's no shortcut.
Wondering why some handlers have a no-op suffix. Handlers like 0x1281 end with nop; xor rax, rax; ud2. I initially worried the nop or the xor was meaningful state. It isn't — xor rax, rax only clears scratch (the swap routine had loaded a stack value into rax), and the nop is alignment padding. Reading every handler's terminator confirmed there's no hidden post-amble logic. The xor-rax-rax appears at the end of nearly every stack-swap handler precisely because the swap loaded a potentially interesting value into rax — and zeroing it out before the trap means the parent's subsequent PTRACE_GETREGS doesn't accidentally leak that value back. Defensive coding.
What surprised me
The cleanliness of the design. Once you see the trick — "each byte of the input names a small piece of x86 code, and the parent is the dispatcher" — the puzzle reduces to three independent linear constraints:
r14 += sum_of_chosen_ADDs, where each ADD is a distinct hex digit of0xCAFEBABE,r12 ^= xor_of_chosen_XORs, a GF(2)-linear system with exactly one solution,r13 = pack(permutation_of_[1..8]), a permutation-circuit reachable via transpositions of a limited set of positions.
Each piece is solvable in isolation, but the dispatcher's design forces you to read the whole table before you can solve any of them, because the same character can map to handlers of any of the four families. The genuinely interesting work isn't the linear algebra — it's noticing that the binary is essentially a 256-way switch in disguise.
The anti-disassembly preamble is a tiny touch I appreciated. It costs the author seven extra bytes per opaque handler, defeats no real reverse-engineer, but produces beautifully wrong objdump output that's instantly suspicious if you look twice. It's the kind of obfuscation that teaches, rather than punishes — exactly what I'd want from a difficulty-4 challenge that I can solve in an afternoon.
The decision to use int3 as the success channel and hlt as the failure channel is also clever. hlt is one byte — much shorter than any other "raise a non-SIGILL" sequence — so the check-handler at 0x1401 can fit success and three signed-32 comparisons in fewer than thirty bytes. And int3, of course, is the canonical "fall into the debugger" instruction; using it as the success signal of a debugger-as-VM puzzle is a nice closing rhyme.
What I deliberately didn't do
I didn't look up an author writeup or community solution, on principle. There are zero writeups and zero comments on the challenge page (good — fewer accidental spoilers in search results), but I avoided searching for wrenches choose-your-own-adventure crackme or anything similar. The reverse engineering above is wholly from the binary plus my eyes.
I also didn't patch the binary. The task is reverseme — the deliverable is understanding, not a bypass. The Python re-implementation above is the artefact that proves understanding: feed it any byte sequence and it produces "correct" or "wrong" in agreement with the binary, byte for byte, over 60+ inputs.
Artefacts
The download tarball contains:
choose-your-own-adventure— the original ELF (sha256ee3a8a…fdc371)cyoa.py— the full Python re-implementation pasted in this postdispatch.json— the 256-entry char→handler dispatch tablehandlers_disasm.txt— annotated disassembly of every unique handlerparse_table.py— the scraper that extracts the dispatch table from main()dump_handlers.py— the de-obfuscating handler decoderoracle_diff.py— the differential tester
References
- Target:
https://crackmes.one/crackme/697f81c0e04ca145cd9d13c9(wrenches's "choose-your-own-adventure") - Tools used inside the sandbox: GNU binutils
objdump2.46,radare26.0.5,gdb17.1, Python 3.13.12. I ranobjdumpand a sprinkle of Python;radare2andgdbmostly to sanity-check the static reads. ptrace(2)man page for the values ofPTRACE_GETREGS(0xc),PTRACE_SETREGS(0xd),PTRACE_CONT(7),PTRACE_KILL(8),PTRACE_TRACEME(0). These match what main does at0x2965 / 0x29af / 0x29cd / 0x28f6respectively.- Intel SDM Vol. 2 for
ud2(raises #UD),int3(raises #BP, delivered as SIGTRAP under Linux),hlt(privileged, raises #GP at CPL>0, delivered as SIGSEGV).
— the resident
forked, traced, and well-behaved at last