`headache`: 209 layers of XOR onion guarding a flag that checks itself one byte-pair at a time
A stripped 26 KB ELF from AmateursCTF 2023 whose entire `.text` is a self-decrypting cascade — each tiny gadget peels one XOR layer off the rest of the code, checks a single relation `flag[a] ^ flag[b] == c`, and jumps into the next layer. There's no key to guess and no check to bypass in the classic sense: the binary *is* a 210-clause system of XOR equations, and recovering it means re-implementing the whole machine. This is that re-implementation, harvested with a Unicorn emulator and solved as a graph.
A stripped 26 KB ELF from AmateursCTF 2023 whose entire .text is a self-decrypting cascade — each tiny gadget peels one XOR layer off the rest of the code, checks a single relation flag[a] ^ flag[b] == c, and jumps into the next layer. There's no key to guess and no check to bypass in the classic sense: the binary is a 210-clause system of XOR equations, and recovering it means re-implementing the whole machine. This is that re-implementation, harvested with a Unicorn emulator and solved as a graph.
The target
The artefact is headache, the rev challenge by flocto and unvariant from AmateursCTF 2023 (1 solve at the time, flag format amateursCTF{[a-zA-Z0-9_]+}). I pulled it from the sajjadium CTF-archives mirror:
- repo:
https://github.com/sajjadium/ctf-archives - directory:
ctfs/AmateursCTF/2023/rev/headache/
$ file headache
headache: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically
linked, interpreter /lib64/ld-linux-x86-64.so.2, ... stripped
$ ls -l headache
-rwxr-xr-x 1 root root 26768 ... headache
$ sha256sum headache
74cd0734effa0914bf286e0d3d48cc6afe6b1b6b45903747dd30e270055941a8 headache
A few facts from the ELF header (xxd headache | head) matter for everything that follows:
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0200 3e00 0100 0000 9010 4000 0000 0000 ..>.......@.....
00000020: 4000 0000 0000 0000 9061 0000 0000 0000 @........a......
Byte 0x10 is 0x02 = ET_EXEC: this is a non-PIE binary, fixed load base 0x400000, entry 0x401090. That's a gift — every address in the program is absolute and stable, so I can talk about 0x4012a4 and mean exactly one byte in the file. Because the image is loaded contiguously from base 0x400000, the mapping file_offset = vaddr − 0x400000 holds throughout .text and .rodata, which lets me patch and emulate without juggling segment tables.
It links only libc (__isoc99_scanf, strlen, printf, puts, setbuf) — no crypto library, no ptrace, nothing fancy in the imports. All the cleverness is in the bytes.
First impressions
strings is where the "headache" begins. Past the usual libc symbol soup there are a couple of human strings — Enter flag:, Wrong!, Correct! — and then thousands of bytes of high-entropy garbage:
$ strings -n 5 headache | sed -n '20,40p'
H=Xp@
uOTbQ
?V\a6
^5mOV{
ac=sE
...
That garbage is not data. As we'll see, it's .text — encrypted code. A first strace confirms the boring outer behaviour: read a line, then think hard:
$ printf 'amateursCTF{...}\n' | strace -e read,write ./headache
write(1, "Enter flag: ", 12) = 12
read(0, "a", 1) = 1
read(0, "m", 1) = 1
read(0, "a", 1) = 1
...
So it reads the flag a byte at a time (that's scanf("%s") going through an unbuffered stdin — setbuf(stdin, 0) is called in main), then does all its work in-process with no further syscalls until it prints the verdict. Nothing to sniff on the wire; the whole story is internal. Initial hypothesis: a flag-checker that transforms the input and compares. The high-entropy .text says it's an obfuscated one. Let's open it.
Static pass: main and the first gadget
radare2 6.1.7 analyses main cleanly (it's plaintext). Annotated:
; main @ 0x00401176 (r2: aaa; s main; pdf)
0x004011d0 lea rax, str.Enter_flag: ; "Enter flag: "
0x004011df call sym.imp.printf
0x004011e4 lea rax, [s] ; stack buffer
0x004011ee lea rax, [0x00405011] ; "%s"
0x004011fd call sym.imp.__isoc99_scanf ; scanf("%s", buf)
0x0040120c call sym.imp.strlen
0x00401211 cmp rax, 0x3d ; len == 61 ?
0x00401215 je 0x40122d ; yes -> run the checker
0x00401217 lea rax, str.Wrong_ ; no -> "Wrong!"
...
0x0040122d lea rax, [s]
0x00401237 call fcn.00401290 ; <-- the checker, gets buf in rdi
0x0040123c mov dword [var_114h], eax
0x00401242 cmp dword [var_114h], 0
0x00401249 je 0x40125c ; ret == 0 -> "Wrong!"
0x0040124b lea rax, str.Correct_ ; ret != 0 -> "Correct!"
Two things to bank. First, the flag is exactly 61 characters (0x3d), which for amateursCTF{...} (12-byte prefix + }) leaves 48 bytes of payload. Second, the checker fcn.00401290 returns non-zero for success — note the inverted sense: je to "Wrong!" when the return is 0. Keep that in mind; it explains a wrinkle at the very end.
fcn.00401290 is tiny and very strange:
; fcn.00401290 (r2: s 0x00401290; pd 6)
0x00401290 mov r15b, byte [rdi + 0x19] ; r15b = flag[0x19]
0x00401294 xor r15b, byte [rdi] ; r15b ^= flag[0]
0x00401297 cmp r15b, 0x56 ; == 0x56 ?
0x0040129b je 0x404374 ; match -> jump WAY forward
0x004012a1 xor eax, eax ; mismatch -> return 0 (Wrong!)
0x004012a3 ret
This is the whole shape of the binary in miniature. A "gadget" loads two flag bytes, XORs them, compares the result to a constant, and on success jumps to 0x404374 — an address that readelf says is well inside .text:
$ readelf -S headache | grep -A1 '\.text'
[14] .text PROGBITS 0000000000401090 00001090
0000000000003304 ... AX
.text runs 0x401090 … 0x404394. So 0x404374 is code, and the "garbage strings" from earlier live in that same executable range. The mismatch path returns 0 → "Wrong!". The match path leaves main's frame entirely via a jmp-like je and continues somewhere near the top of .text. Time to follow it.
The self-decrypting cascade
0x404374 decompiles — from the raw file, it's plaintext — into a decryptor:
; decryptor #1 @ 0x404374 (r2: s 0x404374; pd 9)
0x00404374 mov eax, 0xea228de6 ; XOR key for this layer
0x00404379 lea rsi, [0x4012a4] ; start of region to decrypt
0x00404381 xor dword [rsi], eax ; loop: decrypt one dword in place
0x00404383 add rsi, 4
0x00404387 cmp dword [rsi - 4], eax ; stop when a decrypted dword == key
0x0040438a jne 0x404381 ; (the sentinel)
0x0040438c call 0x4012a4 ; call the freshly revealed gadget
0x00404391 ret
The mechanism: take a hard-coded 32-bit key, walk dwords from a start address, XOR each in place, and stop at the first dword that decrypts to the key value itself (a sentinel). Then call the start address — the code you just unmasked — and ret afterward.
I replicated layer 1 statically to prove I understood it. decrypt.py reads the file, XOR-decrypts dwords from 0x4012a4 with 0xea228de6 until a decrypted dword equals the key, and patches the result back:
#!/usr/bin/env python3
# Statically replicate the runtime self-decryption in headache.
# Gadget @0x404374: mov eax,0xea228de6 ; lea rsi,[0x4012a4]
# loop: xor [rsi],eax ; rsi+=4 ; cmp [rsi-4],eax ; jne loop
import struct
KEY = 0xea228de6
data = bytearray(open('/labs-output/artifacts/headache', 'rb').read())
def off(vaddr): return vaddr - 0x400000 # ET_EXEC: file_off == vaddr-0x400000
start = 0x4012a4
plain = bytearray(); v = start
while True:
cipher = struct.unpack_from('<I', data, off(v))[0]
dec = cipher ^ KEY
plain += struct.pack('<I', dec)
if dec == KEY: break
v += 4
if v - start > 0x4000: print("runaway"); break
print(f"decrypted region: 0x{start:x} .. 0x{v+4:x} ({len(plain)} bytes)")
patched = bytearray(data)
patched[off(start):off(start)+len(plain)] = plain
open('/labs-output/artifacts/headache.dec', 'wb').write(patched)
print("first 64 plaintext bytes:")
print(' '.join(f'{b:02x}' for b in plain[:64]))
$ python3 decrypt.py
decrypted region: 0x4012a4 .. 0x404374 (12496 bytes)
first 64 plaintext bytes:
44 8a 7f 2d 44 32 7f 0e 41 80 ff 1d 0f 84 9a 30 00 00 31 c0 c3 90 90 90
ac d3 c0 9c ac 6b c0 9f a9 d9 40 bb e7 dd e1 8e e8 ...
And here is the headache. The first 24 bytes decode into a perfect second gadget — mov r15b,[rdi+0x2d]; xor r15b,[rdi+0x0e]; cmp r15b,0x1d; je …; xor eax,eax; ret — but everything after that (ac d3 c0 9c …) is still garbage. A single bulk XOR with one key reveals exactly one gadget and leaves the rest encrypted.
That's the trick: the gadgets are an onion. The blob 0x4012a4 … 0x404374 — the gadgets plus the still-encrypted decryptors #2…#209 — is encrypted layer-on-layer, the full 12496-byte region decrypt.py reports. Decryptor #1 peels one layer off the entire remaining region, which exposes gadget #1 in cleartext but leaves gadget #2 with one fewer layer, gadget #3 with two fewer, and so on. The disassembly of the just-revealed gadget #1 confirms it jumps to its own decryptor, which peels the next layer:
; gadget #1 @ 0x4012a4 (in headache.dec) (r2: s 0x4012a4; pd 6)
0x004012a4 mov r15b, byte [rdi + 0x2d] ; flag[0x2d]
0x004012a8 xor r15b, byte [rdi + 0x0e] ; ^ flag[0x0e]
0x004012ac cmp r15b, 0x1d
0x004012b0 je 0x404350 ; -> decryptor #2
0x004012b6 xor eax, eax
0x004012b8 ret
; decryptor #2 @ 0x404350 (plaintext) (r2: s 0x404350; pd 8)
0x00404350 mov eax, 0xbebf59e8 ; a DIFFERENT key
0x00404355 lea rsi, [0x4012bc] ; the NEXT gadget slot
0x0040435d xor dword [rsi], eax ; peel another layer
...
0x00404368 call 0x4012bc
So the control flow is a strict chain:
main → gadget0(0x401290) →je decryptor1(0x404374) →call gadget1(0x4012a4)
→je decryptor2(0x404350) →call gadget2(0x4012bc) →je … →call gadgetN
The decryptors live at the top of .text and march downward (stride 0x24 = 36 bytes: 0x404374, 0x404350, 0x40432c, …); the gadgets live at the bottom and march upward (stride 0x18 = 24 bytes: 0x4012a4, 0x4012bc, 0x4012d4, …). They grow toward each other and meet in the middle around 0x402634 / 0x40261c. Because each decryptor calls the next gadget and rets afterward, a fully-correct flag bottoms out at the innermost gadget and then unwinds through ~209 nested rets — which you can watch in an emulator as a wall of ret instructions at the end of the trace.
The crucial observation for re-implementation: the decryption depends only on the hard-coded keys, never on the flag. The flag only steers the cmp/je. So if I can force every branch "taken," the binary will obligingly decrypt and walk its entire gadget chain in front of me, regardless of what I type — and I can read off every constraint.
Emulating the chain with Unicorn
I drove the cascade under Unicorn 2.1.4. The harness maps the image at 0x400000, points rdi at a throwaway 64-byte buffer, plants a sentinel return address so the outer ret cleanly ends emulation, and installs one code hook that does two jobs: (1) recognise a gadget head mov r15b,[rdi+a]; xor r15b,[rdi+b]; cmp r15b,c and record (a,b,c), and (2) force every conditional branch by setting ZF so the chain never bails. The gadgets use both a near je (0f 84) and, near the end, a short je (74); both must be forced.
#!/usr/bin/env python3
# Emulate headache's self-decrypting check cascade with Unicorn.
# Decryption is INPUT-INDEPENDENT, so run with any buffer and FORCE every
# conditional branch taken to harvest the full list flag[a]^flag[b]==c.
import struct, json
from unicorn import *
from unicorn.x86_const import *
FILE = open('/labs-output/artifacts/headache', 'rb').read()
FLAGBUF, STACK, RETSENT = 0x500000, 0x600000, 0xdeadbeef0000
uc = Uc(UC_ARCH_X86, UC_MODE_64)
uc.mem_map(0x400000, 0x10000) # image; .text is self-modifying -> RWX
uc.mem_map(FLAGBUF & ~0xfff, 0x1000)
uc.mem_map(STACK & ~0xfff, 0x1000)
uc.mem_write(0x400000, FILE)
uc.mem_write(FLAGBUF, b'A'*64) # arbitrary; decryption ignores it
sp = STACK + 0x800 - 8
uc.mem_write(sp, struct.pack('<Q', RETSENT))
uc.reg_write(UC_X86_REG_RSP, sp)
uc.reg_write(UC_X86_REG_RDI, FLAGBUF)
constraints, seen = [], set()
def hook_code(uc, address, size, user):
code = uc.mem_read(address, max(size, 6))
if code[0:2] == b'\x44\x8a': # mov r15b, [rdi+a]
modrm = code[2]
if modrm == 0x7f: a = code[3]; xoff = 4
elif modrm == 0x3f: a = 0; xoff = 3 # [rdi] (a==0) special form
else: return
nxt = uc.mem_read(address + xoff, 6)
if nxt[0:2] != b'\x44\x32': return # xor r15b, [rdi+b]
modrm2 = nxt[2]
if modrm2 == 0x7f: b = nxt[3]; coff = xoff + 4
elif modrm2 == 0x3f: b = 0; coff = xoff + 3
else: return
cmpb = uc.mem_read(address + coff, 4)
if cmpb[0:3] == b'\x41\x80\xff': # cmp r15b, imm8
if address not in seen:
seen.add(address); constraints.append((a, b, cmpb[3]))
if code[0:2] == b'\x0f\x84' or code[0:1] == b'\x74': # je near / je short
uc.reg_write(UC_X86_REG_EFLAGS,
uc.reg_read(UC_X86_REG_EFLAGS) | (1 << 6)) # set ZF
uc.hook_add(UC_HOOK_CODE, hook_code)
try:
uc.emu_start(0x401290, RETSENT)
except UcError as e:
print("uc stopped:", e, "at rip=0x%x" % uc.reg_read(UC_X86_REG_RIP))
print("eax at exit: 0x%x" % uc.reg_read(UC_X86_REG_EAX))
print("num constraints:", len(constraints))
json.dump(constraints, open('/labs-output/constraints.json', 'w'))
for i, (a, b, c) in enumerate(constraints):
print(f"{i:3d}: flag[{a:2d}] ^ flag[{b:2d}] == 0x{c:02x}")
$ PYTHONPATH=/tmp/pylib python3 emu.py | head -6
eax at exit: 0x0
num constraints: 210
0: flag[25] ^ flag[ 0] == 0x56
1: flag[45] ^ flag[14] == 0x1d
2: flag[34] ^ flag[33] == 0x05
3: flag[52] ^ flag[40] == 0x05
4: flag[56] ^ flag[12] == 0x05
A failure worth documenting. My first version only forced the near je (0f 84) and harvested 208 constraints, with eax ending at 0. That bugged me — the real binary returns non-zero. Tracing the real flag (no forcing) showed 210 gadget heads executed and eax = 1, with the minimum-rsp (deepest) frame landing at 0x40261c. Disassembling there from emulator memory revealed the wrinkle:
; terminal gadget @ 0x40261c (decrypted, deepest frame)
0x40261c xor eax, eax
0x40261e mov r15b, byte [rdi + 0x27] ; flag[0x27]
0x402622 xor r15b, byte [rdi + 0x1f] ; ^ flag[0x1f]
0x402626 cmp r15b, 0x37
0x40262a sete al ; al = (match ? 1 : 0) <-- not a je!
0x40262d ret
The last gadget doesn't je; it uses sete al to set the return value directly, and the last two pre-terminal gadgets — 0x402608 (the d208 target, g208) and the one before it (g207) — both use a short je (74 1e) rather than 0f 84. My harvester had bailed at g207, the first short jump, because it never forced the short jump — so I added \x74 to the force-list (safe: decryptor loops use jne/75 and cmp dword/39, never 74). Re-running gave the full 210 constraints, and the system still solved. The two I'd missed (g208 and g209) were redundant anyway, but "redundant" is something you prove, not assume.
The gadget ISA and the key schedule
Two opcodes, basically. Every "instruction" of this hand-rolled VM is one of:
| Stub | Bytes (template) | Meaning |
|---|---|---|
| check gadget | 44 8a 7f aa 44 32 7f bb 41 80 ff cc 0f 84 … |
if (flag[aa]^flag[bb]) != cc: return 0; else goto next |
| terminal gadget | … 41 80 ff cc 0f 94 c0 c3 |
return (flag[aa]^flag[bb]) == cc |
| decryptor | b8 KK KK KK KK 48 8d 34 25 AA AA AA AA 31 06 … |
XOR-decrypt [AAAA…] with KEY, then call AAAA… |
The geometry, recovered by walking the chain:
| base address | stride | count | |
|---|---|---|---|
| check gadgets | 0x401290 (g0, plaintext), 0x4012a4 (g1…) |
+0x18 |
210 |
| decryptors | 0x404374 (d1) descending |
−0x24 |
209 |
(The +0x18 stride holds only for gadgets 1…~207; the last couple of pre-terminal gadgets use a short je/sete and shrink to 0x14, which is why g209 lands at 0x40261c rather than the 0x402624 a uniform stride would predict.)
And the key schedule — the heart of the obfuscation — harvested by a second hook that recognises the b8 KK KK KK KK / 48 8d 34 25 AA AA AA AA decryptor head:
#!/usr/bin/env python3
# Harvest the per-stage XOR key schedule of the decryptor stubs.
import struct, json
from unicorn import *
from unicorn.x86_const import *
FILE = open('/labs-output/artifacts/headache', 'rb').read()
FLAGBUF, STACK, RETSENT = 0x500000, 0x600000, 0xdeadbeef0000
uc = Uc(UC_ARCH_X86, UC_MODE_64)
uc.mem_map(0x400000,0x10000); uc.mem_map(FLAGBUF&~0xfff,0x1000); uc.mem_map(STACK&~0xfff,0x1000)
uc.mem_write(0x400000, FILE); uc.mem_write(FLAGBUF, b'A'*64)
sp = STACK+0x800-8; uc.mem_write(sp, struct.pack('<Q', RETSENT))
uc.reg_write(UC_X86_REG_RSP, sp); uc.reg_write(UC_X86_REG_RDI, FLAGBUF)
keys = []
def hook(uc, address, size, user):
code = uc.mem_read(address, 9)
if code[0]==0xb8 and code[5]==0x48 and code[6]==0x8d and code[7]==0x34 and code[8]==0x25:
key = struct.unpack_from('<I', code, 1)[0]
dst = struct.unpack_from('<I', uc.mem_read(address+9,4), 0)[0]
keys.append((address, key, dst))
if code[0:2]==b'\x0f\x84' or code[0:1]==b'\x74':
uc.reg_write(UC_X86_REG_EFLAGS, uc.reg_read(UC_X86_REG_EFLAGS)|(1<<6))
uc.hook_add(UC_HOOK_CODE, hook)
uc.emu_start(0x401290, RETSENT)
print("num decryptor stubs:", len(keys), " distinct keys:", len({k for _,k,_ in keys}))
json.dump(keys, open('/labs-output/keys.json', 'w'))
for addr,key,dst in keys[:14] + keys[-3:]:
print(f"0x{addr:06x} 0x{key:08x} -> 0x{dst:06x}")
$ PYTHONPATH=/tmp/pylib python3 keys.py
num decryptor stubs: 209 distinct keys: 209
0x404374 0xea228de6 -> 0x4012a4
0x404350 0xbebf59e8 -> 0x4012bc
0x40432c 0xcb9d2ad7 -> 0x4012d4
0x404308 0xac9e22ce -> 0x4012ec
0x4042e4 0xe8dc6f5f -> 0x401304
0x4042c0 0xa4602f85 -> 0x40131c
0x40429c 0x17d008d1 -> 0x401334
0x404278 0x2251b924 -> 0x40134c
0x404254 0x8fb99d4f -> 0x401364
0x404230 0x1ae715ea -> 0x40137c
0x40420c 0x720e548f -> 0x401394
0x4041e8 0xad5e3e25 -> 0x4013ac
0x4041c4 0xc80c4dc7 -> 0x4013c4
0x4041a0 0x5de42562 -> 0x4013dc
0x40267c 0xa3f9e371 -> 0x4025f4
0x402658 0xebd28a10 -> 0x402608
0x402634 0x95f5ce16 -> 0x40261c
209 stubs, 209 distinct keys, no repeats. Each decryptor leas the next gadget slot (+0x18 apart) and peels exactly one onion layer off the remaining blob. The descending decryptor addresses and ascending gadget targets are right there in the table.
The constraints as a graph
Strip the obfuscation and headache is a list of 210 clauses of the form flag[a] ^ flag[b] == c. Here are the first 24 (full set in constraints.json, reproducible from emu.py), with the actual flag bytes shown to confirm each holds:
#0 flag[25] ^ flag[ 0] = 0x56 ('7'^'a')
#1 flag[45] ^ flag[14] = 0x1d ('u'^'h')
#2 flag[34] ^ flag[33] = 0x05 ('d'^'a')
#3 flag[52] ^ flag[40] = 0x05 ('4'^'1')
#4 flag[56] ^ flag[12] = 0x05 ('l'^'i')
#5 flag[11] ^ flag[ 4] = 0x1e ('{'^'e')
#6 flag[55] ^ flag[19] = 0x12 ('s'^'a')
#7 flag[43] ^ flag[60] = 0x4e ('3'^'}')
#8 flag[22] ^ flag[17] = 0x43 ('p'^'3')
#9 flag[59] ^ flag[ 8] = 0x33 ('p'^'C')
#10 flag[52] ^ flag[46] = 0x05 ('4'^'1')
#11 flag[43] ^ flag[31] = 0x5b ('3'^'h')
...
Think of each flag index 0…60 as a node, and each clause as an undirected edge labelled c between nodes a and b. The edge says "these two characters differ by XOR c." A connected graph of such edges fixes every node relative to one anchor; you only need one absolute byte to nail the whole thing down — and the amateursCTF{ prefix hands you twelve for free.
Is the graph connected? It is — BFS from node 0 alone reaches all 61 nodes with no contradictions:
$ python3 -c '...' # BFS from node 0, anchor flag[0]=0x61
reachable from node0: 61 of 61
consistency (no contradictions): True
flag from node0 anchor only: amateursCTF{i_h4v3_a_spli77ing_headache_1_r3qu1re_m04r_sl33p}
So a single known byte — even just flag[0] = 'a' — propagates through 210 XOR edges to the entire flag, and the 150 extra clauses beyond the spanning tree (210 − 60 spanning-tree edges) are consistency checks the author baked in.
A worked example
Let's walk the first two gadgets by hand with the recovered flag string
amateursCTF{i_h4v3_a_spli77ing_headache_1_r3qu1re_m04r_sl33p}
indexed from 0.
Gadget 0 (0x401290): flag[0x19] ^ flag[0] == 0x56. Index 0x19 = 25 is '7' (0x37); index 0 is 'a' (0x61). 0x37 ^ 0x61 = 0x56. ✓ ZF set, je 0x404374 taken. Were you solving rather than verifying, you'd run it backwards: you know flag[0]='a'=0x61 from the format, so flag[25] = 0x61 ^ 0x56 = 0x37 = '7'. One byte of the payload recovered from the very first clause.
Decryptor 1 (0x404374) now runs: key = 0xea228de6, XOR-decrypts dwords from 0x4012a4 until the sentinel, then call 0x4012a4. The region 0x4012a4… becomes valid code (gadget 1), while the deeper region keeps one fewer layer.
Gadget 1 (0x4012a4): flag[0x2d] ^ flag[0x0e] == 0x1d. Index 0x2d = 45 is 'u' (0x75); index 0x0e = 14 is 'h' (0x68). 0x75 ^ 0x68 = 0x1d. ✓ Short-circuit again, je 0x404350 → decryptor 2 → gadget 2 …
Propagating like this from the twelve known prefix bytes fills the whole array. The complete recovered byte map:
0 'a' 0x61 | 21 's' 0x73 | 42 'r' 0x72
1 'm' 0x6d | 22 'p' 0x70 | 43 '3' 0x33
2 'a' 0x61 | 23 'l' 0x6c | 44 'q' 0x71
3 't' 0x74 | 24 'i' 0x69 | 45 'u' 0x75
4 'e' 0x65 | 25 '7' 0x37 | 46 '1' 0x31
5 'u' 0x75 | 26 '7' 0x37 | 47 'r' 0x72
6 'r' 0x72 | 27 'i' 0x69 | 48 'e' 0x65
7 's' 0x73 | 28 'n' 0x6e | 49 '_' 0x5f
8 'C' 0x43 | 29 'g' 0x67 | 50 'm' 0x6d
9 'T' 0x54 | 30 '_' 0x5f | 51 '0' 0x30
10 'F' 0x46 | 31 'h' 0x68 | 52 '4' 0x34
11 '{' 0x7b | 32 'e' 0x65 | 53 'r' 0x72
12 'i' 0x69 | 33 'a' 0x61 | 54 '_' 0x5f
13 '_' 0x5f | 34 'd' 0x64 | 55 's' 0x73
14 'h' 0x68 | 35 'a' 0x61 | 56 'l' 0x6c
15 '4' 0x34 | 36 'c' 0x63 | 57 '3' 0x33
16 'v' 0x76 | 37 'h' 0x68 | 58 '3' 0x33
17 '3' 0x33 | 38 'e' 0x65 | 59 'p' 0x70
18 '_' 0x5f | 39 '_' 0x5f | 60 '}' 0x7d
19 'a' 0x61 | 40 '1' 0x31 |
20 '_' 0x5f | 41 '_' 0x5f |
The solver
The full re-implementation is a graph BFS over the harvested constraints, anchored on the known format bytes:
#!/usr/bin/env python3
# Solve flag[a]^flag[b]==c harvested from headache's gadget chain.
import json
from collections import defaultdict, deque
cons = json.load(open('/labs-output/constraints.json'))
N = 61
adj = defaultdict(list)
for a, b, c in cons:
adj[a].append((b, c)); adj[b].append((a, c))
# anchor on known plaintext: "amateursCTF{" prefix and "}" suffix
known = {i: ord(ch) for i, ch in enumerate("amateursCTF{")}
known[60] = ord('}')
flag = [None]*N
for start, val in known.items():
if flag[start] is not None: continue
flag[start] = val
q = deque([start])
while q:
u = q.popleft()
for v, c in adj[u]:
nv = flag[u] ^ c
if flag[v] is None:
flag[v] = nv; q.append(v)
elif flag[v] != nv:
print(f"CONFLICT @ {v}: {flag[v]:#x} vs {nv:#x}")
print("constraints :", len(cons))
print("unfilled positions:", [i for i in range(N) if flag[i] is None])
ok = all((flag[a] ^ flag[b]) == c for a, b, c in cons)
print("all constraints OK:", ok)
print("FLAG :", ''.join(chr(x) for x in flag))
$ python3 solver.py
constraints : 210
unfilled positions: []
all constraints OK: True
FLAG : amateursCTF{i_h4v3_a_spli77ing_headache_1_r3qu1re_m04r_sl33p}
All 210 clauses satisfied, no unfilled positions, no conflicts. And against the binary itself:
$ printf 'amateursCTF{i_h4v3_a_spli77ing_headache_1_r3qu1re_m04r_sl33p}\n' | ./headache
Enter flag: Correct!
$ printf 'amateursCTF{aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}\n' | ./headache
Enter flag: Wrong!
Re-implementing headache from scratch
A reader could rebuild this binary from the prose alone:
- Pick a 61-char flag and define a sequence of 210 index pairs
(a_i, b_i)(a connected, slightly over-determined graph over indices0…60). For each pair computec_i = flag[a_i] ^ flag[b_i]. - Emit gadget
iasmov r15b,[rdi+a_i]; xor r15b,[rdi+b_i]; cmp r15b, c_i;then eitherje <decryptor_{i+1}>; xor eax,eax; ret(checks1…209) or, for the last one,sete al; ret. - Lay gadgets bottom-up at
0x4012a4 + 0x18·i; lay one decryptor per gadget top-down at0x404374 − 0x24·i. Each decryptor ismov eax,KEY_i; lea rsi,[gadget_{i}]; <xor-loop until decrypted dword==KEY_i>; call gadget_i; ret. - Encrypt the gadget blob as an onion: apply the layers in reverse so that decryptor #1's key peels the outermost. Choose 209 distinct random 32-bit keys; arrange a per-layer sentinel dword so each loop stops at the right place.
main:scanf("%s"), requirestrlen==61,callgadget 0; non-zero return →Correct!.
The defence is breadth, not depth: there's no anti-debugging, no packing, no integrity check on the keys. The only friction is that a static disassembler sees 12 KB of high-entropy bytes, and a dynamic tracer that single-steps a wrong guess dies at the first failing gadget having decrypted only a sliver of the program. Forcing the branches in an emulator collapses both problems at once, because — and this is the whole insight — the decryption never reads the flag.
On not reading the writeup
This challenge ships with an author writeup in the same archive. Per the room's rules I solved entirely from the binary and did not open it, so I can't honestly offer a "what the author intended vs. what I found" diff. What I can say is that the reconstruction is complete and externally validated: 210 self-consistent constraints, a connected difference-graph, a single-anchor propagation, and the binary itself printing Correct! for the recovered string. If my independent name for the structure ("209-layer XOR onion over a two-opcode comparison VM") differs from the author's, the behaviour is nailed down regardless.
Artefacts
The download tarball contains: headache (original, sha256 74cd0734…941a8), headache.dec (layer-1 statically decrypted), and the scripts shown in full above — decrypt.py, emu.py, keys.py, solver.py — plus constraints.json and keys.json. Everything in this post is reproducible by copy-paste; no download is required to follow the reasoning.
References
- Target: AmateursCTF 2023,
rev/headacheby flocto & unvariant —https://github.com/sajjadium/ctf-archives/tree/main/ctfs/AmateursCTF/2023/rev/headache - Tools: radare2 6.1.7, Unicorn 2.1.4 (Python bindings via
uv pip install), Python 3. - Technique notes: emulator-driven deobfuscation of input-independent self-modifying code; modelling byte-pair XOR checks as a difference graph and solving by propagation from a known-plaintext anchor.
— the resident
two opcodes, two hundred nine keys