`jumble`: picoCTF "OTPImplementation" and the 4-bit rotate hiding inside a one-time pad
A 8.5 KB ELF from picoCTF 2020 claims to be a one-time pad. It isn't — the "key check" is a hand-rolled per-character permutation chained through a prefix sum, and once you see that `jumble()` is just a 4-bit rotate-left, the whole thing inverts in closed form and hands you the flag. Here is the function, byte for byte, re-implemented in Python.
A 8.5 KB ELF from picoCTF 2020 claims to be a one-time pad. It isn't — the "key check" is a hand-rolled per-character permutation chained through a prefix sum, and once you see that jumble() is just a 4-bit rotate-left, the whole thing inverts in closed form and hands you the flag. Here is the function, byte for byte, re-implemented in Python.
The target
The challenge ships two files: an ELF called otp and a flag.txt containing 100 hex characters. Vital statistics:
$ file otp
otp: ELF 64-bit LSB pie executable, x86-64, dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, not stripped
$ sha256sum otp
d0052301ec5772d26ba08bb793022797d7a39fcad8bde37513c8da4cc1012614 otp
$ stat -c%s otp # 8552 bytes
It's a small, dynamically-linked, not stripped PIE — which is a gift, because the symbol table gives us the function names the author wrote. readelf confirms a modern-ish hardened build:
GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
FLAGS BIND_NOW # Full RELRO
GNU_STACK ... RW # NX (stack non-executable)
fs:[0x28] canary in main # stack protector
None of that hardening matters here — there's no overflow to exploit and nothing to bypass. This is a reverseme: the binary performs a deterministic computation on its input, and the whole job is to recover that computation faithfully enough to re-implement it in another language. The "one-time pad" framing in the name is a bit of misdirection; the interesting part is the key-validation transform.
The ELF header, raw:
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 8006 0000 0000 0000 ..>.............
00000020: 4000 0000 0000 0000 281a 0000 0000 0000 @.......(.......
00000030: 0000 0000 4000 3800 0900 4000 1d00 1c00 [email protected]...@.....
e_type = 0x3 (ET_DYN, i.e. PIE), e_machine = 0x3e (x86-64), entry at 0x680. Because it's a PIE the image base is 0, so every address radare2 prints below (0x78a, 0x7c0, 0x80e…) is also a clean file-relative offset. That makes citing offsets painless.
First impressions
Before disassembling anything, the strings table tells most of the story:
$ strings -n6 otp
USAGE: %s [KEY]
Invalid key!
You got the key, congrats! Now xor it with the flag!
otp.c
occdpnkibjefihcgjanhofnhkdfnabmofnopaghhgnjhbkalgpnpdjonblalfciifiimkaoenpealibelmkdpbdlcldicplephbo
Four interesting things:
- The program takes a
KEYon the command line (argv[1]). - There is exactly one outcome message for "right" (
You got the key…) and one for "wrong" (Invalid key!). - There's a 100-character string over a strange 16-letter alphabet (
a–p). That's our comparison target. - The source file was
otp.c— a single translation unit, so the function names in the symtab are the author's own.
And flag.txt:
$ xxd flag.txt
00000000: 3032 3631 3034 3838 3837 3863 6439 6139 02610488878cd9a9
...
00000060: 6530 3233 e023
That's 100 ASCII hex characters = 50 bytes of ciphertext. The final message — "Now xor it with the flag!" — tells you what to do with the key once you have it: the plaintext flag is key ⊕ flag.txt. So the binary never decrypts anything itself; it only validates a key and trusts you to do the XOR. Which means recovering the flag is entirely a matter of reconstructing what makes a key valid.
The function list from radare2 (aaa; afl) is short and self-documenting:
0x0000078a 54 sym.valid_char
0x000007c0 78 sym.jumble
0x0000080e 498 main
Three functions. valid_char is a 54-byte predicate, jumble is a 78-byte transform, and main is the 498-byte orchestrator. Let's take them in that order.
Static pass: valid_char
This one is a pure character-class test. Condensed from r2 -qc 'pdf @ sym.valid_char' (the addresses are verbatim; I've collapsed the prologue and added ; comments):
0x0078e mov eax, edi ; eax = argument character c
0x00790 mov byte [var_4h], al
0x00793 cmp byte [var_4h], 0x2f ; c <= '/' (0x2f) ?
0x00797 jle 0x7a6 ; -> not a digit, try letters
0x00799 cmp byte [var_4h], 0x39 ; c > '9' (0x39) ?
0x0079d jg 0x7a6 ; -> not a digit, try letters
0x0079f mov eax, 1 ; '0'..'9' => return 1
0x007a4 jmp 0x7be
0x007a6 cmp byte [var_4h], 0x60 ; c <= '`' (0x60) ?
0x007aa jle 0x7b9 ; -> invalid
0x007ac cmp byte [var_4h], 0x66 ; c > 'f' (0x66) ?
0x007b0 jg 0x7b9 ; -> invalid
0x007b2 mov eax, 1 ; 'a'..'f' => return 1
0x007b9 mov eax, 0 ; else => return 0
So valid_char(c) returns 1 iff c is in '0'..'9' or 'a'..'f'. It is exactly "is c a lowercase hexadecimal digit?". Uppercase A–F are rejected. gdb agrees:
valid('a')=1 valid('g')=0 valid('5')=1
That's the first constraint on the key: every character must be a lowercase hex digit. We'll find the second constraint (length must be exactly 100) in main.
The algorithm: jumble
This is the heart of the binary. The full disassembly is 78 bytes; I'll show it in two blocks. First, the prologue and what I'll call the nibble extraction:
0x007c9 cmp byte [c], 0x60 ; c > '`' ? (i.e. is it 'a'..'f'?)
0x007cd jle 0x7d9
0x007cf add byte [c], 9 ; yes: c += 9 ('a'=0x61 -> 0x6a)
0x007d9 movzx eax, byte [c]
0x007dd mov edx, eax
0x007df sar dl, 7 ; dl = sign mask of c (0x00 for c<0x80)
0x007e2 shr dl, 4 ; dl = 0x0f if c was negative, else 0x00
0x007e5 add eax, edx
0x007e7 and eax, 0xf ; eax = c & 0x0f (low nibble)
0x007ea sub eax, edx
0x007ec mov byte [c], al
The add 9 at 0x7cf is the classic ASCII-hex trick. For a digit '0'..'9' (≤ 0x60) nothing happens, and the low nibble of 0x30..0x39 is already 0..9. For a letter 'a'..'f' (0x61..0x66), adding 9 gives 0x6a..0x6f, whose low nibble is 0xa..0xf. So after the and 0xf at 0x7e7, register eax holds the numeric value of the hex digit, 0..15 — call it n.
The sar dl,7 / shr dl,4 / add / and / sub dance is just GCC-7's idiom for a signed % 16 (truncating toward zero). dl becomes 0x0f only when the byte is negative; our characters are all < 0x80, so dl = 0, the add/sub cancel, and the whole thing degenerates to eax & 0xf. I flagged this as a potential gotcha — if a value ever went negative the result would differ from a plain mask — and resolved it empirically with gdb (below) rather than trusting the simplification.
Second block, the doubling with carry:
0x007ef movsx eax, byte [c] ; eax = n (sign-extended; n>=0 so harmless)
0x007f3 add eax, eax ; eax = 2*n (range 0..30)
0x007f5 mov byte [c], al
0x007f8 cmp byte [c], 0xf ; is 2*n > 15 ?
0x007fc jle 0x808
0x007fe add byte [c], 1 ; yes: += 1
0x00808 movzx eax, byte [c] ; return value, 0..31
So in C the function is simply:
int jumble(char c) {
if (c > '`') c += 9; // 'a'..'f' -> 0x6a..0x6f
c = (c & 0xf); // hex value n, 0..15
c = c + c; // 2n
if (c > 0xf) c += 1; // carry the overflow bit
return c;
}
Read it as arithmetic on the 4-bit value n:
2*nforn < 8(no overflow): results0, 2, 4, …, 14.2*n + 1forn ≥ 8(the doubled value exceeds 15): results17, 19, 21, …, 31.
jumble returns a value in 0..31. But — and this is the subtlety that took a second pass to nail — main only ever keeps jumble(c) & 0xf. And (2n + carry) mod 16 is exactly a 4-bit rotate-left-by-one of n:
J(n) = ((n << 1) | (n >> 3)) & 0xf
The << 1 is the doubling; the bit that falls off the top (n >> 3, set iff n ≥ 8) is rotated back into the bottom — that's the +1. So jumble, reduced mod 16, is the permutation "spin the nibble's 4 bits one position left."
I verified the raw jumble return values (the full 0..31 range, before the mask) by calling the function directly inside gdb, since the binary isn't stripped:
$ gdb -q -batch -ex 'break main' -ex 'run AAAA' \
-ex "printf \"jumble('0')=%d\n\", (int)jumble(0x30)" ...
jumble('0')=0
jumble('7')=14
jumble('8')=17
jumble('a')=21
jumble('f')=31
'7' → n=7 → 2·7 = 14 (no carry). '8' → n=8 → 2·8+1 = 17. 'f' → n=15 → 2·15+1 = 31. Exactly the model. That's verification path #1.
The full jumble table
Here is every input nibble, its raw jumble output, the & 0xf reduction that main keeps, and the eventual letter (+ 'a'). This is the table the reader needs to re-implement the binary:
nibble n |
hex char | jumble() (0..31) |
& 0xf |
letter (+'a') |
|---|---|---|---|---|
| 0 | 0 |
0 | 0 | a |
| 1 | 1 |
2 | 2 | c |
| 2 | 2 |
4 | 4 | e |
| 3 | 3 |
6 | 6 | g |
| 4 | 4 |
8 | 8 | i |
| 5 | 5 |
10 | 10 | k |
| 6 | 6 |
12 | 12 | m |
| 7 | 7 |
14 | 14 | o |
| 8 | 8 |
17 | 1 | b |
| 9 | 9 |
19 | 3 | d |
| 10 | a |
21 | 5 | f |
| 11 | b |
23 | 7 | h |
| 12 | c |
25 | 9 | j |
| 13 | d |
27 | 11 | l |
| 14 | e |
29 | 13 | n |
| 15 | f |
31 | 15 | p |
Notice the & 0xf column is a permutation of 0..15 — every output appears exactly once. The values n < 8 map to the even residues 0,2,…,14; the values n ≥ 8 map to the odd residues 1,3,…,15. That's the rotate-left signature: the low three bits shift up, the top bit becomes the new parity bit. Because it's a bijection, the transform is perfectly invertible — there's no information loss, which is what lets us recover the key in closed form later.
A rotate is a permutation, and permutations decompose into orbits. Rotating a 4-bit value left by one partitions {0,…,15} into these cycles:
(0) (15) fixed points -- 0000 and 1111 rotate to themselves
(5 10) 2-cycle -- 0101 <-> 1010
(1 2 4 8) 4-cycle
(3 6 12 9) 4-cycle
(7 14 13 11) 4-cycle
The orchestrator: main
main is 498 bytes. I'll walk it in five logical chunks, each lifted from r2 -qc 'pdf @ main' with the real offsets retained and comments added.
Chunk A — argument check and copy. If there's no argv[1], print usage and bail. Otherwise copy up to 100 bytes of the key into a stack buffer at rbp-0xe0:
0x00835 cmp dword [argc], 1
0x0083c jg 0x866 ; have a KEY argument?
; ... else: printf("USAGE: %s [KEY]\n", argv[0]); return 1
0x00866 mov rax, [argv]
0x0086d add rax, 8
0x00871 mov rcx, [rax] ; rcx = argv[1]
0x00874 lea rax, [rbp - 0xe0] ; dest = key buffer
0x0087b mov edx, 0x64 ; n = 100
0x00886 call strncpy ; dest = argv[1][:100]
0x0088f mov dword [rbp - 0xe8], 0 ; i = 0
The buffer at rbp-0xe0 holds the raw key; the output buffer sits at rbp-0x70; the loop counter i lives at rbp-0xe8. strncpy(dest, argv[1], 100) copies at most 100 bytes and — crucially — does not NUL-terminate if the source is ≥ 100 chars long.
Chunk B — the per-character loop gate. The loop reads dest[i], runs it through valid_char, and continues only while the character is a valid hex digit:
0x0093c movzx eax, byte [rbp + rax - 0xe0] ; dest[i]
0x0094f call valid_char
0x00956 test eax, eax
0x00958 jne 0x89e ; valid -> run the body; else fall through
So the loop processes the maximal prefix of valid hex digits and stops at the first non-hex byte (or runs off the copied region). The final count i is then required to equal exactly 100 — that's the length constraint.
Chunk C — the i = 0 special case. The first character seeds the chain:
0x0089e cmp dword [rbp - 0xe8], 0
0x008a5 jne 0x8e4 ; i != 0 -> general case
0x008af movzx eax, byte [rbp + rax - 0xe0] ; dest[0]
0x008bc call jumble
; edx = jumble(dest[0]) & 0xf (the sar/shr/and/sub mod-16 idiom)
0x008de mov byte [rbp + rax - 0x70], dl ; buf[0] = jumble(dest[0]) & 0xf
buf[0] = jumble(dest[0]) mod 16.
Chunk D — the i > 0 general case. Every subsequent output is the current jumble plus the previous output, mod 16 — a running prefix sum:
0x008ec movzx eax, byte [rbp + rax - 0xe0] ; dest[i]
0x008f9 call jumble
0x008fe movsx edx, al ; g = jumble(dest[i])
0x0090c movzx eax, byte [rbp + (i-1) - 0x70] ; buf[i-1]
0x00914 add edx, eax ; g + buf[i-1]
; edx = (g + buf[i-1]) & 0xf (mod-16 idiom again @0x918)
0x00931 mov byte [rbp + rax - 0x70], dl ; buf[i] = (g + buf[i-1]) & 0xf
buf[i] = (jumble(dest[i]) + buf[i-1]) mod 16.
Chunk E — finalize and compare. After the loop, each buf[j] is shifted into the 'a'..'p' alphabet, the length is checked, and the result is strncmp'd against the embedded target:
0x0096a movzx eax, byte [rbp + rax - 0x70] ; buf[j]
0x00977 add eax, 0x61 ; buf[j] += 'a' -> 'a'..'p'
0x00984 mov byte [rbp + rax - 0x70], dl
0x0099d cmp dword [rbp - 0xe8], 0x64 ; i == 100 exactly?
0x009a4 jne 0x9c6 ; no -> "Invalid key!"
0x009af lea rsi, [target] ; "occdpnki...lephbo"
0x009bd call strncmp
0x009c4 je 0x9d9 ; match -> "You got the key..."
That's the whole machine. In plain prose:
The key must be exactly 100 lowercase hex digits. Each digit is rotated (its nibble rotated left one bit, via
jumble). The rotated values are accumulated into a running prefix sum modulo 16. Each prefix-sum residue is mapped to a letter'a' + residue. The resulting 100-letter string must equal the hardcoded targetoccdpnki…lephbo.
And the embedded target, pulled straight from the binary (psz @ 0xaa0), is 100 characters long:
occdpnkibjefihcgjanhofnhkdfnabmofnopaghhgnjhbkalgpnpdjon
blalfciifiimkaoenpealibelmkdpbdlcldicplephbo
Inverting the pipeline
Every stage is invertible, so we can solve for the key directly instead of brute-forcing:
- Letters → residues.
t[i] = target[i] - 'a', each0..15. - Undo the prefix sum. From
t[i] = (g[i] + t[i-1]) mod 16we getg[i] = (t[i] - t[i-1]) mod 16, withg[0] = t[0]. (g[i]here isjumble(key[i]) mod 16.) - Undo the rotate.
jumble mod 16is a 4-bit rotate-left, so its inverse is a rotate-right:n = ((g >> 1) | (g << 3)) & 0xf. - Nibble → hex char.
key[i] = "0123456789abcdef"[n].
No search, no ambiguity — the bijections make it a one-shot computation.
A worked example
Take the first eight characters. The recovered key (derivation below) starts 72086 7e7, and target[0:8] = occdpnki. Here is each character marched through main's transform, produced by trace.py:
$ python3 trace.py
key[0:8] = '720867e7'
target[0:8]= 'occdpnki'
i | ch | jumble | &0xf | prev | (g+prev)%16 | letter
---+----+--------+------+------+-------------+-------
0 | '7' | 14 | 14 | 0 | 14 | 'o'
1 | '2' | 4 | 4 | 14 | 2 | 'c'
2 | '0' | 0 | 0 | 2 | 2 | 'c'
3 | '8' | 17 | 1 | 2 | 3 | 'd'
4 | '6' | 12 | 12 | 3 | 15 | 'p'
5 | '7' | 14 | 14 | 15 | 13 | 'n'
6 | 'e' | 29 | 13 | 13 | 10 | 'k'
7 | '7' | 14 | 14 | 10 | 8 | 'i'
Read row i=3: the key character is '8', whose nibble is 8. jumble('8') = 17, and 17 mod 16 = 1 (the rotate of 1000 → 0001). The previous output residue was 2, so (1 + 2) mod 16 = 3, and 3 + 'a' = 'd'. That matches target[3] = 'd'. Following the chain forward through all eight rows spells o c c d p n k i — exactly target[0:8]. The dependence on prev is why this is a chain: change one early hex digit and every downstream letter shifts.
trace.py in full:
#!/usr/bin/env python3
"""Step-by-step trace of main's transform for the first 8 key chars."""
import sys
sys.path.insert(0, "/labs-output")
from otp_reimpl import jumble, jumble_mod16, recover_key, TARGET
key = recover_key(TARGET)
print(f"key[0:8] = {key[:8]!r}")
print(f"target[0:8]= {TARGET[:8]!r}\n")
print(" i | ch | jumble | &0xf | prev | (g+prev)%16 | letter")
print("---+----+--------+------+------+-------------+-------")
prev = 0
for i in range(8):
ch = key[i]
g_full = jumble(ch)
g = jumble_mod16(ch)
cur = g & 0xf if i == 0 else (g + prev) & 0xf
letter = chr(cur + ord('a'))
print(f" {i} | '{ch}' | {g_full:>2} | {g:>2} | {prev:>2} | {cur:>2} | '{letter}'")
prev = cur
Python re-implementation
Here is the ground-truth port. It models jumble and valid_char exactly as the assembly does, reproduces main's forward transform, inverts it to recover the key, re-encrypts to self-check against the target, and finally XORs the key against flag.txt.
#!/usr/bin/env python3
"""
Faithful re-implementation of picoCTF 2020 "OTPImplementation" (binary: otp).
SHA256(otp) = d0052301ec5772d26ba08bb793022797d7a39fcad8bde37513c8da4cc1012614
"""
TARGET = ("occdpnkibjefihcgjanhofnhkdfnabmofnopaghhgnjhbkalgpnpdjon"
"blalfciifiimkaoenpealibelmkdpbdlcldicplephbo")
# the binary's flag.txt: 100 hex chars = 50 ciphertext bytes
CIPHER_HEX = ("02610488878cd9a9f890c831ac79c7bbc1e5e1ecc8140309"
"ae424ddb7a990157d94d172cf0c2ed88e74d547184733d0de023")
def hexval(ch):
"""nibble value of a lowercase hex char, exactly as jumble's prologue does."""
c = ord(ch)
if c > 0x60: # 'a'..'f': cmp 0x60 / add 9 (sym.jumble 0x7c9)
c += 9
return c & 0xf # the sar/shr/and/sub idiom == low nibble for 0x30..0x6f
def jumble(ch):
"""sym.jumble @ 0x7c0 -- returns 0..31 (NOT yet reduced mod 16)."""
n = hexval(ch)
g = (n * 2) & 0xff # add eax,eax (0x7f3)
if g > 0xf: # cmp 0xf / add 1 (0x7f8)
g += 1
return g
def jumble_mod16(ch):
"""What main actually stores: jumble(ch) & 0xf (the 4-bit rotate-left)."""
return jumble(ch) & 0xf
def encrypt(key):
"""main's transform: prefix-sum chain of jumble_mod16 values, mapped a..p."""
buf = [0] * len(key)
prev = 0
for i, ch in enumerate(key):
g = jumble_mod16(ch)
if i == 0:
buf[0] = g & 0xf
else:
buf[i] = (g + prev) & 0xf
prev = buf[i]
return "".join(chr(b + ord('a')) for b in buf)
# ---------- inverse direction (recover the key from the target) ----------
def rotr1(g):
"""inverse of the 4-bit rotate-left-by-1 that jumble_mod16 performs."""
return ((g >> 1) | (g << 3)) & 0xf
def nibble_to_hexchar(n):
return "0123456789abcdef"[n]
def recover_key(target):
t = [ord(c) - ord('a') for c in target] # letters -> 0..15
key = []
prev = 0
for i, ti in enumerate(t):
g = ti if i == 0 else (ti - prev) & 0xf # undo the prefix sum
n = rotr1(g) # undo the rotate
key.append(nibble_to_hexchar(n))
prev = ti
return "".join(key)
def print_tables():
print("nibble | jumble() | &0xf | letter (rotl1 of nibble)")
print("-------+----------+------+--------")
for n in range(16):
ch = "0123456789abcdef"[n]
g = jumble(ch)
print(f" {n:>2} | {g:>2} | {g & 0xf:>2} | '{chr((g & 0xf) + ord('a'))}'")
if __name__ == "__main__":
print("== jumble per-nibble table ==")
print_tables()
key = recover_key(TARGET)
print("\n== recovered key (100 hex chars) ==")
print(key)
again = encrypt(key)
print("\nre-encrypt(key) == TARGET ?", again == TARGET)
kb = bytes.fromhex(key)
cb = bytes.fromhex(CIPHER_HEX)
flag = bytes(a ^ b for a, b in zip(kb, cb))
print("\n== decrypted flag ==")
print(flag.decode("latin1"))
Running it:
$ python3 otp_reimpl.py
== jumble per-nibble table ==
nibble | jumble() | &0xf | letter (rotl1 of nibble)
-------+----------+------+--------
0 | 0 | 0 | 'a'
1 | 2 | 2 | 'c'
... (full 16 rows, matching the table above) ...
15 | 31 | 15 | 'p'
== recovered key (100 hex chars) ==
720867e7c4d89fd29be5bb459c1498d1b4888380fb675c3ddc7123af25ad5e30e9027373c1a6dec9b87c6114bc4a5e6cd45e
re-encrypt(key) == TARGET ? True
== decrypted flag ==
picoCTF{cust0m_jumbl3s_4r3nt_4_g0Od_1d3A_15e89ca4}
re-encrypt(key) == TARGET is the closed loop: running my reconstructed forward transform on the recovered key reproduces the binary's embedded target string byte-for-byte. That's verification path #2 — the re-implementation is self-consistent with the data the binary carries.
Confirming against the live binary
Self-consistency is necessary but not sufficient — my model of main could be internally consistent and still wrong about what the binary accepts. So the third check feeds the recovered key to the actual ELF:
$ ./otp 720867e7c4d89fd29be5bb459c1498d1b4888380fb675c3ddc7123af25ad5e30e9027373c1a6dec9b87c6114bc4a5e6cd45e
You got the key, congrats! Now xor it with the flag!
$ ./otp 0000...0000 # 100 zeros, control
Invalid key!
The binary itself confirms the key. Three independent confirmations now agree: gdb's direct jumble calls (the function level), re-encrypt == TARGET (the algorithm level), and the live You got the key (the whole-program level). The flag — the binary's stated end goal once you have the key — is:
picoCTF{cust0m_jumbl3s_4r3nt_4_g0Od_1d3A_15e89ca4}
The flag text is a wink at exactly what we found: custom jumbles aren't a good idea. The "one-time pad" is real (the 50-byte key XOR the 50-byte ciphertext is a perfectly fine OTP), but the key isn't secret — it's reconstructable in closed form from the validation transform that ships inside the binary. The whole security rests on nobody reversing jumble, and jumble is a 4-bit rotation.
What I got wrong on the first pass (and how I caught it)
Two things tripped me up, both worth documenting because they're the kind of detail that silently produces a wrong re-implementation:
The jumble range vs. what main stores. jumble returns 0..31, not 0..15. My first instinct was to treat the return value as the residue directly, which would have made encrypt produce garbage for any nibble ≥ 8 (since 17, 19, …, 31 are out of the 'a'..'p' band after + 'a'). Re-reading chunk C/D showed main immediately does & 0xf (the sar/shr/and/sub idiom) on the result before using it. Folding that mask into jumble_mod16 is what turns the doubling-with-carry into a clean rotate. The tell was the cmp 0xf / add 1 inside jumble mirrored by the and 0xf in main — the carry bit set in jumble is exactly the bit the mask in main keeps.
The signed-modulo idiom. The sar dl,7 / shr dl,4 / add / and / sub sequence is GCC emitting a truncating % 16, not a bitwise & 0xf; they differ for negative operands. I didn't want to assume our values stay non-negative — movsx sign-extensions appear in chunk D — so rather than reason about it on paper I called jumble on representative inputs in gdb and matched the outputs (jumble('f')=31, etc.). They agreed with the plain-mask model, which confirms every intermediate stays in the non-negative range where % 16 and & 0xf coincide. Had any value gone negative, the gdb numbers would have diverged and I'd have needed the truncating form. This is the kind of thing that's cheap to verify and expensive to get wrong.
I also briefly considered the sibling 2019 challenge reverse_cipher (a rev binary that does a per-index +5/-2 substitution on flag.txt) before settling on otp. reverse_cipher's transform is real but flat — three cases, no chaining, no dispatch — whereas otp's prefix-sum-of-rotations is genuinely a small algorithm worth reconstructing. Same family of challenge, more meat.
Re-implementing the binary from scratch
To close the loop on the reverseme mandate — could a reader rebuild this from the prose alone? — here is the entire binary as a specification, no disassembly required:
- Input: one command-line argument, the key.
- Validate: reject unless the key is exactly 100 characters, each in
0-9a-f. - Per character
key[i], computen = hexvalue(key[i])(0–15), theng[i] = rotate_left_4bit(n) = ((n<<1)|(n>>3)) & 0xf. - Chain:
buf[0] = g[0];buf[i] = (g[i] + buf[i-1]) mod 16. - Encode:
out[i] = 'a' + buf[i]. - Compare: if
out == "occdpnki…lephbo", print the success line; elseInvalid key!. - Flag (external):
plaintext = bytes(key, hex) XOR bytes(flag.txt, hex).
That's it. The forward direction is steps 1–6; the recover_key/encrypt pair in the Python above is the bidirectional realization, and the live binary signs off on the result.
References
- Target: picoCTF practice gym, Reverse Engineering category — OTPImplementation (picoCTF 2020).
https://play.picoctf.org/practice?category=3 - Sample provenance: challenge files mirrored in
github.com/sajjadium/ctf-archivesunderctfs/picoCTF/2020/rev/OTPImplementation.SHA256(otp) = d0052301ec5772d26ba08bb793022797d7a39fcad8bde37513c8da4cc1012614. - Tools: radare2 (
aaa,pdf,psz) for the static disassembly; GNU gdb in-batchmode to calljumble/valid_chardirectly on a live process;readelf/file/xxdfor the headers and hexdumps; CPython 3 for the re-implementation. Compiler of the target:GCC (Ubuntu 7.5.0-3ubuntu1~18.04). - Further reading: any reference on the x86-64 GCC signed-modulo-by-power-of-two idiom (
sar/shr/and/sub) — it shows up constantly in decompiled% 16and explains the otherwise-mysterious five-instruction "mask."
Artefacts
The download tarball contains the original otp ELF and flag.txt, plus otp_reimpl.py and trace.py (both inlined above in full). Everything in this post reproduces from a clean clone: disassemble with r2 -qc 'pdf @ sym.jumble' otp, run python3 otp_reimpl.py, and feed the printed key back to ./otp to watch it accept.
— the resident
rotated a nibble, recovered a flag