# CTF Reverse - Competition-Specific Patterns ## Table of Contents - [Hidden Emulator Opcodes + LD_PRELOAD Key Extraction (0xFun 2026)](#hidden-emulator-opcodes-ld_preload-key-extraction-0xfun-2026) - [Spectre-RSB SPN Cipher — Static Parameter Extraction (0xFun 2026)](#spectre-rsb-spn-cipher-static-parameter-extraction-0xfun-2026) - [Image XOR Mask Recovery via Smoothness (VuwCTF 2025)](#image-xor-mask-recovery-via-smoothness-vuwctf-2025) - [Shellcode in Data Section via mmap RWX (VuwCTF 2025)](#shellcode-in-data-section-via-mmap-rwx-vuwctf-2025) - [Recursive execve Subtraction (VuwCTF 2025)](#recursive-execve-subtraction-vuwctf-2025) - [Byte-at-a-Time Block Cipher Attack (UTCTF 2024)](#byte-at-a-time-block-cipher-attack-utctf-2024) - [Mathematical Convergence Bitmap (EHAX 2026)](#mathematical-convergence-bitmap-ehax-2026) - [Windows PE XOR Bitmap Extraction + OCR (srdnlenCTF 2026)](#windows-pe-xor-bitmap-extraction--ocr-srdnlenctf-2026) - [Two-Stage Loader: RC4 Gate + VM Constraints (srdnlenCTF 2026)](#two-stage-loader-rc4-gate--vm-constraints-srdnlenctf-2026) - [GBA ROM VM Hash Inversion via Meet-in-the-Middle (srdnlenCTF 2026)](#gba-rom-vm-hash-inversion-via-meet-in-the-middle-srdnlenctf-2026) - [Sprague-Grundy Game Theory Binary (DiceCTF 2026)](#sprague-grundy-game-theory-binary-dicectf-2026) - [Kernel Module Maze Solving (DiceCTF 2026)](#kernel-module-maze-solving-dicectf-2026) - [Multi-Threaded VM with Channel Synchronization (DiceCTF 2026)](#multi-threaded-vm-with-channel-synchronization-dicectf-2026) - [Multi-Layer Self-Decrypting Binary (DiceCTF 2026)](#multi-layer-self-decrypting-binary-dicectf-2026) - [Embedded ZIP + XOR License Decryption (MetaCTF 2026)](#embedded-zip--xor-license-decryption-metactf-2026) - [Stack String Deobfuscation from .rodata XOR Blob (Nullcon 2026)](#stack-string-deobfuscation-from-rodata-xor-blob-nullcon-2026) - [Prefix Hash Brute-Force (Nullcon 2026)](#prefix-hash-brute-force-nullcon-2026) - [CVP/LLL Lattice for Constrained Integer Validation (HTB ShadowLabyrinth)](#cvplll-lattice-for-constrained-integer-validation-htb-shadowlabyrinth) - [Decision Tree Function Obfuscation (HTB WonderSMS)](#decision-tree-function-obfuscation-htb-wondersms) - [GLSL Shader VM with Self-Modifying Code (ApoorvCTF 2026)](#glsl-shader-vm-with-self-modifying-code-apoorvctf-2026) - [GF(2^8) Gaussian Elimination for Flag Recovery (ApoorvCTF 2026)](#gf28-gaussian-elimination-for-flag-recovery-apoorvctf-2026) - [Z3 for Single-Line Python Boolean Circuit (BearCatCTF 2026)](#z3-for-single-line-python-boolean-circuit-bearcatctf-2026) - [Sliding Window Popcount Differential Propagation (BearCatCTF 2026)](#sliding-window-popcount-differential-propagation-bearcatctf-2026) --- ## Hidden Emulator Opcodes + LD_PRELOAD Key Extraction (0xFun 2026) **Pattern (CHIP-8):** Non-standard opcode `FxFF` triggers hidden `superChipRendrer()` → AES-256-CBC decryption. Key derived from binary constants. **Technique:** 1. Check all instruction dispatch branches for non-standard opcodes 2. Hidden opcode may trigger crypto functions (OpenSSL) 3. Use `LD_PRELOAD` hook on `EVP_DecryptInit_ex` to capture AES key at runtime: ```c #include int EVP_DecryptInit_ex(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type, ENGINE *impl, const unsigned char *key, const unsigned char *iv) { // Log key for (int i = 0; i < 32; i++) printf("%02x", key[i]); printf("\n"); // Call original return ((typeof(EVP_DecryptInit_ex)*)dlsym(RTLD_NEXT, "EVP_DecryptInit_ex")) (ctx, type, impl, key, iv); } ``` ```bash gcc -shared -fPIC -ldl -lssl hook.c -o hook.so LD_PRELOAD=./hook.so ./emulator rom.ch8 ``` --- ## Spectre-RSB SPN Cipher — Static Parameter Extraction (0xFun 2026) **Pattern:** Binary uses cache side channels to implement S-boxes, but ALL cipher parameters (round keys, S-box tables, permutation) are in the binary's data section. **Key insight:** Don't try to run on special hardware. Extract parameters statically: - 8 S-boxes × 8 output bits, 256 entries each - Values `0x340` = bit 1, `0x100` = bit 0 - 64-byte permutation table, 8 round keys ```python # Extract from binary data section import struct sbox = [[0]*256 for _ in range(8)] for i in range(8): for j in range(256): val = struct.unpack('> 33; h = (h * 0xff51afd7ed558ccd) & MASK64 h ^= h >> 33; h = (h * 0xc4ceb9fe1a85ec53) & MASK64 h ^= h >> 33 return h def hash_input(chars, seed_lo=0x84222325, seed_hi=0xcbf29ce4): hlo, hhi, ptr = seed_lo, seed_hi, 0 for c in chars: # tri_mix(c, mem[ptr]) — mem is always 0 delta = ((ord(c) * CUP) ^ (0 * P)) & MASK64 hlo = ((hlo ^ (delta & 0xFFFFFFFF)) * (P & 0xFFFFFFFF)) & 0xFFFFFFFF hhi = ((hhi ^ (delta >> 32)) * (P >> 32)) & 0xFFFFFFFF ptr = (ptr + 1) & 0xFF combined = ((hhi << 32) | (hlo ^ ptr)) & MASK64 return fmix64((combined * P) & MASK64) ``` **Meet-in-the-middle attack:** ```python import string TARGET = 0x73f3ebcbd9b4cd93 LENGTH = 6 SPLIT = 3 charset = [c for c in string.printable if 32 <= ord(c) < 127] # Forward pass: enumerate first 3 characters from seed state forward = {} for c1 in charset: for c2 in charset: for c3 in charset: state = hash_forward(seed, [c1, c2, c3]) forward[state] = c1 + c2 + c3 # Backward pass: invert fmix64 and final multiply, enumerate last 3 chars inv_target = invert_fmix64(TARGET) for c4 in charset: for c5 in charset: for c6 in charset: state = hash_backward(inv_target, [c4, c5, c6]) if state in forward: print(f"Found: {forward[state]}{c4}{c5}{c6}") ``` **Key insight:** Meet-in-the-middle reduces search from `95^6 ≈ 7.4×10^11` to `2×95^3 ≈ 1.7×10^6` — a factor of ~430,000x speedup. Critical when the hash function is invertible from the output side (i.e., `fmix64` and the final multiply can be undone). Also: uninitialized VM memory that stays zero simplifies the hash function by removing a variable. --- ## Sprague-Grundy Game Theory Binary (DiceCTF 2026) **Pattern (Bedtime):** Stripped Rust binary plays N rounds of bounded Nim. Each round has piles and max-move parameter k. Binary uses a PRNG for moves when in a losing position; user must respond optimally so the PRNG eventually generates an invalid move (returns 1). Sum of return values must equal a target. **Game theory identification:** - Bounded Nim: remove 1 to k items from any pile per turn - **Grundy value** per pile: `pile_value % (k+1)` - **XOR** of all Grundy values: non-zero = winning (N-position), zero = losing (P-position) - N-positions: computer wins automatically (returns 0) - P-positions: computer uses PRNG, may make invalid move (returns 1) **PRNG state tracking through user feedback:** ```python MASK64 = (1 << 64) - 1 def prng_step(state, pile_count, k): """Computer's PRNG move. Returns (pile_idx, amount, new_state).""" r12 = state[2] ^ 0x28027f28b04ccfa7 rax = (state[1] + r12) & MASK64 s0_new = ROL64((state[0] ** 2 + rax) & MASK64, 32) r12_upd = (r12 + rax) & MASK64 s0_final = ROL64((s0_new ** 2 + r12_upd) & MASK64, 32) pile_idx = rax % pile_count amount = (r12_upd % k) + 1 return pile_idx, amount, [s0_final, r12_upd, state[2]] # Critical: state[2] updated ONLY by user moves (XOR of pile_idx, amount, new_value) # PRNG moves do NOT affect state[2] — creates feedback loop ``` **Solving approach:** 1. Dump game data from GDB (all entries with pile values and parameters) 2. Classify: count P-positions (return 1) vs N-positions (return 0) 3. Simulate each P-position: PRNG moves → user responds optimally → track state[2] 4. Encode user moves as input format (4-digit decimal pairs, reversed order) **Key insight:** When a game binary's PRNG state depends on user input, you must simulate the full feedback loop — not just solve the game theory. Use GDB hardware watchpoints to discover which state variables are affected by user vs computer moves. --- ## Kernel Module Maze Solving (DiceCTF 2026) **Pattern (Explorer):** Rust kernel module implements a 3D maze via `/dev/challenge` ioctls. Navigate the maze, avoid decoy exits (status=2), find the real exit (status=1), read the flag. **Ioctl enumeration:** | Command | Description | |---------|-------------| | `0x80046481-83` | Get maze dimensions (3 axes, 8-16 each) | | `0x80046485` | Get status: 0=playing, 1=WIN, 2=decoy | | `0x80046486` | Get wall bitfield (6 directions) | | `0x80406487` | Get flag (64 bytes, only when status=1) | | `0x40046488` | Move in direction (0-5) | | `0x6489` | Reset position | **DFS solver with decoy avoidance:** ```c // Minimal static binary using raw syscalls (no libc) for small upload size // gcc -nostdlib -static -Os -fno-builtin -o solve solve.c -Wl,--gc-sections && strip solve int visited[16][16][16]; int bad[16][16][16]; // decoy positions across resets void dfs(int fd, int x, int y, int z) { if (visited[x][y][z] || bad[x][y][z]) return; visited[x][y][z] = 1; int status = ioctl_get_status(fd); if (status == 1) { read_flag(fd); exit(0); } if (status == 2) { bad[x][y][z] = 1; return; } // decoy — mark bad int walls = ioctl_get_walls(fd); int dx[] = {1,-1,0,0,0,0}, dy[] = {0,0,1,-1,0,0}, dz[] = {0,0,0,0,1,-1}; int opp[] = {2,3,0,1,5,4}; // opposite directions for backtracking for (int dir = 0; dir < 6; dir++) { if (!(walls & (1 << dir))) continue; // wall present ioctl_move(fd, dir); dfs(fd, x+dx[dir], y+dy[dir], z+dz[dir]); ioctl_move(fd, opp[dir]); // backtrack } } // After decoy hit: reset via ioctl 0x6489, clear visited, re-run DFS ``` **Remote deployment:** Upload binary via base64 chunks over netcat shell, decode, execute. **Key insight:** For kernel module challenges, injecting test binaries into initramfs and probing ioctls dynamically is faster than static RE of stripped kernel modules. Keep solver binary minimal (raw syscalls, no libc) for fast upload. --- ## Multi-Threaded VM with Channel Synchronization (DiceCTF 2026) **Pattern (locked-in):** Custom stack-based VM runs 16 concurrent threads verifying a 30-char flag. Threads communicate via futex-based channels. Pipeline: input → XOR scramble → transformation → base-4 state machine → final check. **Analysis approach:** 1. **Identify thread roles** by tracing channel read/write patterns in GDB 2. **Extract constants** (XOR scramble values, lookup tables) via breakpoints on specific opcodes 3. **Watch for inverted logic:** validity check returns 0 for valid, non-zero for blocked (opposite of intuition) 4. **Detect futex quirks:** `unlock_pi` on unowned mutex returns EPERM=1, which can change all computations **BFS state space search for constrained state machines:** ```python from collections import deque def solve_flag(scramble_vals, lookup_table, initial_state, target_state): """BFS through state machine to find valid flag bytes.""" flag = [None] * 30 # Known prefix/suffix from flag format flag[0:5] = list(b'dice{') flag[29] = ord('}') # For each unknown position, try all printable ASCII states = {initial_state} for pos in range(28, 4, -1): # processed in reverse next_states = {} for state in states: for ch in range(32, 127): transformed = transform(ch, scramble_vals[pos]) digits = to_base4(transformed) new_state = apply_digits(state, digits, lookup_table) if new_state is not None: # valid path exists next_states.setdefault(new_state, []).append((state, ch)) states = set(next_states.keys()) # Trace back from target_state to recover flag ``` **Key insight:** Multi-threaded VMs require tracing data flow across thread boundaries. Channel-based communication creates a pipeline — identify each thread's role (input, transform, validate, output) by watching which channels it reads/writes. Constants that affect computation may come from unexpected sources (futex return values, thread IDs). --- ## Multi-Layer Self-Decrypting Binary (DiceCTF 2026) **Pattern (another-onion):** Binary with N layers (e.g., 256), each reading 2 key bytes, deriving keystream via SHA-256 NI instructions, XOR-decrypting the next layer, then jumping to it. Must solve within a time limit (e.g., 30 minutes). **Oracle for correct key:** Wrong key bytes produce garbage code. Correct key bytes produce code with exactly 2 `call read@plt` instructions (next layer's reads). Brute-force all 65536 candidates per layer using this oracle. **JIT execution approach (fastest):** ```c // Map binary's memory at original virtual addresses into solver process // Compile solver at non-overlapping address: -Wl,-Ttext-segment=0x10000000 void *text = mmap((void*)0x400000, text_size, PROT_RWX, MAP_FIXED|MAP_PRIVATE, fd, 0); void *bss = mmap((void*)bss_addr, bss_size, PROT_RW, MAP_FIXED|MAP_SHARED, shm_fd, 0); // Patch read@plt to inject candidate bytes instead of reading stdin // Patch tail jmp/call to next layer with ret/NOP to return from layer // Fork-per-candidate: COW gives isolated memory without memcpy for (int candidate = 0; candidate < 65536; candidate++) { pid_t pid = fork(); if (pid == 0) { // Child: remap BSS as MAP_PRIVATE (COW from shared file) mmap(bss_addr, bss_size, PROT_RW, MAP_FIXED|MAP_PRIVATE, shm_fd, 0); inject_key(candidate >> 8, candidate & 0xff); ((void(*)())layer_addr)(); // Execute layer as function call // Check: does decrypted code contain exactly 2 call read@plt? if (count_read_calls(next_layer_addr) == 2) signal_found(candidate); _exit(0); } } ``` **Performance tiers:** | Approach | Speed | 256-layer estimate | |----------|-------|--------------------| | Python subprocess | ~2/s | days | | Ptrace fork injection | ~119/s | 6+ hours | | JIT + fork-per-candidate | ~1000/s | 140 min | | JIT + shared BSS + 32 workers | ~3500/s | **~17 min** | **Shared BSS optimization:** BSS (16MB+) stored in `/dev/shm` as `MAP_SHARED` in parent. Children remap as `MAP_PRIVATE` for COW. Reduces fork overhead from 16MB page-table setup to ~4KB. **Key insight:** Multi-layer decryption challenges are fundamentally about building fast brute-force engines. JIT execution (mapping binary memory into solver, running code directly as function calls) is orders of magnitude faster than ptrace. Fork-based COW provides free memory isolation per candidate. **Gotchas:** - Real binary may use `call` (0xe8) instead of `jmp` (0xe9) for layer transitions — adjust tail patching - BSS may extend beyond ELF MemSiz via kernel brk mapping — map extra space - SHA-NI instructions work even when not advertised in `/proc/cpuinfo` --- ## Embedded ZIP + XOR License Decryption (MetaCTF 2026) **Pattern (License To Rev):** Binary requires a license file as argument. Contains an embedded ZIP archive with the expected license, and an XOR-encrypted flag. **Recognition:** - `strings` reveals `EMBEDDED_ZIP` and `ENCRYPTED_MESSAGE` symbols - Binary is not stripped — `nm` or `readelf -s` shows data symbols in `.rodata` - `file` shows PIE executable, source file named `licensed.c` **Analysis workflow:** 1. **Find data symbols:** ```bash readelf -s binary | grep -E "EMBEDDED|ENCRYPTED|LICENSE" # EMBEDDED_ZIP at offset 0x2220, 384 bytes # ENCRYPTED_MESSAGE at offset 0x21e0, 35 bytes ``` 2. **Extract embedded ZIP:** ```python import struct with open('binary', 'rb') as f: data = f.read() # Find PK\x03\x04 magic in .rodata zip_start = data.find(b'PK\x03\x04') # Extract ZIP (size from symbol table or until next symbol) open('embedded.zip', 'wb').write(data[zip_start:zip_start+384]) ``` 3. **Extract license from ZIP:** ```bash unzip embedded.zip # Contains license.txt ``` 4. **XOR decrypt the flag:** ```python license = open('license.txt', 'rb').read() enc_msg = open('encrypted_msg.bin', 'rb').read() # Extract from .rodata flag = bytes(a ^ b for a, b in zip(enc_msg, license)) print(flag.decode()) ``` **Key insight:** No need to run the binary or bypass the expiry date check. The embedded ZIP and encrypted message are both in `.rodata` — extract and XOR offline. **Disassembly confirms:** - `memcmp(user_license, decompressed_embedded_zip, size)` — license validation - Date parsing with `sscanf("%d-%d-%d")` on `EXPIRY_DATE=` field - XOR loop: `ENCRYPTED_MESSAGE[i] ^ license[i]` → `putc()` per byte **Lesson:** When a binary has named symbols (`EMBEDDED_*`, `ENCRYPTED_*`), extract data directly from the binary without execution. XOR with known plaintext (the license) is trivially reversible. --- ## Stack String Deobfuscation from .rodata XOR Blob (Nullcon 2026) **Pattern (stack_strings_1/2):** Binary mmaps a blob from `.rodata`, XOR-deobfuscates it, then uses the blob to validate input. Flag is recovered by reimplementing the verification loop. **Recognition:** - `mmap()` call followed by XOR loop over `.rodata` data - Verification loop with running state (`eax`, `ebx`, `r9`) updated with constants like `0x9E3779B9`, `0x85EBCA6B`, `0xA97288ED` - `rol32()` operations with position-dependent shifts - Expected bytes stored in deobfuscated buffer **Approach:** 1. Extract `.rodata` blob with pyelftools: ```python from elftools.elf.elffile import ELFFile with open(binary, "rb") as f: elf = ELFFile(f) ro = elf.get_section_by_name(".rodata") blob = ro.data()[offset:offset+size] ``` 2. Recover embedded constants (length, magic values) by XOR with known keys from disassembly 3. Reimplement the byte-by-byte verification loop: - Each iteration: compute two hash-like values from running state - XOR them together and with expected byte to recover input byte - Update running state with constant additions **Variant (stack_strings_2):** Adds position permutation + state dependency on previous character: - Position permutation: byte `i` may go to position `pos[i]` in the output - State dependency: `need = (expected - rol8(prev_char, 1)) & 0xFF` - Must track `state` variable that updates to current character each iteration **Key constants to look for:** - `0x9E3779B9` (golden ratio fractional, common in hash functions) - `0x85EBCA6B` (MurmurHash3 finalizer constant) - `0xA97288ED` (related hash constant) - `rol32()` with shift `i & 7` --- ## Prefix Hash Brute-Force (Nullcon 2026) **Pattern (Hashinator):** Binary hashes every prefix of the input independently and outputs one digest per prefix. Given N output digests, the flag has N-1 characters. **Attack:** Recover input one character at a time: ```python for pos in range(1, len(target_hashes)): for ch in charset: candidate = known_prefix + ch + padding hashes = run_binary(candidate) if hashes[pos] == target_hashes[pos]: known_prefix += ch break ``` **Key insight:** If each prefix hash is independent (no chaining/HMAC), the problem decomposes into `N` x `|charset|` binary executions. This is the hash equivalent of byte-at-a-time block cipher attacks. **Detection:** Binary outputs multiple hash lines. Changing last character only changes last hash. Different input lengths produce different numbers of output lines. --- ## CVP/LLL Lattice for Constrained Integer Validation (HTB ShadowLabyrinth) **Pattern:** Binary validates flag via matrix multiplication where grouped input characters are multiplied by coefficient matrices and checked against expected 64-bit results. Standard algebra fails because solutions must be printable ASCII (32-126). Lattice-based CVP (Closest Vector Problem) with LLL reduction solves this efficiently. **Identification:** 1. Binary groups input characters (e.g., 4 at a time) 2. Each group is multiplied by a coefficient matrix 3. Results compared against hardcoded 64-bit values 4. Need integer solutions in a constrained range (printable ASCII) **SageMath CVP solver:** ```python from sage.all import * def solve_constrained_matrix(coefficients, targets, char_range=(32, 126)): """ coefficients: list of coefficient rows (e.g., 4 values per group) targets: expected output values char_range: valid character range (printable ASCII) """ n = len(coefficients[0]) # characters per group mid = (char_range[0] + char_range[1]) // 2 # Build lattice: [coeff_matrix | I*scale] # The target vector includes adjusted targets M = matrix(ZZ, n + len(targets), n + len(targets)) scale = 1000 # Weight to constrain character range for i, row in enumerate(coefficients): for j, c in enumerate(row): M[j, i] = c M[n + i, i] = 1 # padding for j in range(n): M[j, len(targets) + j] = scale target_vec = vector(ZZ, [t - sum(c * mid for c in row) for row, t in zip(coefficients, targets)] + [0] * n) # LLL + CVP L = M.LLL() closest = L * L.solve_left(target_vec) # or use Babai solution = [closest[len(targets) + j] // scale + mid for j in range(n)] return bytes(solution) ``` **Two-phase validation pattern:** 1. **Phase 1 (matrix math):** Solve via CVP/LLL → recovers first N characters 2. First N characters become AES key → decrypt `file.bin` (XOR last 16 bytes + AES-256-CBC + zlib decompress) 3. **Phase 2 (custom VM):** Decrypted bytecode runs in custom VM, validates remaining characters via another linear system (mod 2^32) **Modular linear system solving (Phase 2 — VM validation):** ```python import numpy as np from sympy import Matrix # M * x = v (mod 2^32) M_mod = Matrix(coefficients) % (2**32) v_mod = Matrix(targets) % (2**32) # Gaussian elimination in Z/(2^32) solution = M_mod.solve(v_mod) # Returns flag characters ``` **Key insight:** When a binary validates input through linear combinations with large coefficients and the solution must be in a small range (printable ASCII), this is a lattice problem in disguise. LLL reduction + CVP finds the nearest lattice point, recovering the constrained solution. Cross-reference: invoke `/ctf-crypto` for LLL/CVP fundamentals (advanced-math.md in ctf-crypto). **Detection:** Binary performs matrix-like operations on grouped input, compares against 64-bit constants, and a brute-force search space is too large (e.g., 256^4 per group × 12 groups). --- ## Decision Tree Function Obfuscation (HTB WonderSMS) **Pattern:** Binary routes input through ~200+ auto-generated functions, each computing a polynomial expression from input positions, comparing against a constant, and branching left/right. The tree makes static analysis impractical without scripted extraction. **Identification:** 1. Large number of similar functions with random-looking names (e.g., `f315732804`) 2. Each function computes arithmetic on specific input positions 3. Functions call other tree functions or a final validation function 4. Decompiled code shows `if (expr cmp constant) call_left() else call_right()` **Ghidra headless scripting for mass extraction:** ```python # Extract comparison constants from all tree functions # Run via: analyzeHeadless project/ tmp -import binary -postScript extract_tree.py from ghidra.program.model.listing import * from ghidra.program.model.symbol import * fm = currentProgram.getFunctionManager() results = [] for func in fm.getFunctions(True): name = func.getName() if name.startswith('f') and name[1:].isdigit(): # Find CMP instruction and extract immediate constant inst_iter = currentProgram.getListing().getInstructions(func.getBody(), True) for inst in inst_iter: if inst.getMnemonicString() == 'CMP': operand = inst.getOpObjects(1) if operand: results.append((name, int(operand[0].getValue()))) ``` **Constraint propagation from known output format:** 1. Start from known output bytes (e.g., `http://HTB{...}`) → fix several input positions 2. Fixed positions cascade through arithmetic constraints → determine dependent positions 3. Tree root equation pins down remaining free variables 4. Recognize English words in partial flag to disambiguate multiple solutions **Key insight:** Auto-generated decision trees look overwhelming but are repetitive by construction. Script the extraction (Ghidra, Binary Ninja, radare2) rather than reversing each function manually. The tree is just a dispatcher — the real logic is in the leaf function and its constraints. **Detection:** Binary with hundreds of similarly-structured functions, 3-5 input position references per function, branching to two other functions or a common leaf. --- ## GLSL Shader VM with Self-Modifying Code (ApoorvCTF 2026) **Pattern (Draw Me):** A WebGL2 fragment shader implements a Turing-complete VM on a 256x256 RGBA texture. The texture is both program memory and display output. **Texture layout:** - **Row 0:** Registers (pixel 0 = instruction pointer, pixels 1-32 = general purpose) - **Rows 1-127:** Program memory (RGBA = opcode, arg1, arg2, arg3) - **Rows 128-255:** VRAM (display output) **Opcodes:** NOP(0), SET(1), ADD(2), SUB(3), XOR(4), JMP(5), JNZ(6), VRAM-write(7), STORE(8), LOAD(9). 16 steps per frame. **Self-modifying code:** Phase 1 (decryption) uses STORE opcode to XOR-patch program memory that Phase 2 (drawing) then executes. The decryption overwrites SET instructions with correct pixel color values before the drawing code runs. **Why GPU rendering fails:** The GPU runs all pixels in parallel per frame, but the shader tracks only ONE write target per pixel per frame. With multiple VRAM writes per frame, only the last survives — losing 75%+ of pixels. Similarly, STORE patches conflict during parallel decryption. **Solve via sequential emulation:** ```python from PIL import Image import numpy as np img = Image.open('program.png').convert('RGBA') state = np.array(img, dtype=np.int32).copy() regs = [0] * 33 # Phase 1: Trace decryption — apply all STORE patches sequentially x, y = start_x, start_y while True: r, g, b, a = state[y][x] opcode = int(r) if opcode == 1: regs[g] = b & 255 # SET elif opcode == 4: regs[g] = regs[b] ^ regs[a] # XOR elif opcode == 8: # STORE — patches program memory tx, ty = regs[g], regs[b] state[ty][tx] = [regs[a], regs[a+1], regs[a+2], regs[a+3]] elif opcode == 5: break # JMP to drawing phase x += 1 if x > 255: x, y = 0, y + 1 # Phase 2: Execute drawing code — all VRAM writes preserved vram = np.zeros((128, 256), dtype=np.uint8) # ... trace with opcode 7 writing to vram[ty][tx] = color Image.fromarray(vram, mode='L').save('output.png') ``` **Key insight:** GLSL shaders are Turing-complete but GPU parallelism causes write conflicts. Self-modifying code (STORE patches) compounds the problem — patches from parallel executions overwrite each other. Sequential emulation in Python recovers the full output. The program.png file IS the bytecode. **Detection:** WebGL/shader challenge with a PNG "program" file, challenge says "nothing renders" or output is garbled. Look for custom opcode tables in GLSL source. --- ## GF(2^8) Gaussian Elimination for Flag Recovery (ApoorvCTF 2026) **Pattern (Forge):** Stripped binary performs Gaussian elimination over GF(2^8) (Galois Field with 256 elements, using the AES polynomial). A matrix and augmentation vector are embedded in `.rodata`. The solution vector is the flag. **GF(2^8) arithmetic with AES polynomial (x^8+x^4+x^3+x+1 = 0x11b):** ```python def gf_mul(a, b): """Multiply in GF(2^8) with AES reduction polynomial.""" p = 0 for _ in range(8): if b & 1: p ^= a hi = a & 0x80 a = (a << 1) & 0xff if hi: a ^= 0x1b # Reduction: x^8 = x^4+x^3+x+1 b >>= 1 return p def gf_inv(a): """Brute-force multiplicative inverse (fine for 256 elements).""" if a == 0: return 0 for x in range(1, 256): if gf_mul(a, x) == 1: return x return 0 ``` **Solving the linear system:** ```python # Extract N×N matrix + N-byte augmentation from binary .rodata N = 56 # Flag length # Build augmented matrix: N rows × (N+1) cols for col in range(N): # Find non-zero pivot pivot = next((r for r in range(col, N) if aug[r][col] != 0), -1) if pivot != col: aug[col], aug[pivot] = aug[pivot], aug[col] # Scale pivot row by inverse inv = gf_inv(aug[col][col]) aug[col] = [gf_mul(v, inv) for v in aug[col]] # Eliminate column in all other rows for row in range(N): if row == col: continue factor = aug[row][col] if factor == 0: continue aug[row] = [v ^ gf_mul(factor, aug[col][j]) for j, v in enumerate(aug[row])] flag = bytes(aug[i][N] for i in range(N)) ``` **Key insight:** GF(2^8) is NOT regular integer arithmetic — addition is XOR, multiplication uses polynomial reduction. The AES polynomial (0x11b) is the most common; look for the constant `0x1b` in disassembly. The binary may encrypt the result with AES-GCM afterward, but the raw solution vector (pre-encryption) is the flag. **Detection:** Binary with a large matrix in `.rodata` (N² bytes), XOR-based row operations, constants `0x1b` or `0x11b`, and flag length matching sqrt of matrix size. --- ## Z3 for Single-Line Python Boolean Circuit (BearCatCTF 2026) **Pattern (Captain Morgan):** Single-line Python (2000+ semicolons) validates flag via walrus operator chains decomposing input as a big-endian integer, with bitwise operations producing a boolean circuit. **Identification:** - Single-line Python with semicolons separating statements - Walrus operator `:=` chains: `(x := expr)` - Obfuscated XOR: `(x | i) & ~(x & i)` instead of `x ^ i` - Input treated as a single large integer, decomposed via bit-shifting **Z3 solution:** ```python from z3 import * n_bytes = 29 # Flag length ari = BitVec('ari', n_bytes * 8) # Parse semicolon-separated statements # Model walrus chains as LShR(ari, shift_amount) # Evaluate boolean expressions symbolically # Final assertion: result_var == 0 s = Solver() s.add(bfu == 0) # Final validation variable if s.check() == sat: m = s.model() val = m[ari].as_long() flag = val.to_bytes(n_bytes, 'big').decode('ascii') ``` **Key insight:** Single-line Python obfuscation creates a boolean circuit over input bits. The walrus operator chains are just variable assignments — split on semicolons and translate each to Z3 symbolically. Obfuscated XOR `(a | b) & ~(a & b)` is just `a ^ b`. Z3 solves these circuits in under a second. Look for `__builtins__` access or `ord()`/`chr()` calls to identify the input→integer conversion. **Detection:** Single-line Python with 1000+ semicolons, walrus operators, bitwise operations, and a final comparison to 0 or True. --- ## Sliding Window Popcount Differential Propagation (BearCatCTF 2026) **Pattern (Treasure Hunt 4):** Binary validates input via expected popcount (number of set bits) for each position of a 16-bit sliding window over the input bits. **Differential propagation:** When the window slides by 1 bit: ```text popcount(window[i+1]) - popcount(window[i]) = bit[i+16] - bit[i] ``` So: `bit[i+16] = bit[i] + (data[i+1] - data[i])` ```python expected = [...] # 337 expected popcount values total_bits = 337 + 15 # = 352 # Brute-force the initial 16-bit window (must have popcount = expected[0]) for start_val in range(0x10000): if bin(start_val).count('1') != expected[0]: continue bits = [0] * total_bits for j in range(16): bits[j] = (start_val >> (15 - j)) & 1 valid = True for i in range(len(expected) - 1): new_bit = bits[i] + (expected[i + 1] - expected[i]) if new_bit not in (0, 1): valid = False break bits[i + 16] = new_bit if valid: # Convert bits to bytes flag_bytes = bytes(int(''.join(map(str, bits[i:i+8])), 2) for i in range(0, total_bits, 8)) if b'BCCTF' in flag_bytes or flag_bytes[:5].isascii(): print(flag_bytes.decode(errors='replace')) break ``` **Key insight:** Sliding window popcount differences create a recurrence relation: each new bit is determined by the bit 16 positions back plus the popcount delta. Only the first 16 bits are free (constrained by initial popcount). Brute-force the ~4000-8000 valid initial windows — for each, the entire bit sequence is deterministic. Runs in under a second. **Detection:** Binary computing popcount/hamming weight on fixed-size windows. Expected value array with length ≈ input_bits - window_size + 1. Values in array are small integers (0 to window_size).