Re: [BUG?] mm/secretmem: memory address mapped to memfd_secret can be used in write syscall.

From: Theodore Ts'o
Date: Mon Nov 13 2023 - 08:26:45 EST


On Mon, Nov 13, 2023 at 10:15:05AM +0100, David Hildenbrand wrote:
>
> According to the man page:
>
> "The memory areas backing the file created with memfd_secret(2) are visible
> only to the processes that have access to the file descriptor. The memory
> region is removed from the kernel page tables and only the page tables of
> the processes holding the file descriptor map the corresponding physical
> memory. (Thus, the pages in the region can't be accessed by the kernel
> itself, so that, for example, pointers to the region can't be passed to
> system calls.)"
>
> I'm not sure if the last part is actually true, if the syscalls end up
> walking user page tables to copy data in/out.

The idea behind removing it from the kernel page tables is so that
kernel code running in some other process context won't be able to
reference the memory via the kernel address space. (So if there is
some kind of kernel zero-day which allows arbitrary code execution,
the injected attack code would have to play games with page tables
before being able to reference the memory --- this is not
*impossible*, just more annoying.)

But if you are doing a buffered write, the copy from the user-supplied
buffer to the page cache happens in the calling process's context,
where the process's own page tables still map the secret pages. So
"foreground kernel code" can dereference the user-supplied pointer
just fine.
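
To make the buffered-write case concrete, here is a minimal userspace
sketch. It is an illustration under stated assumptions, not code from
this thread: the helper name write_from_secretmem() is made up, the raw
syscall is used on the assumption that libc provides no wrapper, and
the kernel must have secretmem available (otherwise the helper just
returns -1 with errno set).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): copy msg into a
 * memfd_secret(2) mapping, then hand that mapping to write(2).
 * Returns the write(2) result, or -1 with errno set if any step
 * fails (for example ENOSYS when secretmem is not available).
 */
ssize_t write_from_secretmem(const char *msg, int out_fd)
{
    assert(msg != NULL);

#ifndef SYS_memfd_secret
    (void)out_fd;
    errno = ENOSYS;               /* headers predate memfd_secret */
    return -1;
#else
    size_t len = strlen(msg);
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len > (size_t)page) {
        errno = EINVAL;
        return -1;
    }

    /* Assumption: no libc wrapper, so invoke the raw system call. */
    int sfd = (int)syscall(SYS_memfd_secret, 0);
    if (sfd < 0)
        return -1;

    ssize_t n = -1;
    char *buf = MAP_FAILED;

    /* Size the secret area to one page and map it into this process. */
    if (ftruncate(sfd, page) == 0)
        buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, sfd, 0);

    if (buf != MAP_FAILED) {
        memcpy(buf, msg, len);
        /*
         * The kernel copies from buf while running in this process's
         * context, so the user mapping of the secret page is in place.
         */
        n = write(out_fd, buf, len);
    }

    int saved_errno = errno;      /* keep the first failure's errno */
    if (buf != MAP_FAILED)
        munmap(buf, (size_t)page);
    close(sfd);
    errno = saved_errno;
    return n;
#endif
}
```

If the explanation above is right, then on a kernel with secretmem
enabled, calling write_from_secretmem("hello", fd) with fd open on an
ordinary buffered file should return 5 rather than fail with EFAULT.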

Cheers,

- Ted