Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1

From: Stephen Röttger
Date: Wed May 17 2023 - 06:52:06 EST


On Wed, May 17, 2023 at 12:41 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/16/23 00:06, Stephen Röttger wrote:
> > On Mon, May 15, 2023 at 4:28 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >>
> >> On 5/15/23 06:05, jeffxu@xxxxxxxxxxxx wrote:
> >>> We're using PKU for in-process isolation to enforce control-flow integrity
> >>> for a JIT compiler. In our threat model, an attacker exploits a
> >>> vulnerability and has arbitrary read/write access to the whole process
> >>> space concurrently to other threads being executed. This attacker can
> >>> manipulate some arguments to syscalls from some threads.
> >>
> >> This all sounds like it hinges on the contents of PKRU in the attacker
> >> thread.
> >>
> >> Could you talk a bit about how the attacker is prevented from running
> >> WRPKRU, XRSTOR or compelling the kernel to write to PKRU like at sigreturn?
> >
> > (resending without html)
> >
> > Since we're using the feature for control-flow integrity, we assume
> > the control-flow is still intact at this point. I.e. the attacker
> > thread can't run arbitrary instructions.
>
> Can't run arbitrary instructions, but can make (pretty) arbitrary syscalls?

The threat model is that the attacker has arbitrary read/write, while other
threads run in parallel. So whenever a regular thread performs a syscall and
takes a syscall argument from memory, we assume that argument can be attacker
controlled.
Unfortunately, it's a bit blurry which syscalls / syscall arguments we need to
assume are attacker-controlled. We're trying to approach this by roughly
categorizing syscalls+args along these lines:
* how commonly used is the syscall
* do we expect the argument to be taken from writable memory
* can we restrict the syscall+args with seccomp
* how difficult is it to restrict the syscall in userspace vs kernel
* does the syscall affect our protections (e.g. change control-flow or pkey)

Using munmap as an example:
* it's a very common syscall (nearly every seccomp filter will allow munmap)
* the addr argument will come from memory
* unmapping pkey-tagged pages breaks our assumptions
* it's hard to restrict in userspace since we'd need to keep track of all
  address ranges that are unsafe to unmap and hook the syscall to perform the
  validation on every call in the codebase.
* it's easy to validate in kernel with this patch
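
To illustrate the userspace bookkeeping mentioned above, here is a minimal
sketch of a range registry plus an overlap check that a munmap wrapper could
consult. All names (protect_range, safe_to_unmap, MAX_PROTECTED) are
hypothetical and not part of the patch set. Under the threat model described
here, the registry itself would presumably also have to live in pkey-protected
memory, and every munmap call site in the codebase would have to go through
the check, which is the hard part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size registry of ranges that must never be unmapped. */
#define MAX_PROTECTED 64

struct protected_range {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
};

static struct protected_range protected_ranges[MAX_PROTECTED];
static size_t protected_count;

/* Record [start, start + len) as unsafe to unmap.
 * Returns false on an empty range, address overflow, or a full registry. */
static bool protect_range(uintptr_t start, size_t len)
{
    uintptr_t end = start + (uintptr_t)len;

    if (len == 0 || end < start || protected_count == MAX_PROTECTED)
        return false;
    protected_ranges[protected_count].start = start;
    protected_ranges[protected_count].end = end;
    protected_count++;
    return true;
}

/* Return true only if [addr, addr + len) overlaps no protected range.
 * Empty or overflowing requests are rejected as well. */
static bool safe_to_unmap(uintptr_t addr, size_t len)
{
    uintptr_t end = addr + (uintptr_t)len;

    assert(protected_count <= MAX_PROTECTED);
    if (len == 0 || end < addr)
        return false;
    for (size_t i = 0; i < protected_count; i++) {
        /* Two half-open intervals overlap iff each starts before the other ends. */
        if (addr < protected_ranges[i].end && protected_ranges[i].start < end)
            return false;
    }
    return true;
}
```

A wrapper would call safe_to_unmap() before issuing the real munmap and refuse
the call if it returns false.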

Most other syscalls either don't affect the control-flow, are easy to avoid and
block with seccomp, or can be validated in userspace (e.g. by only installing
signal handlers at program startup).

> > * For JIT code, we're going to scan it for wrpkru instructions before
> > writing it to executable memory
>
> ... and XRSTOR, right?

Right. We’ll just have a list of allowed instructions that the JIT compiler can
emit.
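
As a rough illustration only: the sketch below is a simple byte scan, not the
allowlist approach described above. It flags any offset in a code buffer where
the bytes could decode as WRPKRU (0F 01 EF) or as XRSTOR with a memory operand
(0F AE /5). The function name contains_pkru_write is hypothetical. Scanning
every byte offset is deliberately conservative, so it can reject buffers in
which these bytes only appear inside immediates or other instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if buf contains, at any byte offset, a sequence that could
 * decode as WRPKRU or as XRSTOR with a memory operand. */
static bool contains_pkru_write(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i + 2 < len; i++) {
        if (buf[i] != 0x0f)
            continue;

        uint8_t opcode = buf[i + 1];
        uint8_t modrm = buf[i + 2];

        /* WRPKRU is encoded as 0F 01 EF. */
        if (opcode == 0x01 && modrm == 0xef)
            return true;

        /* XRSTOR is 0F AE /5 with a memory operand: ModRM.reg == 5 and
         * ModRM.mod != 3 (mod == 3 with reg == 5 is LFENCE). */
        if (opcode == 0xae && ((modrm >> 3) & 7) == 5 && (modrm >> 6) != 3)
            return true;
    }
    return false;
}
```

The JIT would run such a check on the finished code buffer before copying it
into executable memory and refuse to install the code on a match.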

>
> > * For regular code, we only use wrpkru around short critical sections
> > to temporarily enable write access
> >
> > Sigreturn is a separate problem that we hope to solve by adding pkey
> > support to sigaltstack
>
> What kind of support were you planning to add?

We’d like to allow registering pkey-tagged memory as a sigaltstack. This would
allow the signal handler to run isolated from other threads. Right now, the
main reason this doesn’t work is that the kernel would need to change the pkru
state before storing the register state on the stack.
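
For context on what "changing the pkru state" involves, here is a small sketch
of the PKRU bit layout: two bits per key, access-disable (AD) at bit 2*key and
write-disable (WD) at bit 2*key+1. The helper names are hypothetical and only
model the register value; they don't execute WRPKRU. The example PKRU value in
the usage note is just an illustration of a state that denies access to every
key except key 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_AD      0x1u /* access disable bit within a key's 2-bit field */
#define PKRU_WD      0x2u /* write disable bit within a key's 2-bit field */
#define PKRU_NR_KEYS 16u  /* a 32-bit PKRU holds 16 two-bit fields */

/* Return true if the given PKRU value permits data writes to pages tagged
 * with `key`, i.e. neither AD nor WD is set for that key. */
static bool pkru_allows_write(uint32_t pkru, unsigned key)
{
    assert(key < PKRU_NR_KEYS);
    return ((pkru >> (2 * key)) & (PKRU_AD | PKRU_WD)) == 0;
}

/* Return a PKRU value with both AD and WD cleared for `key`,
 * leaving the fields of all other keys untouched. */
static uint32_t pkru_enable_write(uint32_t pkru, unsigned key)
{
    assert(key < PKRU_NR_KEYS);
    return pkru & ~((uint32_t)(PKRU_AD | PKRU_WD) << (2 * key));
}
```

For example, with a PKRU value of 0x55555554, pkru_allows_write() is true for
key 0 and false for key 1. Storing register state on a stack tagged with key 1
would only succeed after switching to something like
pkru_enable_write(0x55555554, 1).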

> I was thinking that an attacker with arbitrary write access would wait
> until PKRU was on the userspace stack and *JUST* before the kernel
> sigreturn code restores it to write a malicious value. It could
> presumably do this with some asynchronous mechanism so that even if
> there was only one attacker thread, it could change its own value.

I’m not sure I follow the details. Can you give an example of an asynchronous
mechanism to do this? Would this be, e.g., the kernel writing to the memory as
part of a syscall?

> Also, the kernel side respect for PKRU is ... well ... rather weak.
> It's a best effort and if we *happen* to be in a kernel context where
> PKRU is relevant, we can try to respect PKRU. But there are a whole
> bunch of things like get_user_pages_remote() that just plain don't have
> PKRU available and can't respect it at all.
>
> I think io_uring also greatly expanded how common "remote" access to
> process memory is.
>
> So, overall, I'm thrilled to see another potential user for pkeys. It
> sounds like there's an actual user lined up here, which would be
> wonderful. But, I also want to make sure we don't go to the trouble to
> build something that doesn't actually present meaningful, durable
> obstacles to an attacker.
>
> I also haven't more than glanced at the code.
