Re: PKU usage improvements for threads

From: Stephen Röttger
Date: Thu Aug 25 2022 - 08:30:40 EST


On Wed, Aug 24, 2022 at 6:28 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 8/24/22 01:51, Stephen Röttger wrote:
> >>> Yeah, that's something for which our defenses are quite weak. But, it
> >>> also calls for a very generic mm/ solution and not something specific at
> >>> all to pkeys.
> > We were also thinking about whether this should be a more generic feature instead of
> > being tied to pkeys. I.e. the doc above has an alternative proposal to introduce
> > something like a memory seal/unseal syscall.
> > I was personally leaning towards using pkeys for this for a few reasons:
> > * intuitively it would make sense to me to extend PKEY_DISABLE_ACCESS
> > to also mean disable all changes to the memory, not just the data.
>
> It would make some sense, but we can't do it with the existing
> PKEY_DISABLE_ACCESS ABI. It would surely break existing users if they
> couldn't munmap() memory that was PKEY_DISABLE_ACCESS.

Our thought was that this could be opt-in with a prctl().

> But, making it part of the mprotect() ABI wouldn't be the worst thing in
> the world. Since we have a pkey_mprotect(), any mprotect()-based
> mechanism could even reuse the existing pkey syscalls.
>
> I do agree with Andy, though, that I'm not quite sure what the attack
> model is here. If an attacker can make arbitrary system calls, surely
> protecting one little altstack VMA isn't going to help much.

Note that we don't assume arbitrary syscalls. We only expect the
attacker to be able to control a subset of arguments to a subset of
syscalls.
We run with a seccomp filter that greatly limits available syscalls.
And for arguments, we will ensure in code that sensitive arguments
won't touch attacker-writable memory (e.g. the prot argument in
mprotect()).
But this is hard to do for things like munmap(addr), which is why we're
hoping that the kernel can help us out for that subset of syscalls.
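
To illustrate why scalar arguments are the easy case: seccomp-BPF can
inspect register values such as prot, but it cannot tell whether an
address argument points at a sensitive mapping. Below is a minimal
sketch, not the filter we actually run, that rejects mprotect() calls
requesting PROT_EXEC. The names sketch_filter and install_sketch_filter
are hypothetical, and the sketch assumes a little-endian x86-64 or arm64
build and ignores the x32 ABI.

```c
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and arm64"
#endif

/* Hypothetical filter: deny mprotect(..., prot) when prot has PROT_EXEC. */
struct sock_filter sketch_filter[] = {
    /* 0: load the architecture of the calling ABI */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* 1: expected arch -> skip the kill at 2 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
    /* 2: unexpected arch -> kill the process */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* 3: load the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 4: not mprotect -> jump to the allow at 8 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mprotect, 0, 3),
    /* 5: load the low 32 bits of args[2], i.e. prot (little-endian) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[2])),
    /* 6: PROT_EXEC set -> fall through to 7, otherwise allow at 8 */
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, PROT_EXEC, 0, 1),
    /* 7: fail the call with EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    /* 8: everything else is allowed in this sketch */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Installs the filter for the calling thread; returns 0 on success. */
int install_sketch_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof(sketch_filter) / sizeof(sketch_filter[0]),
        .filter = sketch_filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A real deployment would be an allow-list rather than this single deny
rule. The point is only that nothing comparable can be written for
munmap(addr), because the filter sees the address but not what is
mapped there.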

> >> This kind of thing seems questionable to me. If the attacker controls syscall arguments, they can do almost anything. ISTM a CFI scheme should aim to prevent that bogus call in the first place, e.g. by preventing a problematic call.
>
>
> What I'm trying to say is: CFI, by itself, can protect syscalls by making sure that callers are safe. So, for example, if all munmap() callers do:
>
> if (addr is dangerous)
>     abort();
> else
>     munmap();
>
> Then, with CFI, an attacker can't get to the actual munmap without first doing the dangerous check. And you can implement this entirely in user code.
>
> With syscall user dispatch (this thing: https://lwn.net/Articles/828510/ -- sorry, I meant that when I typed interception), you even have a way to intercept *all* munmap() calls, for example.

Ah I see. Yeah, you're right. It should be possible to do these checks
in user code. Though, it would come with some challenges.
For one, the `addr is dangerous` check is not as easy as in kernel
space since AFAIK we can't check the pkey of a given address.
So we'd probably switch our stack and pkey state to traverse some data
structure to verify this. That, or keeping a dedicated range for these
protected mappings.
I'm also not sure if this would lead to noticeable perf regressions,
though hopefully these memory management operations are not in any hot
paths.
I'll investigate more if this would be a feasible alternative for us.
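
As a rough sketch of the dedicated-range variant, assuming all protected
mappings are placed inside one reserved address range: the user-space
check then reduces to an interval overlap test. All names here
(set_protected_range, addr_is_dangerous, checked_munmap) are
hypothetical, and the bounds would themselves have to live in memory
the attacker cannot write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical reserved range [protected_start, protected_end) that
 * holds every pkey-protected mapping. */
static uintptr_t protected_start;
static uintptr_t protected_end;

/* Record the reserved range once, at startup. */
void set_protected_range(void *start, size_t len)
{
    uintptr_t s = (uintptr_t)start;

    assert(len != 0 && s + len > s); /* non-empty and no wraparound */
    protected_start = s;
    protected_end = s + len;
}

/* True if [addr, addr + len) overlaps the reserved range. */
bool addr_is_dangerous(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t e = a + len;

    if (e < a) /* address arithmetic wrapped: refuse */
        return true;
    return a < protected_end && e > protected_start;
}

/* Wrapper that every munmap() caller would go through. */
int checked_munmap(void *addr, size_t len)
{
    if (addr_is_dangerous(addr, len))
        abort();
    return munmap(addr, len);
}
```

This only helps if CFI guarantees that the raw munmap() cannot be
reached without going through the check, and the same wrapper pattern
would be needed for mremap(), mprotect() and similar calls.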

Side note: it seems like seccomp already allows us to do this instead
of syscall user dispatch. Or is there a feature in the latter that
would be useful?
