Re: [PATCH 07/32] mm: Bring back vmalloc_exec

From: Kees Cook
Date: Mon Jun 19 2023 - 15:45:52 EST


On Sat, Jun 17, 2023 at 11:34:31AM -0400, Kent Overstreet wrote:
> On Fri, Jun 16, 2023 at 09:13:22PM -0700, Andy Lutomirski wrote:
> > On 5/16/23 14:20, Kent Overstreet wrote:
> > > On Tue, May 16, 2023 at 02:02:11PM -0700, Kees Cook wrote:
> > > > For something that small, why not use the text_poke API?
> > >
> > > This looks like it's meant for patching existing kernel text, which
> > > isn't what I want - I'm generating new functions on the fly, one per
> > > btree node.
> >
> > Dynamically generating code is a giant can of worms.
> >
> > Kees touched on a basic security thing: a linear address mapped W+X is a big
> > no-no. And that's just scratching the surface -- ideally we would have a
> > strong protocol for generating code: the code is generated in some
> > extra-secure context, then it's made immutable and double-checked, then
> > it becomes live.
>
> "Double checking" arbitrary code is is fantasy. You can't "prove the
> security" of arbitrary code post compilation.

I think there's a misunderstanding here about the threat model I'm
interested in protecting against for JITs. While making sure the VM of a
JIT is safe in itself, that's separate from what I'm concerned about.

The threat model is about flaws _elsewhere_ in the kernel that can
leverage the JIT machinery to convert a "write anything anywhere anytime"
exploit primitive into an "execute anything" primitive. Arguments can
be made to say "a write anything flaw means the total collapse of the
security model so there's no point defending against it", but both that
type of flaw and the slippery slope argument don't stand up well to
real-world situations.

The kinds of flaws we've seen are frequently limited in scope (write
1 byte, write only NULs, write only in a specific range, etc), but
when chained together, the weakest link is what ultimately compromises
the kernel. As such, "W^X" is a basic building block of the kernel's
self-defense methods, because it is such a potent target for a
write->execute attack upgrades.

Since a JIT constructs something that will become executable, it needs
to defend itself against stray writes from other threads. Since Linux
doesn't (really) use per-CPU page tables, the workspace for a JIT can be
targeted by something that isn't the JIT. To deal with this, JITs need
to use 3 phases: a writing pass (into W memory), then switch it to RO
and perform a verification pass (construct it again, but compare results
to the RO version), and finally switch it executable. Or, it can use
writes to memory that only the local CPU can perform (i.e. text_poke(),
which uses a different set of page tables with different permissions).

Without basic W^X, it becomes extremely difficult to build further
defenses (e.g. protecting page tables themselves, etc) since WX will
remain the easiest target.

-Kees

--
Kees Cook