Re: [PATCH v1 5/5] sbm: SandBox Mode documentation

From: Petr Tesařík
Date: Wed Feb 14 2024 - 10:02:27 EST


On Wed, 14 Feb 2024 15:01:25 +0100
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Feb 14, 2024 at 05:30:53AM -0800, Andrew Morton wrote:
> > On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <petrtesarik@xxxxxxxxxxxxxxx> wrote:
> >
> > > +Although data structures are not serialized and deserialized between kernel
> > > +mode and sandbox mode, all directly and indirectly referenced data structures
> > > +must be explicitly mapped into the sandbox, which requires some manual effort.
> >
> > Maybe I'm missing something here, but...
> >
> > The requirement that the sandboxed function only ever touch two linear
> > blocks of memory (yes?) seems a tremendous limitation. I mean, how can
> > the sandboxed function call kmalloc()? How can it call any useful
> > kernel functions? They'll all touch memory which lies outside the
> > sandbox areas?
> >
> > Perhaps a simple but real-world example would help clarify.
>
> I agree, this looks like an "interesting" framework, but we don't add
> code to the kernel without a real, in-kernel user for it.
>
> Without such a thing, we can't even consider it for inclusion as we
> don't know how it will actually work and how any subsystem would use it.
>
> Petr, do you have an user for this today?

Hi Greg & Andrew,

your observations are correct. In this form, the framework is quite
limited, and exactly this objection was expected. You have even
spotted one of the first enhancements I tested on top of this framework
(dynamic memory allocation).

The intended use case is code that processes untrusted data that is not
properly sanitized, but where performance is not critical. Some
examples include decompressing initramfs, loading a kernel module, or
decoding a boot logo; I think I've noticed a vulnerability in another
project recently... ;-)

Of course, even decompression needs dynamic memory. My plan is to
extend the mechanism. Right now I'm mapping all of kernel text into the
sandbox. Later, I'd like to decompose the text section too. The pages
which contain sandboxed code should be mapped, but the rest of the kernel
should not. If the sandbox tries to call kmalloc(), vmalloc(), or
schedule(), the attempt will generate a page fault. Sandbox page faults
are already intercepted, so handle_sbm_call() can decide if the call
should be allowed or not. If the sandbox policy says ALLOW, the page
fault handler will perform the API call on behalf of the sandboxed code
and return results, possibly with some post-call action, e.g. map some
more pages to the address space.
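
To make the flow above concrete, here is a rough user-space sketch of the
policy decision. It is not code from the patch series. Every name in it
(sbm_policy_entry, sbm_handle_fault, the fake_* helpers, the addresses) is
hypothetical and only models the idea: look up the faulting target address,
deny by default, and on ALLOW perform the call on behalf of the sandbox.

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this only models the described flow. */
enum sbm_verdict { SBM_DENY, SBM_ALLOW };

/* Stand-in for "perform the API call on behalf of the sandboxed code". */
typedef long (*sbm_proxy_fn)(long arg);

struct sbm_policy_entry {
	uintptr_t target;          /* unmapped address the sandbox jumped to */
	enum sbm_verdict verdict;  /* what the sandbox policy says */
	sbm_proxy_fn proxy;        /* kernel-side implementation of the call */
};

/* Fake "kernel" functions, used only so the sketch can run in user space. */
static long fake_alloc(long size) { return size > 0 ? 0x1000 : 0; }
static long fake_sched(long unused) { (void)unused; return 0; }

/* Example policy: the allocator is allowed, the scheduler is not. */
static const struct sbm_policy_entry demo_policy[] = {
	{ (uintptr_t)0x1000, SBM_ALLOW, fake_alloc },
	{ (uintptr_t)0x2000, SBM_DENY,  fake_sched },
};

/*
 * Model of the fault path: returns 0 and stores the call result if the
 * policy allows the target, -1 if it is denied or not listed at all.
 */
static int sbm_handle_fault(const struct sbm_policy_entry *tbl, size_t n,
			    uintptr_t fault_addr, long arg, long *result)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].target != fault_addr)
			continue;
		if (tbl[i].verdict != SBM_ALLOW)
			return -1;
		*result = tbl[i].proxy(arg);
		return 0;
	}
	return -1;	/* default deny: unknown targets are never proxied */
}
```

In this model, a fault at 0x1000 is proxied to fake_alloc() and the result
is handed back, while a fault at 0x2000 or at any unlisted address is
refused. A post-call action such as mapping more pages would go right
after the proxy call.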

The fact that all communication with the rest of the kernel happens
through CPU exceptions is the reason this mechanism is unsuitable for
performance-critical applications.

OK, so why didn't I send the whole thing?

Decomposition of the kernel requires many more changes, e.g. in linker
scripts. Some of them depend on this patch series. Before I go and
clean up my code into something that can be submitted, I want to get
feedback from guys like you, to know if the whole idea would even be
considered, aka "Fail Fast".

Petr T