Re: [PATCH v1 0/8] x86_64 SandBox Mode arch hooks

From: Petr Tesařík
Date: Thu Feb 15 2024 - 04:31:54 EST


On Thu, 15 Feb 2024 00:16:13 -0800
"H. Peter Anvin" <hpa@xxxxxxxxx> wrote:

> On February 14, 2024 10:59:32 PM PST, "Petr Tesařík" <petr@xxxxxxxxxxx> wrote:
> >On Wed, 14 Feb 2024 10:52:47 -0800
> >Xin Li <xin@xxxxxxxxx> wrote:
> >
> >> On 2/14/2024 10:22 AM, Petr Tesařík wrote:
> >> > On Wed, 14 Feb 2024 06:52:53 -0800
> >> > Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >> >
> >> >> On 2/14/24 03:35, Petr Tesarik wrote:
> >> >>> This patch series implements x86_64 arch hooks for the generic SandBox
> >> >>> Mode infrastructure.
> >> >>
> >> >> I think I'm missing a bit of context here. What does one _do_ with
> >> >> SandBox Mode? Why is it useful?
> >> >
> >> > I see, I split the patch series into the base infrastructure and the
> >> > x86_64 implementation, but I forgot to merge the two recipient lists.
> >> > :-(
> >> >
> >> > Anyway, in the long term I would like to work on gradual decomposition
> >> > of the kernel into a core part and many self-contained components.
> >> > Sandbox mode is a useful tool to enforce isolation.
> >> >
> >> > In its current form, sandbox mode is too limited for that, but I'm
> >> > trying to find some balance between "publish early" and reaching a
> >> > feature level where some concrete examples can be shown. I'd rather
> >> > fail fast than maintain hundreds of patches in an out-of-tree branch
> >> > before submitting (and failing anyway).
> >> >
> >> > Petr T
> >> >
> >>
> >> What you're proposing sounds like a gigantic thing, which could potentially
> >> impact all subsystems.
> >
> >True. Luckily, sandbox mode allows me to move gradually, one component
> >at a time.
> >
> >> Unless you prove it has big advantages with real
> >> world usages, I guess nobody even wants to look into the patches.
> >>
> >> BTW, this seems to be another attempt to get the idea of a micro-kernel
> >> into Linux.
> >
> >We know it's not feasible to convert Linux to a micro-kernel. AFAICS
> >that would require some kind of big switch, affecting all subsystems at
> >once.
> >
> >But with a growing code base and more or less constant bug-per-LOC rate,
> >people will continue to come up with ideas on how to limit the
> >potential impact of each bug. Logically, one of the concepts that come
> >to mind is decomposition.
> >
> >If my attempt helps to clarify how such decomposition should be done to
> >be acceptable, it is worthwhile. If nothing else, I can summarize the
> >situation and ask Jonathan if he would kindly accept it as an LWN
> >article...
> >
> >Petr T
> >
>
> I have been thinking more about this, and I'm more than ever convinced that exposing kernel memory to *any* kind of user space is a really, really bad idea. It is not a door we ever want to open; once that line gets muddled, the attack surface opens up dramatically.

Would you mind elaborating on this a bit more?

For one thing, sandbox mode is *not* user mode. Sure, my proposed
x86-64 implementation runs at the same CPU privilege level as user
mode, but it is isolated from user mode by mechanisms just as strong
as those that isolate any two user mode processes from each other. Are
you saying that process isolation in Linux is not all that strong
after all?

Don't get me wrong. I'm honestly trying to understand what exactly
makes the idea so bad. I have apparently not considered something that
you have, and I would be glad if you could reveal it.

> And, in fact, we already have a sandbox mode in the kernel – it is called eBPF.

Sure. The difference is that eBPF is a platform of its own (with its
own consistency model, machine code etc.). Rewriting code for eBPF may
need a bit more effort.

Besides, Roberto wrote a PGP key parser as an eBPF program at some
point, and I believe it was rejected for that reason. So, it seems
there are situations where eBPF is not an alternative.

Roberto, can you remember and share some details?

Petr T