Re: [RFC PATCH] arch/x86: Optionally flush L1D on context switch

From: Herrenschmidt, Benjamin
Date: Sun Mar 22 2020 - 19:17:39 EST


On Sun, 2020-03-22 at 08:10 -0700, Andy Lutomirski wrote:
>
> Let me try to understand the issue. There is some high-value data,
> and that data is owned by a high-value process. At some point, the
> data ends up in L1D. Later on, evil code runs and may attempt to
> exfiltrate that data from L1D using a side channel. (The evil code
> is not necessarily in a malicious process context. It could be kernel
> code targeted by LVI or similar. It could be ordinary code that
> happens to contain a side channel gadget by accident.)

We aren't trying to protect processes against the kernel. I think
that's beyond what can reasonably be done if the kernel is
compromised... If you are worried about that case, use VMs.

We are mostly trying to protect process vs. process: either language
runtimes potentially running different "user" code, or containers
pertaining to different "users", etc.

> So the idea is to flush L1D after manipulating high-value data and
> before running evil code.
>
> The nasty part here is that we don't have a good handle on when L1D
> is filled and when the evil code runs. If the evil code is untrusted
> process userspace and the fill is an interrupt, then switch_mm is
> useless and we want to flush on kernel exit instead. If the fill and
> evil code are both userspace, then switch_mm is probably the right
> choice, but prepare_exit_from_usermode works too. If SMT is on, we
> lose no matter what. If the evil code is in kernel context, then
> it's not clear what to do. If the fill and the evil code are both in
> kernel threads (hi, io_uring), then I'm not at all sure what to do.
>
> In summary, kernel exit seems stronger, but the right answer isn't so
> clear.

Right, which is why we are happy to limit the scope of this to
processes. If the kernel cannot be trusted in a given system, the
range of possible exploits dwarfs this one, and I don't think that is
something we can reasonably address here.

That said, I am not married to the switch_mm() solution. If there is
consensus that these things are better done in the kernel entry/exit
path, then so be it. But my gut feeling in this specific case is that
the overhead will be lower and the code potentially simpler in
switch_mm().

> We could do an optimized variant where we flush at kernel exit but we
> *decide* to flush in switch_mm.
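
For illustration only, here is a rough user-space sketch of that
"decide in switch_mm, flush at kernel exit" variant. It is a model, not
kernel code: every name in it (model_task, flush_on_switch_out,
model_switch_mm, model_exit_to_user, and so on) is hypothetical, the
per-task opt-in flag is an assumption, and the real flush is replaced
by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_mm {
	int id;
};

struct model_task {
	struct model_mm *mm;
	bool flush_on_switch_out;	/* assumed per-task opt-in */
};

/* In a kernel this would be per-CPU state; a global is enough here. */
static bool l1d_flush_pending;
static unsigned int l1d_flush_count;

/* Stand-in for the real flush; it only counts how often it is called. */
static void model_l1d_flush(void)
{
	l1d_flush_count++;
}

/* Decision point: cheap, only records that a flush is owed. */
static void model_switch_mm(const struct model_task *prev,
			    const struct model_task *next)
{
	assert(prev != NULL && next != NULL);
	if (prev->flush_on_switch_out && prev->mm != next->mm)
		l1d_flush_pending = true;
}

/* Action point: pay for the flush just before returning to user space. */
static void model_exit_to_user(void)
{
	if (l1d_flush_pending) {
		model_l1d_flush();
		l1d_flush_pending = false;
	}
}

/* Walk through a few switches and return how many flushes happened. */
static unsigned int model_demo(void)
{
	struct model_mm a = { 1 }, b = { 2 };
	struct model_task secret = { &a, true };
	struct model_task other = { &b, false };

	l1d_flush_pending = false;
	l1d_flush_count = 0;

	model_switch_mm(&secret, &other);	/* opted-in task leaves: flush owed */
	model_exit_to_user();			/* flush happens here */
	model_exit_to_user();			/* nothing pending, no flush */
	model_switch_mm(&other, &secret);	/* prev did not opt in */
	model_exit_to_user();			/* no flush */

	return l1d_flush_count;
}
```

In this model the switch path only sets a flag, and the cost of the
flush is paid once, on the next return to user space, no matter how
many switches happened in between.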

Cheers,
Ben.