Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)

From: Konrad Rzeszutek Wilk
Date: Mon Aug 20 2018 - 17:26:42 EST


Hi!

See eXclusive Page Frame Ownership (https://lwn.net/Articles/700606/) which was posted
way back in 2016.

In the last couple of months there has been a slew of CPU issues that have complicated
a lot of things. The latest - L1TF - is still fresh in folks' minds and it is
especially acute for virtualization workloads.

As such, a bunch of folks from different cloud companies (CCed) are looking
at a way to make the Linux kernel more resistant to hardware that has these sorts of
bugs.

In particular we are looking at a way to "remove as many mappings from the global
kernel address space as possible. Specifically, while being in the
context of process A, memory of process B should not be visible in the
kernel." (email from Julian Stecklina). That is the high-level view. As for
how this could get done, well, that is why I am posting this to
LKML/linux-hardening/kvm-devel/linux-mm to start the discussion.

Usually I would start with a draft of RFC patches so folks can rip it apart, but
thanks to other people (Juerg, thank you!) one already exists:

(see https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1222756.html)

The idea would be to extend this to:

1) Only do it for processes that run on CPUs which are in the isolcpus list.

2) Expand this to use per-CPU page tables. That is, each CPU has its own unique
set of page tables - naturally _START_KERNEL -> __end would be mapped but the
rest would not.
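
To make item 1 a bit more concrete, here is a rough userspace sketch (not kernel
code, and not part of Juerg's patches) of the membership test it implies: given a
CPU list in the "2-5,8" form, is a given CPU in it? The helper names cpu_in_list()
and cpu_is_isolated() are made up for illustration. The sketch assumes the list is
read from /sys/devices/system/cpu/isolated and that it only contains plain numbers
and ranges; the stride syntax accepted on the kernel command line is not handled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return true if `cpu` appears in a CPU list string
 * such as "2-5,8". An empty list (no isolated CPUs) matches nothing.
 * Malformed input is treated as "not in the list".
 */
static bool cpu_in_list(const char *list, int cpu)
{
    const char *p = list;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)                   /* no digits where a number was expected */
            return false;
        if (*end == '-') {              /* a range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return false;
        }
        if (cpu >= lo && cpu <= hi)
            return true;

        p = end;
        if (*p == ',')                  /* move on to the next entry */
            p++;
        else if (*p != '\0' && *p != '\n')
            return false;               /* unexpected separator */
    }
    return false;
}

/*
 * Hypothetical helper: read the first line of `path` (assumed to be
 * /sys/devices/system/cpu/isolated) and test `cpu` against it.
 * If the file cannot be read, report the CPU as not isolated.
 */
static bool cpu_is_isolated(const char *path, int cpu)
{
    char buf[4096];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return false;
    if (fgets(buf, sizeof(buf), f) != NULL)
        found = cpu_in_list(buf, cpu);
    fclose(f);
    return found;
}
```

For example, cpu_is_isolated("/sys/devices/system/cpu/isolated", 3) would be true on
a machine booted with isolcpus=2-5. An in-kernel version would presumably test a
cpumask rather than parse a string; this only illustrates the check itself.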

Thoughts? Is this possible? Crazy? Better ideas?