KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas)

From: Andy Lutomirski
Date: Thu Nov 02 2017 - 05:42:16 EST


On Tue, Oct 31, 2017 at 3:31 PM, Dave Hansen
<dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> These patches are based on work from a team at Graz University of
> Technology posted here: https://github.com/IAIK/KAISER
>

I think we're far enough along here that it may be time to nail down
the memory layout for real. I propose the following:

The user tables will contain the following:

- The GDT array.
- The IDT.
- The vsyscall page. We can make this one _PAGE_USER.
- The TSS.
- The per-cpu entry stack. Let's make it one page with guard pages
on either side. This can replace rsp_scratch.
- cpu_current_top_of_stack. This could be in the same page as the TSS.
- The entry text.
- The percpu IST (aka "EXCEPTION") stacks.

That's it.

We can either try to move all of the above into the fixmap or we can
have the user tables be sparse à la Dave's current approach. If we do
it the latter way, I think we'll want to add a mechanism to have holes
in the percpu space to give the entry stack a guard page.

I would *much* prefer moving everything into the fixmap, but that's a
wee bit tricky: we can't address per-cpu data in the fixmap using %gs,
which makes the SYSCALL code awkward. We could, however, alias the
SYSCALL entry text itself per-cpu into the fixmap; that lets us use
%rip-relative addressing, which is quite nice.

So I guess my preference is to actually try the fixmap approach. We'd
give the TSS the same aliasing treatment we gave the GDT, and I can
try to make the entry trampoline work through the fixmap so that it
doesn't need %gs-based addressing until CR3 gets updated. (This
actually saves several cycles of latency.)

What do you all think?

I'll deal with the LDT separately. It will either live in the
fixmap-like region or it will live at the top of the user address
space.
