Re: [PATCH -tip 3/3] x86/percpu: *NOT FOR MERGE* Implement arch_raw_cpu_ptr() with RDGSBASE

From: Sean Christopherson
Date: Mon Oct 16 2023 - 15:29:45 EST


On Mon, Oct 16, 2023, Ingo Molnar wrote:
>
> * Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> > Sean says:
> > "The instructions are guarded by a CR4 bit, the ucode cost just to check
> > CR4.FSGSBASE is probably non-trivial."
>
> BTW, a side note regarding the very last paragraph and the CR4 bit ucode
> cost, given that SMAP is CR4 controlled too:
>
> #define X86_CR4_FSGSBASE_BIT 16 /* enable RDWRFSGS support */
> #define X86_CR4_FSGSBASE _BITUL(X86_CR4_FSGSBASE_BIT)
> ...
> #define X86_CR4_SMAP_BIT 21 /* enable SMAP support */
> #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT)
>
> And this modifies the behavior of STAC/CLAC, of which we have ~300
> instances in a defconfig kernel image:
>
> kepler:~/tip> objdump -wdr vmlinux | grep -w 'stac' | wc -l
> 119
>
> kepler:~/tip> objdump -wdr vmlinux | grep -w 'clac' | wc -l
> 188
>
> Are we certain that ucode on modern x86 CPUs checks CR4 for every affected
> instruction?

Not certain at all. I agree the CR4.FSGSBASE check could be a complete non-issue;
that was just me speculating.

> Could they perhaps use something faster, such as internal microcode-patching
> (is that a thing?), to turn support for certain instructions on/off when the
> relevant CR4 bit is modified, without having to genuinely access CR4 for
> every instruction executed?

I don't know the exact details, but Intel's VMRESUME ucode flow uses some form of
magic to skip consistency checks that aren't relevant for the current (or target)
mode, *without* using conditional branches. So it's definitely possible/probable
that similar magic is used to expedite things like CPL and CR4 checks.
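
For reference, the two CR4 masks quoted above can be sketched in plain userspace
C as below. This only illustrates the architectural bit layout; it says nothing
about how (or whether) ucode consults CR4 per instruction. BITUL() and cr4_has()
are hypothetical stand-ins for illustration, not kernel APIs, and the CR4 values
passed in are made up rather than read from hardware.

```c
/* Bit positions as quoted above from the kernel's x86 definitions. */
#define X86_CR4_FSGSBASE_BIT 16 /* enable RDWRFSGS support */
#define X86_CR4_SMAP_BIT     21 /* enable SMAP support */

/* Userspace stand-in (assumption) for the kernel's _BITUL() helper. */
#define BITUL(n) (1UL << (n))

#define X86_CR4_FSGSBASE BITUL(X86_CR4_FSGSBASE_BIT)
#define X86_CR4_SMAP     BITUL(X86_CR4_SMAP_BIT)

/*
 * Hypothetical helper: report whether a feature mask is set in a CR4
 * snapshot. The snapshot is just a number supplied by the caller.
 */
static inline int cr4_has(unsigned long cr4, unsigned long mask)
{
	return (cr4 & mask) != 0;
}
```

With these definitions, cr4_has(X86_CR4_SMAP | X86_CR4_FSGSBASE, X86_CR4_SMAP)
evaluates to 1, and cr4_has(X86_CR4_SMAP, X86_CR4_FSGSBASE) evaluates to 0.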