Re: [PATCH 1/1] KVM: pass through CPUID(0x80000006)

From: Sean Christopherson
Date: Wed Jul 12 2023 - 11:58:25 EST


Trimmed the Cc to remove folks that no longer directly work on any of this stuff.

On Fri, Jul 07, 2023, Takahiro Itazuri wrote:
> Please forgive me if this is an absurd question.
>
> Date: Tue, 14 Apr 2020 19:37:26 -0700
> From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Return the host's L2 cache and TLB information for CPUID.0x80000006
> > instead of zeroing out the entry as part of KVM_GET_SUPPORTED_CPUID.
> > This allows a userspace VMM to feed KVM_GET_SUPPORTED_CPUID's output
> > directly into KVM_SET_CPUID2 (without breaking the guest).

Ha, this confused me for a bit. While past me did technically write this changelog,
I was just massaging someone else's words.

I'm honestly a bit dubious of the claim that providing a zeroed out 0x80000006
would break the guest. I'm pretty sure I chose that phrase based on Eric's original
wording that KVM's "defaults" would be "necessary".

  Return L2 cache and TLB information to guests.
  They could have been set before, but the defaults that KVM returns will be
  necessary for usermode that doesn't supply their own CPUID tables.

I don't think it actually matters (see below), it's just a rather odd justification.

> I noticed that CPUID 0x80000005 also returns cache information (L1 Cache
> and TLB Information) when looking at AMD APM, while it is marked
> reserved on Intel SDM. What do you think about passing through CPUID
> 0x80000005 to guests?
>
> To be honest, I'm not sure if it is harmless from security and
> performance perspectives in the first place.
>
> Regard security aspect, I'm a bit concerned that it could help malicious
> guests to know something to allow cache side channel attacks. However,
> CPUID 0x80000006 has already passed through L2 Cache and TLB and L3
> Cache Information. If passing through CPUID 0x80000006 is really fine,
> I'm guessing it is the case with CPUID 0x80000005 as well.

It's definitely harmless from a security perspective. Userspace already has
access to this information as CPUID is NOT a privileged instruction. And the
kernel also publishes this information in sysfs, e.g. /sys/devices/system/cpu/cpuN/cache,
and AFAIK that's not typically restricted.
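
For illustration only, a minimal sketch of reading the L2 info from plain,
unprivileged userspace, assuming a GCC or clang toolchain that ships <cpuid.h>.
The struct and function names are made up for this example, and the
CPUID.0x80000006:ECX field layout is as I understand it from the SDM/APM, so
double check it before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical container for this example; not a kernel or KVM type. */
struct l2_info {
	unsigned int size_kb;		/* ECX[31:16], L2 size in KiB */
	unsigned int assoc_enc;		/* ECX[15:12], encoded associativity */
	unsigned int line_bytes;	/* ECX[7:0], line size in bytes */
};

/* Decode the L2 fields of CPUID.0x80000006:ECX (assumed layout, see above). */
void decode_l2(uint32_t ecx, struct l2_info *out)
{
	assert(out != NULL);

	out->size_kb = ecx >> 16;
	out->assoc_enc = (ecx >> 12) & 0xf;
	out->line_bytes = ecx & 0xff;
}

/*
 * Execute CPUID from userspace, no privileges needed.  Returns 1 and fills
 * @out on success, 0 if the leaf is unsupported or this isn't an x86 build.
 */
int read_l2(struct l2_info *out)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000006, &eax, &ebx, &ecx, &edx))
		return 0;

	decode_l2(ecx, out);
	return 1;
#else
	(void)out;
	return 0;
#endif
}
```

Calling read_l2() from any ordinary process should yield the same values the
kernel derives for sysfs, which is the point: there is nothing here for KVM to
hide.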

KVM must assume that any and all information visible to userspace is also visible
to the guest, e.g. even if KVM wanted to police CPUID, nothing would prevent
userspace from providing a paravirtual interface to the guest to enumerate
cache and TLB topology.

> In terms of performance, as far as I know, some softwares utilizes cache
> information to achieve better performance. To simply put, by letting
> guests know cache information, they may gain some benefits. Having said
> that, if I understand correctly, guests can be scheduled on CPUs that do
> not belong to the same group of CPUs that they run last time, unless
> guests are pinned to a specific set of host physical CPUs. In such
> cases, guests may not benefit from using cache information.

I would be quite surprised if homogeneous, a.k.a. non-hybrid, CPUs ever have
variable cache/TLB properties across cores. Hybrid CPUs might be a different
story, but even then I gotta imagine that userspace software already has problems,
e.g. userspace processes will encounter variable cache/TLB behavior unless
userspace is affining all tasks.

Regardless, the decision on whether or not to report cache information via
KVM_GET_SUPPORTED_CPUID was made long, long ago, as KVM has enumerated CPUID.0x4
since basically forever. So really this only affects TLB info, and since KVM
already spits out 0x80000006, it's just L1 TLB info.
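
To make "L1 TLB info" concrete, a rough sketch of decoding the 4K-page TLB
fields of CPUID.0x80000005:EBX.  The layout is my understanding of the AMD APM
(Intel marks the leaf as reserved), and the struct and function names are
hypothetical, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for this example; not a kernel or KVM type. */
struct l1_tlb_4k {
	unsigned int dtlb_assoc;	/* EBX[31:24], 0xff == fully associative */
	unsigned int dtlb_entries;	/* EBX[23:16] */
	unsigned int itlb_assoc;	/* EBX[15:8], 0xff == fully associative */
	unsigned int itlb_entries;	/* EBX[7:0] */
};

/* Decode CPUID.0x80000005:EBX, i.e. L1 TLB info for 4K pages (assumed layout). */
void decode_l1_tlb_4k(uint32_t ebx, struct l1_tlb_4k *out)
{
	assert(out != NULL);

	out->dtlb_assoc = (ebx >> 24) & 0xff;
	out->dtlb_entries = (ebx >> 16) & 0xff;
	out->itlb_assoc = (ebx >> 8) & 0xff;
	out->itlb_entries = ebx & 0xff;
}
```

EAX of the same leaf carries the equivalent fields for 2M/4M pages, so the same
helper could be reused on it.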

I'm mildly tempted to remove 0x80000006, for similar reasons as commit 45e966fcca03
("KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID"),
but I suspect that would do more harm than good, e.g. Linux falls back to
0x80000005 and 0x80000006 when running on AMD without extended topology info.

> If I'm missing something or say something wrong, I'd appreciate it if
> you could correct me. If it sounds no problem, I'd like to send a patch
> for it.

I think it makes sense to enumerate 0x80000005. Reporting 0x80000006 but not
0x80000005 seems to be the *worst* behavior, so as I see it, the decision is
really between adding 0x80000005 and removing 0x80000006. Adding 0x80000005
appears to be the least risky choice given that KVM has reported 0x80000006 for
over three years.