Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

From: Ankur Arora
Date: Fri Dec 06 2019 - 15:31:30 EST




On 12/6/19 5:46 AM, Vitaly Kuznetsov wrote:
Ankur Arora <ankur.a.arora@xxxxxxxxxx> writes:

On 2019-11-05 3:56 p.m., Paolo Bonzini wrote:
On 05/11/19 17:17, Vitaly Kuznetsov wrote:
There is also one additional piece of information missing. A VM can be
sharing physical cores with other VMs (or other userspace tasks on the
host), so does KVM_FEATURE_TRUSTWORTHY_SMT imply that this is not the
case? It is unclear if this changes anything and can probably be left out
of scope (just don't do that).

Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
!cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
extra work (like CPU pinning and exposing the correct topology) before
passing '1' to the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
arch/x86/include/uapi/asm/kvm_para.h | 2 ++
arch/x86/kvm/cpuid.c | 7 ++++++-
3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..64b94103fc90 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.
+KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
+ topology is trustworthy' hint
+ (KVM_HINTS_TRUSTWORTHY_SMT).
+

Instead of defining a one-off bit, can we make:

ecx = the set of known "hints" (defaults to edx if zero)

edx = the set of hints that apply to the virtual machine
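
A minimal sketch of how a guest might decode the ecx/edx pair proposed
above. The mask values and the ex_* names are hypothetical and exist only
for illustration; they are not the real KVM bit layout, and nothing here
is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hint masks for illustration; not the real KVM bit layout. */
#define EX_HINT_REALTIME        (1u << 0)
#define EX_HINT_TRUSTWORTHY_SMT (1u << 1)

enum ex_hint_state { EX_HINT_UNKNOWN, EX_HINT_OFF, EX_HINT_ON };

/*
 * ecx: set of hints the host knows about (treated as edx when zero)
 * edx: set of hints that apply to this virtual machine
 */
static enum ex_hint_state ex_hint_query(uint32_t ecx, uint32_t edx,
					uint32_t hint)
{
	uint32_t known;

	/* Exactly one hint bit may be queried at a time. */
	assert(hint != 0 && (hint & (hint - 1)) == 0);

	known = ecx ? ecx : edx;
	if (!(known & hint))
		return EX_HINT_UNKNOWN;	/* host does not know this hint */

	return (edx & hint) ? EX_HINT_ON : EX_HINT_OFF;
}
```

With ecx = 0x3 and edx = 0x1, the second hint is known to the host but
explicitly off. With ecx = 0 (a host that predates the ecx convention),
the same query yields "unknown", which is the distinction the separate
feature bit was meant to carry.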

Just to resurrect this thread, the guest could explicitly ACK
a KVM_FEATURE_DYNAMIC_HINT at init. This would allow the host
to change the hints whenever with the guest not needing to separately
ACK the changed hints.

(I apologize for dropping the ball on this, I intend to do RFCv2 in
the near future)

Regarding this particular hint (let's call it 'no nonarchitectural
coresharing' for now) I don't see much value in communicating the change
to the guest when it happens. Imagine our host for some reason is not
able to guarantee that anymore, e.g. we've migrated to a host with fewer
pCPUs and/or special restrictions and have to start sharing. What are we,
as a guest, supposed to do when we receive a notification? "You're now
insecure, deal with it!" :-) Equally, I don't see much value in
pre-acking such a change. "I'm fine with becoming insecure at some point".

True, for that use-case pre-ACK seems like exactly the thing you would
not want.
I do see some value in the guest receiving the notification though.
Maybe it could print a big fat printk or something :). Or, it could
change to a different security-policy-that-I-just-made-up.


If, however, we discuss other hints, such a 'pre-ACK' mechanism may make
sense. Still, I'd make it an option to a 'challenge/response' protocol:
if the host wants to change a hint, it notifies the guest and waits for
an ACK from it (e.g. a pair of MSRs + an interrupt). I, however,

My main reason for this 'pre-ACK' approach is some discomfort with
changing the CPUID edx from under the guest.

The MSR+interrupt approach would work as well but then we have the
same set of hints spread across CPUID and the MSR. What do you think
is the right handling for a guest that refuses to ACK the MSR?
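
To make the question concrete, here is a rough sketch of the
challenge/response idea as a tiny state machine. Everything in it is an
assumption made up for illustration: the struct, the function names, and
the choice to model the MSR pair as plain fields. It is not an existing
KVM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical channel state; "active" and "pending" stand in for an MSR pair. */
struct ex_hint_chan {
	uint32_t active;	/* hints the guest currently relies on */
	uint32_t pending;	/* hints the host would like to switch to */
	bool awaiting_ack;	/* a proposal is outstanding */
};

/* Host side: propose a new hint set (a real design would also raise an interrupt). */
static void ex_host_propose(struct ex_hint_chan *c, uint32_t new_hints)
{
	c->pending = new_hints;
	c->awaiting_ack = (new_hints != c->active);
}

/* Guest side: ACK the proposal. Only an ACK matching the pending value commits it. */
static bool ex_guest_ack(struct ex_hint_chan *c, uint32_t acked)
{
	if (!c->awaiting_ack || acked != c->pending)
		return false;

	c->active = c->pending;
	c->awaiting_ack = false;
	return true;
}
```

In this sketch a guest that never ACKs simply leaves the proposal pending
and the old hints in force. Whether the host can actually afford to keep
honouring the old hints in that situation is exactly the open question.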

have no good candidate from the existing hints which would require the
guest to ACK (e.g. revoking PV EOI would probably do, but why would we do
that?). As I said before, a challenge/response protocol is needed if we'd
like to handle TSC frequency changes the way Hyper-V does (required for
updating guest TSC pages in the nested case), but this is less and less
important with the appearance of TSC scaling. I'm still not sure whether
this is over-engineering or not. We can wait for the first good candidate
to decide.

As we've discussed offlist, the particular hint I'm interested in is
KVM_HINT_REALTIME. That's not a particularly good candidate though
because there's no correctness problem if the host does switch it
off suddenly.


Ankur