Re: [PATCHv2 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled

From: Sean Christopherson
Date: Mon Oct 23 2023 - 10:41:33 EST


On Mon, Oct 23, 2023, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Fri, Oct 20, 2023, Vitaly Kuznetsov wrote:
> >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >> index b8ab9ee5896c..1ee49c98e70a 100644
> >> --- a/arch/x86/kernel/kvm.c
> >> +++ b/arch/x86/kernel/kvm.c
> >> @@ -454,7 +454,9 @@ static void kvm_guest_cpu_offline(bool shutdown)
> >> kvm_pv_disable_apf();
> >> if (!shutdown)
> >> apf_task_wake_all();
> >> - kvmclock_disable();
> >> + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2) ||
> >> + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))
> >> + kvmclock_disable();
> >> }
> >
> > That would result in an unnecessray WRMSR in the case where kvmclock is disabled
> > on the command line. It _should_ be benign given how the code is written, but
> > it's not impossible to imagine a scenario where someone disabled kvmclock in the
> > guest because of a hypervisor bug. And the WRMSR would become a bogus write to
> > MSR 0x0 if someone made a "cleanup" to set msr_kvm_system_time if and only if
> > kvmclock is actually used, e.g. if someone made Kirill's change sans the check in
> > kvmclock_disable().
>
> True but we don't have such module params to disable other PV features so
> e.g. KVM_FEATURE_PV_EOI/KVM_FEATURE_MIGRATION_CONTROL are written to
> unconditionally. Wouldn't it be better to handle parameters like
> 'no-kvmclock' by clearing the feature bit in kvm_arch_para_features()'s
> return value so all kvm_para_has_feature() calls for it just return
> 'false'? We can even do an umbreall "no-kvm-features=<mask>" to cover
> all possible debug cases.

I don't know that it's worth the effort, or that it'd even be a net positive.

Today, kvm_para_has_feature() goes through to CPUID every time, e.g. we'd have
to add a small bit of infrastructure to snapshot and clear bits, or rework things
to let kvm_para_has_feature peek at kvmclock.

And things like KVM_FEATURE_PV_TLB_FLUSH would be quite weird, e.g. we either end
up leaving the feature bit set while returning "false" for pv_tlb_flush_supported(),
or we'd clear the feature bit for a rather large number of conditions that don't
really have anything to do with KVM_FEATURE_PV_TLB_FLUSH being available.

static bool pv_tlb_flush_supported(void)
{
return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) &&
!boot_cpu_has(X86_FEATURE_MWAIT) &&
(num_possible_cpus() != 1));
}