Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on module initialization

From: Huang, Kai
Date: Mon Aug 15 2022 - 22:43:22 EST


On Mon, 2022-08-15 at 22:35 +0000, Sean Christopherson wrote:
> On Fri, Aug 12, 2022, Huang, Kai wrote:
> > On Thu, 2022-08-11 at 17:39 +0000, Sean Christopherson wrote:
> > > I've been poking at the "hardware enable" code this week for other reasons, and
> > > have come to the conclusion that the current implementation is a mess.
> >
> > Thanks for the lengthy reply :)
> >
> > First of all, to clarify, I guess by "current implementation" you mean the
> > current upstream KVM code, but not this particular patch? :)
>
> Yeah, upstream code.
>
> > > Of course, that path is broken for other reasons too, e.g. needs to prevent CPUs
> > > from going on/off-line when KVM is enabling hardware.
> > > https://lore.kernel.org/all/20220216031528.92558-7-chao.gao@xxxxxxxxx
> >
> > If I read correctly, the problem described in above link seems only to be true
> > after we move CPUHP_AP_KVM_STARTING from STARTING section to ONLINE section, but
> > this hasn't been done yet in the current upstream KVM. Currently,
> > CPUHP_AP_KVM_STARTING is still in STARTING section so it is guaranteed it has
> > been executed before start_secondary sets itself to online cpu mask.
>
> The lurking issue is that for_each_online_cpu() can against hotplug, i.e. every
> instance of for_each_online_cpu() in KVM is buggy (at least on the x86 side, I
> can't tell at a glance whether or not arm pKVM's usage is safe).
>
> https://lore.kernel.org/all/87bl20aa72.ffs@tglx

Yes agreed. for_each_online_cpu() can race with CPU hotplug.

But the fact is looks there are many places using for_each_online_cpus() w/o
holding cpus_read_lock(). :)

>
> > Btw I saw v4 of Chao's patchset was sent Feb this year. It seems that series
> > indeed improved CPU compatibility check and hotplug handling. Any reason that
> > series wasn't merged?
>
> AFAIK it was just a lack of reviews/acks for the non-KVM patches.
>
> > Also agreed that kvm_lock should be used. But I am not sure whether
> > cpus_read_lock() is needed (whether CPU hotplug should be prevented). In
> > current KVM, we don't do CPU compatibility check for hotplug CPU anyway, so when
> > KVM does CPU compatibility check using for_each_online_cpu(), if CPU hotplug
> > (hot-removal) happens, the worst case is we lose compatibility check on that
> > CPU.
> >
> > Or perhaps I am missing something?
>
> On a hot-add of an incompatible CPU, KVM would potentially skip the compatibility
> check and try to enable hardware on an incompatible/broken CPU.

To resolve this, we need to do compatibility check before actually enabling
hardware on each cpu, as Chao's series did. I don't see using cpus_read_lock()
alone can actually fix anything.

>
> Another possible bug is the checking of hv_get_vp_assist_page(); hot-adding a
> CPU that failed to allocate the VP assist page while vmx_init() is checking online
> CPUs could result in a NULL pointer deref due to KVM not rejecting the CPU as it
> should.
>

So we need Chao's series to fix those problems: 1) Do compatibility check before
actually enable the hardware for each cpu; 2) allow CPU hotplug to fail; 3) Hold
cpus_read_lock() when needed.


--
Thanks,
-Kai