Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2

From: Sean Christopherson
Date: Tue Aug 08 2023 - 19:48:24 EST


On Tue, Aug 08, 2023, Ake Koomsin wrote:
> On Mon, 07 Aug 2023 17:00:58 +0300
> Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> > Is there a good reason why KVM doesn't expose APIC memslot to a
> > nested guest? While nested guest runs, the L1's APICv is "inhibited"
> > effectively anyway, so writes to this memslot should update APIC
> > registers and be picked up by APICv hardware when L1 resumes
> > execution.
> >
> > Since APICv alows itself to be inhibited due to other reasons, it
> > means that just like AVIC, it should be able to pick up arbitrary
> > changes to APIC registers which happened while it was inhibited, just
> > like AVIC does.
> >
> > I'll take a look at the code to see if APICv does this (I know AVIC's
> > code much better that APICv's)
> >
> > Is there a reproducer for this bug?
>
> The idea from step 6 to step 10 is to start BitVisor first, and start Linux on
> top of it. You can adjust the step as you like. Feel free to ask me anything
> regarding reproducing the problem with BitVisor if the giving steps are not
> sufficient.

Thank you for the detailed repro steps! However, it's likely going to be O(weeks)
before anyone is able to look at this in detail given the extensive repro steps.
If you have bandwidth, it's probably worth trying to reproduce the problem in a
KVM selftest (or a KVM-Unit-Test), e.g. create a nested VM, send an IPI from L2,
and see if it gets routed correctly. This purely a suggestion to try and get a
faster fix, it's by no means necessary.

Actually, typing that out raises a question (or two). What APICv VMCS control
settings does BitVisor use? E.g. is BitVisor enabling APICv for its VM (L2)?
If so, what values for the APIC access page and vAPIC page are shoved into
BitVisor's VMCS?

> The problem does not happen when enable_apicv=N. Note that SMP bringup with
> enable_apicv=N can fail. This is another problem. We don't have to worry about
> this for now. Linux seems to have no delay between INIT DEASSERT and SIPI during
> its SMP bringup. This can easily makes INIT and SIPI pending together resultling
> in signal lost.
>
> I admit that my knowledge on KVM and APICv is very limited. I may misunderstand
> the problem. If you don't mind, would it be possible for you to guide me which
> code path should I pay attention to? I would love to learn to find out the
> actual cause of the problem.

KVM *should* emulate the APIC MMIO access from L2. The call stack should reach
apic_mmio_write(), and assuming it's an ICR write, KVM should send an IPI.