Re: [PATCH v5 7/8] KVM: VMX: Update PID-pointer table entry when APIC ID is changed

From: Sean Christopherson
Date: Wed Feb 02 2022 - 18:24:06 EST


On Thu, Jan 13, 2022, Sean Christopherson wrote:
> On Tue, Jan 11, 2022, Maxim Levitsky wrote:
> > Both Intel and AMD's PRM also state that changing APIC ID is implementation
> > dependent.
> >
> > I vote to forbid changing apic id, at least in the case any APIC acceleration
> > is used, be that APICv or AVIC.
>
> That has my vote as well. For IPIv in particular there's not much concern with
> backwards compability, i.e. we can tie the behavior to enable_ipiv.

Hrm, it may not be that simple. There's some crusty (really, really crusty) code
in Linux's boot code that writes APIC_ID. IIUC, the intent is to play nice with
running a UP crash dump kernel on "BSP" that isn't "the BSP", e.g. has a non-zero
APIC ID.

static void __init apic_bsp_up_setup(void)
{
#ifdef CONFIG_X86_64
apic_write(APIC_ID, apic->set_apic_id(boot_cpu_physical_apicid));
#else
/*
* Hack: In case of kdump, after a crash, kernel might be booting
* on a cpu with non-zero lapic id. But boot_cpu_physical_apicid
* might be zero if read from MP tables. Get it from LAPIC.
*/
# ifdef CONFIG_CRASH_DUMP
boot_cpu_physical_apicid = read_apic_id();
# endif
#endif
}

The most helpful comment is in generic_processor_info():

/*
* boot_cpu_physical_apicid is designed to have the apicid
* returned by read_apic_id(), i.e, the apicid of the
* currently booting-up processor. However, on some platforms,
* it is temporarily modified by the apicid reported as BSP
* through MP table. Concretely:
*
* - arch/x86/kernel/mpparse.c: MP_processor_info()
* - arch/x86/mm/amdtopology.c: amd_numa_init()
*
* This function is executed with the modified
* boot_cpu_physical_apicid. So, disabled_cpu_apicid kernel
* parameter doesn't work to disable APs on kdump 2nd kernel.
*
* Since fixing handling of boot_cpu_physical_apicid requires
* another discussion and tests on each platform, we leave it
* for now and here we use read_apic_id() directly in this
* function, generic_processor_info().
*/

It's entirely possible that this path is unused in a KVM guest, but I don't think
we can know that with 100% certainty.

But I also completely agree that attempting to keep the tables up-to-date is ugly
and a waste of time and effort, e.g. as Maxim pointed out, the current AVIC code
is comically broken.

Rather than disallowing the write, what if we add yet another inhibit that disables
APICv if IPI virtualization is enabled and a vCPU has an APIC ID != vcpu_id? KVM
is equipped to handle the emulation, so it just means that a guest that's doing
weird things loses a big of performance.