Re: [tip:x86/urgent] x86/cpu: Deal with broken firmware (VMWare/XEN)

From: Alok Kataria
Date: Fri Nov 11 2016 - 00:50:30 EST


Hi Thomas,

On Wed, 2016-11-09 at 12:27 -0800, tip-bot for Thomas Gleixner wrote:
> Commit-ID: d49597fd3bc7d9534de55e9256767f073be1b33a
> Gitweb: http://git.kernel.org/tip/d49597fd3bc7d9534de55e9256767f073be1b33a
> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> AuthorDate: Wed, 9 Nov 2016 16:35:51 +0100
> Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitDate: Wed, 9 Nov 2016 21:05:01 +0100
>
> x86/cpu: Deal with broken firmware (VMWare/XEN)
>
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map. The
> physical package id which is used after a CPU comes up is retrieved from
> CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.
>
> There exist VMware and XEN implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.

For documentation purposes, let me note that VMware VMs running at
virtual hardware version 9 and above don't have this ACPI/MP and CPUID
divergence on the package id. So not everyone will see this issue on
their VMs; the bug is limited to folks running at virtual hardware
version 8 and prior.

It's good that we can work around the platform bug for those VMs, thanks
for adding these checks.
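
For anyone following along, here is a minimal userspace sketch of what
the checks in the quoted patch do. It is only a model, not the kernel
code: struct cpu_ids, pkg_from_apicid() and sanitize_ids() are
hypothetical names made up for illustration, and the firmware APIC id
and core-id bit count are passed in as plain arguments instead of being
read from apic->cpu_present_to_apicid() and boot_cpu_data.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch fixes up. */
struct cpu_ids {
	unsigned int initial_apicid;	/* APIC id as reported by CPUID */
	unsigned int phys_proc_id;	/* package id derived from CPUID */
};

/* Package id is the APIC id with the core-id bits shifted out. */
static unsigned int pkg_from_apicid(unsigned int apicid,
				    unsigned int coreid_bits)
{
	return apicid >> coreid_bits;
}

/*
 * Model of the sanitizing step: the firmware (ACPI/MP table) view wins.
 * Returns how many fields had to be corrected (0, 1 or 2).
 */
static int sanitize_ids(struct cpu_ids *c, unsigned int cpu,
			unsigned int fw_apicid, unsigned int coreid_bits)
{
	unsigned int pkg = pkg_from_apicid(fw_apicid, coreid_bits);
	int fixed = 0;

	assert(c != NULL);

	if (fw_apicid != c->initial_apicid) {
		fprintf(stderr,
			"[Firmware Bug]: CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
			cpu, fw_apicid, c->initial_apicid);
		c->initial_apicid = fw_apicid;
		fixed++;
	}
	if (pkg != c->phys_proc_id) {
		fprintf(stderr,
			"[Firmware Bug]: CPU%u: Using firmware package id %u instead of %u\n",
			cpu, pkg, c->phys_proc_id);
		c->phys_proc_id = pkg;
		fixed++;
	}
	return fixed;
}
```

Calling sanitize_ids() for CPU 1 with a firmware APIC id of 1 and zero
core-id bits, on a struct initialized to initial_apicid = 2 and
phys_proc_id = 2, prints the same two lines as the dmesg excerpt in the
quoted commit message and leaves both fields at 1. The zero core-id
bits value is an assumption chosen to match the 1 core per socket case.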

Alok

>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. Fix up the
> initial APIC id, which is fine as it is only used for printout purposes.
>
> If the physical package IDs differ, yell and use the package information
> from the ACPI/MP tables so the existing logical package map just works.
>
> Chas provided the resulting dmesg output for his affected 4 virtual
> sockets, 1 core per socket VM:
>
> [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
> [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
> ....
>
> Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@xxxxxxxxxxx>
> Reported-by: M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Alok Kataria <akataria@xxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: #4.6+ <stable@vger.kernel.org>
> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/common.c | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9bd910a..cc9e980 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -979,6 +979,35 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
> }
>
> /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> +	apicid = apic->cpu_present_to_apicid(cpu);
> +	pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> +	if (apicid != c->initial_apicid) {
> +		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> +		       cpu, apicid, c->initial_apicid);
> +		c->initial_apicid = apicid;
> +	}
> +	if (pkg != c->phys_proc_id) {
> +		pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> +		       cpu, pkg, c->phys_proc_id);
> +		c->phys_proc_id = pkg;
> +	}
> +	c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> +	c->logical_proc_id = 0;
> +#endif
> +}
> +
> +/*
> * This does the hard work of actually picking apart the CPU stuff...
> */
> static void identify_cpu(struct cpuinfo_x86 *c)
> @@ -1103,8 +1132,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
> #ifdef CONFIG_NUMA
> 	numa_add_cpu(smp_processor_id());
> #endif
> -	/* The boot/hotplug time assigment got cleared, restore it */
> -	c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
> +	sanitize_package_id(c);
> }
>
> /*