Re: unchecked MSR access error: WRMSR to 0xd84 (tried to write 0x0000000000010003) at rIP: 0xffffffffa025a1b8 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore])

From: Borislav Petkov
Date: Tue Mar 05 2024 - 07:10:41 EST


On Tue, Mar 05, 2024 at 11:14:04AM +0100, Thomas Gleixner wrote:
> It seems that none of the consumers of topology_num_cores_per_package()
> can actually be used on virt, so a reasonable restriction is to reject
> non-present CPUs on bare metal. Something like the below.

Yeah, workie.

Reported-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Tested-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>

Some relevant diffs of dmesg before and after:

+ACPI: Ignoring non-present APIC ID on bare metal

-CPU topo: Num. cores per package: 16
-CPU topo: Num. threads per package: 32
-CPU topo: Allowing 8 present CPUs plus 24 hotplug CPUs
+CPU topo: Num. cores per package: 4
+CPU topo: Num. threads per package: 8
+CPU topo: Allowing 8 present CPUs plus 0 hotplug CPUs

-setup_percpu: NR_CPUS:256 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:1
+setup_percpu: NR_CPUS:256 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1

-pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
-pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
-pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
-pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
+pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7

Those 24 hotpluggable CPUs ended up wasting percpu memory too.
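To put a number on that: a rough sketch, not kernel code, that reads the two "CPU topo: Allowing ..." lines from the diff above and adds them up. The helper name possible_cpus() is made up for illustration, and it assumes one percpu area is set up per possible CPU, which matches the nr_cpu_ids values in the setup_percpu lines.

```python
import re

# Hypothetical helper, not kernel code: extract the counts from a
# "CPU topo: Allowing N present CPUs plus M hotplug CPUs" dmesg line.
TOPO_RE = re.compile(r"Allowing (\d+) present CPUs plus (\d+) hotplug CPUs")

def possible_cpus(line):
    """Return (present, hotplug, possible) parsed from one dmesg line."""
    m = TOPO_RE.search(line)
    if m is None:
        raise ValueError("not a CPU topo line: %r" % line)
    present, hotplug = int(m.group(1)), int(m.group(2))
    return present, hotplug, present + hotplug

# The two lines from the dmesg diff, without the leading -/+ markers.
before = "CPU topo: Allowing 8 present CPUs plus 24 hotplug CPUs"
after = "CPU topo: Allowing 8 present CPUs plus 0 hotplug CPUs"

for label, line in (("before", before), ("after", after)):
    present, hotplug, possible = possible_cpus(line)
    # Assumption: one percpu area per possible CPU, so the hotplug count
    # is the number of areas set aside for CPUs that are not there.
    unused = hotplug / possible
    print(f"{label}: possible={possible} unused percpu areas={hotplug} ({unused:.0%})")
```

Under that assumption, 24 of the 32 percpu areas on the unpatched kernel belonged to CPUs that do not exist on this box.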

As a result, APIC is not in physical flat mode anymore:

-APIC: Switched APIC routing to: physical flat
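A likely explanation, as a simplified model and not the kernel's actual probe code: flat logical destination mode addresses CPUs through an 8-bit mask, so it can cover at most 8 CPUs. The sketch below assumes the driver choice on this kernel depends only on whether the number of possible CPUs exceeds that limit; needs_physical_flat() is a made-up name.

```python
# Simplified model, NOT the kernel's probe code. Flat logical destination
# mode uses an 8-bit CPU mask, so it covers at most 8 CPUs.
FLAT_LOGICAL_MAX_CPUS = 8

def needs_physical_flat(possible_cpus):
    """Assumed rule: more possible CPUs than the mask holds forces physical flat."""
    return possible_cpus > FLAT_LOGICAL_MAX_CPUS

# 32 possible CPUs before the patch, 8 after (from the dmesg diff above).
print("before:", needs_physical_flat(32))
print("after: ", needs_physical_flat(8))
```

With the possible CPU count going from 32 to 8, the machine drops back under the limit, which would fit the "Switched APIC routing" line disappearing.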

I guess ship it but we'll pay attention to what else ends up
complaining.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette