WARNING: at arch/x86/kernel/apic/apic.c:1239 setup_local_APIC+...

From: Borislav Petkov
Date: Fri Oct 14 2011 - 09:55:33 EST


Hi,

we're hitting the warning below

<arch/x86/kernel/apic/apic.c>:
...
i = early_per_cpu(x86_cpu_to_logical_apicid, cpu);
WARN_ON(i != BAD_APICID && i != logical_smp_processor_id()); <--- HERE
/* always use the value from LDR */
early_per_cpu(x86_cpu_to_logical_apicid, cpu) =
logical_smp_processor_id();

during 32-bit testing, config is attached.

Commit acb8bc09c6185e4d3d582d0076aaa6a89f19d8c5 added this warning, and it
triggers on the second part of the condition. I've dumped
x86_cpu_to_logical_apicid and logical_smp_processor_id() and here's what I
get on a 32-core box:

...
[ 4.268493] CPU 31 irqstacks, hard=e5b1e000 soft=e5b20000
[ 4.269358] #31 Ok.
[ 4.270358] smpboot cpu 31: start_ip = 80000
[ 0.003999] Initializing CPU#31
[ 0.003999] ------------[ cut here ]------------
[ 0.003999] WARNING: at arch/x86/kernel/apic/apic.c:1239 setup_local_APIC+0x12c/0x3bd()
[ 0.003999] Hardware name: None
[ 0.003999] i: -2147483648, lg_smp_id: 79
[ 0.003999] Modules linked in:
[ 0.003999] Pid: 0, comm: kworker/0:1 Tainted: G W 3.1.0-rc9-37cf9516-linus+ #1
[ 0.003999] Call Trace:
[ 0.003999] [<c1035066>] warn_slowpath_common+0x65/0x7a
[ 0.003999] [<c10350ee>] warn_slowpath_fmt+0x2b/0x2f
[ 0.003999] [<c16603bb>] setup_local_APIC+0x12c/0x3bd
[ 0.003999] [<c165fcae>] start_secondary+0x9f/0x18f
[ 0.003999] ---[ end trace 4eaa2a86a8e2da41 ]---

That's right, x86_cpu_to_logical_apicid wraps around, so if I were to
boot a 32-bit kernel on a 64-core box, each pair of cores would end up
with the same logical APIC ID.
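
To make the wraparound concrete, here is a minimal sketch of the "1 << cpu"
enumeration. It is not the kernel code: the helper names are made up for
illustration, and the shift count is masked to 5 bits on the assumption that
the shift behaves like the x86 instruction on a 32-bit operand (in C itself,
shifting by 32 or more is undefined).

```c
#include <stdint.h>

/*
 * Hypothetical model of the early enumeration ("1 << cpu").
 * The count is masked to 5 bits, which is what the x86 shift
 * instructions do with a 32-bit operand; plain C leaves a shift
 * by 32 or more undefined.
 */
static uint32_t early_logical_apicid(unsigned int cpu)
{
	return UINT32_C(1) << (cpu & 31u);
}

/*
 * The value as it would appear through a signed 32-bit "%d".
 * Computed arithmetically so the sketch does not depend on
 * implementation-defined integer conversions.
 */
static int64_t as_signed32(uint32_t v)
{
	if (v >= UINT32_C(0x80000000))
		return (int64_t)v - INT64_C(4294967296);
	return (int64_t)v;
}
```

Under this model, as_signed32(early_logical_apicid(31)) is -2147483648, the
value of i in the trace above, and CPUs 32..63 get the same values as
CPUs 0..31.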

Oh, and logical_smp_processor_id(), i.e. the logical APIC ID, starts at
0x20 according to SRAT:

[ 0.000000] SRAT: PXM 0 -> APIC 0x20 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x21 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x22 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x23 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x24 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x25 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x26 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x27 -> Node 0
[ 0.000000] SRAT: PXM 1 -> APIC 0x28 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x29 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2a -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2b -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2c -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2d -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2e -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x2f -> Node 1
[ 0.000000] SRAT: PXM 2 -> APIC 0x40 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x41 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x42 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x43 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x44 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x45 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x46 -> Node 2
[ 0.000000] SRAT: PXM 2 -> APIC 0x47 -> Node 2
[ 0.000000] SRAT: PXM 3 -> APIC 0x48 -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x49 -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4a -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4b -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4c -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4d -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4e -> Node 3
[ 0.000000] SRAT: PXM 3 -> APIC 0x4f -> Node 3

The warning triggers, IMHO, because we switch to the bigsmp APIC driver in
default_setup_apic_routing(), but the early x86_cpu_to_logical_apicid
enumeration comes from default_x86_32_early_logical_apicid(), which
gets called through the default APIC driver from
generic_processor_info().

So, in the end, and AFAICR, the warning triggers because we're comparing
the logical APIC IDs from the APIC Logical Destination Register (0xD0),
which have been assigned by the BIOS, with "1 << cpu" shifted values, which
wrap on 32-bit.
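
As a rough illustration of that comparison, the sketch below mirrors the shape
of the WARN_ON() condition quoted at the top. The names would_warn() and
SKETCH_BAD_APICID are hypothetical, the BAD_APICID value of 0xFF is an
assumption for 32-bit, and the LDR value used in the note after the code is
made up so that its logical ID comes out as 79 (0x4f), the lg_smp_id in the
trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of BAD_APICID on 32-bit; check apicdef.h in your tree. */
#define SKETCH_BAD_APICID 0xFFu

/* The logical APIC ID sits in bits 31:24 of the LDR. */
static uint32_t ldr_logical_id(uint32_t ldr)
{
	return (ldr >> 24) & 0xFFu;
}

/*
 * Same shape as the quoted WARN_ON() condition: warn when the early
 * value is set (not BAD_APICID) and differs from what the LDR says.
 */
static bool would_warn(uint32_t early_id, uint32_t ldr)
{
	uint32_t from_ldr = ldr_logical_id(ldr);

	return early_id != SKETCH_BAD_APICID && early_id != from_ldr;
}
```

With an early value of 1 << 31 for CPU 31 and a hypothetical LDR of
0x4F000000, would_warn() returns true; it returns false only when the early
value is still BAD_APICID or already equals the LDR's logical ID.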

I'd very much like to know why?

Thanks.

--
Regards/Gruss,
Boris.

Operating Systems Research Center
Advanced Micro Devices, Inc.

Attachment: config.gz
Description: Binary data