x86 boot issues in -next

From: Mark Brown
Date: Fri Mar 01 2024 - 16:29:57 EST


Hi,

For the past few days -next has been failing to boot an x86_64 defconfig
on the x86 machine Linaro has available in their lab. DMI says it's a
"Dell Inc. PowerEdge R200/0TY019, BIOS 1.4.3 05/15/2009" and the CPU is
described as "Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (family: 0x6, model:
0xf, stepping: 0xb)", it's running happily with mainline and
pending-fixes.

The kernel crashes with:

[ 2.012730] PCI: CLS 64 bytes, default 64
[ 2.016743] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 2.023181] software IO TLB: mapped [mem 0x00000000cbeb2000-0x00000000cfeb2000] (64MB)
[ 2.032236] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2.032914] #PF: supervisor read access in kernel mode
[ 2.032914] #PF: error_code(0x0000) - not-present page
[ 2.032914] PGD 0 P4D 0
[ 2.032914] Oops: 0000 [#1] PREEMPT SMP PTI
[ 2.032914] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-next-20240229 #1
[ 2.032914] Hardware name: Dell Inc. PowerEdge R200/0TY019, BIOS 1.4.3 05/15/2009
[ 2.032914] RIP: 0010:exra_is_visible+0xf/0x20
[ 2.032914] Code: b7 46 08 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d 1b f0 c9 01 01 7e 04 <0f> b7 46 08 c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[ 2.032914] RSP: 0000:ffffac7cc001bdd0 EFLAGS: 00010202
[ 2.032914] RAX: 0000000000000000 RBX: ffffffff90812ea0 RCX: ffffffff8ee0ef00
[ 2.032914] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97a681064c00
[ 2.032914] RBP: 0000000000000001 R08: 0000000000000228 R09: ffff97a680127310
[ 2.032914] R10: ffff97a68049ddd0 R11: 0000000000000000 R12: ffff97a681064c00
[ 2.032914] R13: ffffffff90812dc0 R14: 0000000000000001 R15: 0000000000000000
[ 2.032914] FS: 0000000000000000(0000) GS:ffff97a7a7c00000(0000) knlGS:0000000000000000
[ 2.032914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.032914] CR2: 0000000000000008 CR3: 00000000bf42e000 CR4: 00000000000006f0
[ 2.032914] Call Trace:
[ 2.032914] <TASK>
[ 2.032914] ? __die+0x1e/0x60
[ 2.032914] ? page_fault_oops+0x17b/0x480
[ 2.032914] ? search_module_extables+0x14/0x50
[ 2.032914] ? exc_page_fault+0x6b/0x150
[ 2.032914] ? asm_exc_page_fault+0x26/0x30
[ 2.032914] ? __pfx_exra_is_visible+0x10/0x10
[ 2.032914] ? exra_is_visible+0xf/0x20
[ 2.032914] internal_create_group+0x9c/0x400
[ 2.032914] internal_create_groups+0x3d/0xa0
[ 2.032914] pmu_dev_alloc+0xbb/0xe0
[ 2.032914] perf_event_sysfs_init+0x51/0xa0

A full boot log for a sample failure can be seen at:

https://validation.linaro.org/scheduler/job/4045256

I bisected this, the bisect seemed to run smoothly and landed on commit
f031242dbf22 ("Merge branch into tip/master: 'x86/apic'") with both
parents being fine (full log below) - an issue in the x86 tree does seem
plausible but I haven't investigated further.

git bisect start
# bad: [1870cdc0e8dee32e3c221704a2977898ba4c10e8] Add linux-next specific files for 20240301
git bisect bad 1870cdc0e8dee32e3c221704a2977898ba4c10e8
# good: [d1e87c1d8f90f27a1ca3c90d9de048602beabc61] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc
git bisect good d1e87c1d8f90f27a1ca3c90d9de048602beabc61
# good: [907d374fa897fbbcdce1e027297d933bbab025e1] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 907d374fa897fbbcdce1e027297d933bbab025e1
# good: [c32a1272f32bf6189357816f510edf7411ecd0ba] Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
git bisect good c32a1272f32bf6189357816f510edf7411ecd0ba
# bad: [2faf0484495c5288200f710966ec252fee2415a9] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
git bisect bad 2faf0484495c5288200f710966ec252fee2415a9
# bad: [0fa7bf8bf39eff1d304d2d46c92d94131205036f] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect bad 0fa7bf8bf39eff1d304d2d46c92d94131205036f
# bad: [f031242dbf22fc9c850946253324c72611a8b253] Merge branch into tip/master: 'x86/apic'
git bisect bad f031242dbf22fc9c850946253324c72611a8b253
# good: [f631c66c30faa6900e05aec06e7d0a541040dbde] Merge branch into tip/master: 'irq/msi'
git bisect good f631c66c30faa6900e05aec06e7d0a541040dbde
# good: [7f204eefe00c7a6677fc1dd515a02eb8b9c57495] Merge branch into tip/master: 'timers/core'
git bisect good 7f204eefe00c7a6677fc1dd515a02eb8b9c57495
# good: [c0a66c2847908e41c771ca2355fba935a82a9f62] x86/cpu/topology: Move registration out of APIC code
git bisect good c0a66c2847908e41c771ca2355fba935a82a9f62
# good: [882e0cff9ef340e7a47659a9aab9da64f4b9b847] x86/cpu/topology: Mop up primary thread mask handling
git bisect good 882e0cff9ef340e7a47659a9aab9da64f4b9b847
# good: [6be4ec29685c216ebec61d35f56c3808092498aa] x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
git bisect good 6be4ec29685c216ebec61d35f56c3808092498aa
# good: [9be3b2f057d7a6752e8cf25c1d456198b4d3bd6a] ptp/kvm, arm_arch_timer: Set system_counterval_t.cs_id to constant
git bisect good 9be3b2f057d7a6752e8cf25c1d456198b4d3bd6a
# good: [27f6a9c87a97f5ea7459be08d5be231af6b32c20] kvmclock: Unexport kvmclock clocksource
git bisect good 27f6a9c87a97f5ea7459be08d5be231af6b32c20
# good: [9b9c280b9af2aa851d83e7d0b79f36a3d869d745] Merge branch 'x86/urgent' into x86/apic, to resolve conflicts
git bisect good 9b9c280b9af2aa851d83e7d0b79f36a3d869d745
# good: [670c000c11ec5c5131cdf502d062075a803214af] Merge branch into tip/master: 'timers/ptp'
git bisect good 670c000c11ec5c5131cdf502d062075a803214af
# first bad commit: [f031242dbf22fc9c850946253324c72611a8b253] Merge branch into tip/master: 'x86/apic'

Attachment: signature.asc
Description: PGP signature