Re: next: i386-boot: clang-nightly: failed - intermittently - BUG: unable to handle page fault for address: 000024c0

From: Nick Desaulniers
Date: Wed May 17 2023 - 14:12:10 EST


On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju
<naresh.kamboju@xxxxxxxxxx> wrote:
>
> Linux next-20230517 build with clang nightly for i386 boot fails intermittently.

Keyword: intermittently. That will make tracking this down fun.

Our CI also hit a boot failure on tip/master with the same splat:
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746
Though the CI pulled down a SHA
0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
which looks weird on
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
>> Notice: this object is not reachable from any branch.

That this failed in -next and -tip in the same way makes me wonder if
something affecting this is coming in via -tip? Maybe the splat looks
familiar to x86 folks?

I haven't been able to reproduce locally when my machine is relatively
load-less. If I do a kernel build in the background, I was able to
get QEMU to hang, but without any splat. That was using tip/master @
f81d8f759e7f.

Naresh, when you say "intermittent" do you have any data on the
relative frequency of this boot failure? (Also, please make sure to
use llvm@xxxxxxxxxxxxxxx in the future; we moved mailing lists years
ago).

Looks like our CI report linked above has an additional splat though
via apply_alternatives and optimize_nops.

>> [ 0.166742] Code: Unable to access opcode bytes at 0x36.

Peter, that smells like perhaps either:
commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more")
commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some")

Looks like BP committed them May 11; maybe just recently they were
merged into tip/master?


> - i386: boot/clang-nightly-lkftconfig - failed
>
> Please find detailed log here,
>
> boot log:
> =====
> [ 0.000000] Linux version 6.4.0-rc2-next-20230517 (tuxmake@tuxmake)
> (Debian clang version 17.0.0
> (++20230507093322+728b8a139804-1~exp1~20230507093335.652), Debian LLD
> 17.0.0) #1 SMP PREEMPT_DYNAMIC @1684291209
> <6>[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
> <6>[ 0.000000] signal: max sigframe size: 1440
> <6>[ 0.000000] BIOS-provided physical RAM map:
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> <6>[ 0.000000] BIOS-e820: [mem
> 0x000000000009fc00-0x000000000009ffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000000f0000-0x00000000000fffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdefff] usable
> <6>[ 0.000000] BIOS-e820: [mem
> 0x000000007ffdf000-0x000000007fffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000b0000000-0x00000000bfffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000fed1c000-0x00000000fed1ffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000fffc0000-0x00000000ffffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
> <5>[ 0.000000] random: crng init done
> <5>[ 0.000000] Notice: NX (Execute Disable) protection cannot be
> enabled: non-PAE kernel!
> <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
> <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
> <6>[ 0.000000] extended physical RAM map:
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000000000-0x000000000009fbff] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x000000000009fc00-0x000000000009ffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000000f0000-0x00000000000fffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000100000-0x0000000000cfce1f] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000cfce20-0x0000000000cfce2f] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000cfce30-0x000000007ffdefff] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x000000007ffdf000-0x000000007fffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000b0000000-0x00000000bfffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000fed1c000-0x00000000fed1ffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000fffc0000-0x00000000ffffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000100000000-0x000000017fffffff] usable
> <6>[ 0.000000] SMBIOS 2.8 present.
> <6>[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.14.0-2 04/01/2014
> <6>[ 0.000000] tsc: Fast TSC calibration using PIT
> <6>[ 0.000000] tsc: Detected 3000.002 MHz processor
> <7>[ 0.009140] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> <7>[ 0.009369] e820: remove [mem 0x000a0000-0x000fffff] usable
> <6>[ 0.009688] last_pfn = 0x7ffdf max_arch_pfn = 0x100000
> <6>[ 0.010313] x86/PAT: PAT not supported by the CPU.
> <6>[ 0.010476] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
> <6>[ 0.028035] found SMP MP-table at [mem 0x000f5ce0-0x000f5cef]
> <7>[ 0.029050] initial memory mapped: [mem 0x00000000-0x18ffffff]
> <6>[ 0.032814] ACPI: Early table checksum verification disabled
> <6>[ 0.033316] ACPI: RSDP 0x00000000000F5B10 000014 (v00 BOCHS )
> <6>[ 0.033820] ACPI: RSDT 0x000000007FFE2319 000038 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.034477] ACPI: FACP 0x000000007FFE2109 0000F4 (v03 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035093] ACPI: DSDT 0x000000007FFE0040 0020C9 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035191] ACPI: FACS 0x000000007FFE0000 000040
> <6>[ 0.035258] ACPI: APIC 0x000000007FFE21FD 000080 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035306] ACPI: HPET 0x000000007FFE227D 000038 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035336] ACPI: MCFG 0x000000007FFE22B5 00003C (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035365] ACPI: WAET 0x000000007FFE22F1 000028 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035470] ACPI: Reserving FACP table memory at [mem
> 0x7ffe2109-0x7ffe21fc]
> <6>[ 0.035522] ACPI: Reserving DSDT table memory at [mem
> 0x7ffe0040-0x7ffe2108]
> <6>[ 0.035533] ACPI: Reserving FACS table memory at [mem
> 0x7ffe0000-0x7ffe003f]
> <6>[ 0.035542] ACPI: Reserving APIC table memory at [mem
> 0x7ffe21fd-0x7ffe227c]
> <6>[ 0.035550] ACPI: Reserving HPET table memory at [mem
> 0x7ffe227d-0x7ffe22b4]
> <6>[ 0.035558] ACPI: Reserving MCFG table memory at [mem
> 0x7ffe22b5-0x7ffe22f0]
> <6>[ 0.035566] ACPI: Reserving WAET table memory at [mem
> 0x7ffe22f1-0x7ffe2318]
> <5>[ 0.036775] 1167MB HIGHMEM available.
> <5>[ 0.036823] 879MB LOWMEM available.
> <6>[ 0.036848] mapped low ram: 0 - 36ffe000
> <6>[ 0.036864] low ram: 0 - 36ffe000
> <6>[ 0.037465] Zone ranges:
> <6>[ 0.037490] DMA [mem 0x0000000000001000-0x0000000000ffffff]
> <6>[ 0.037576] Normal [mem 0x0000000001000000-0x0000000036ffdfff]
> <6>[ 0.037591] HighMem [mem 0x0000000036ffe000-0x000000007ffdefff]
> <6>[ 0.037610] Movable zone start for each node
> <6>[ 0.037625] Early memory node ranges
> <6>[ 0.037668] node 0: [mem 0x0000000000001000-0x000000000009efff]
> <6>[ 0.037741] node 0: [mem 0x0000000000100000-0x000000007ffdefff]
> <6>[ 0.037912] Initmem setup node 0 [mem
> 0x0000000000001000-0x000000007ffdefff]
> <6>[ 0.039699] On node 0, zone DMA: 1 pages in unavailable ranges
> <6>[ 0.040493] On node 0, zone DMA: 97 pages in unavailable ranges
> <6>[ 0.068623] Using APIC driver default
> <6>[ 0.069096] ACPI: PM-Timer IO Port: 0x608
> <6>[ 0.069764] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> <6>[ 0.070159] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
> <6>[ 0.070278] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> <6>[ 0.070576] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> <6>[ 0.070611] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> <6>[ 0.070722] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> <6>[ 0.070735] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> <6>[ 0.070968] ACPI: Using ACPI (MADT) for SMP configuration information
> <6>[ 0.071240] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> <6>[ 0.072248] PM: hibernation: Registered nosave memory: [mem
> 0x00000000-0x00000fff]
> <6>[ 0.072334] PM: hibernation: Registered nosave memory: [mem
> 0x0009f000-0x0009ffff]
> <6>[ 0.072384] PM: hibernation: Registered nosave memory: [mem
> 0x000a0000-0x000effff]
> <6>[ 0.072394] PM: hibernation: Registered nosave memory: [mem
> 0x000f0000-0x000fffff]
> <6>[ 0.072405] PM: hibernation: Registered nosave memory: [mem
> 0x00cfc000-0x00cfcfff]
> <6>[ 0.072417] PM: hibernation: Registered nosave memory: [mem
> 0x00cfc000-0x00cfcfff]
> <6>[ 0.072636] [mem 0xc0000000-0xfed1bfff] available for PCI devices
> <6>[ 0.072683] Booting paravirtualized kernel on bare hardware
> <6>[ 0.073201] clocksource: refined-jiffies: mask: 0xffffffff
> max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
> <6>[ 0.073674] setup_percpu: NR_CPUS:8 nr_cpumask_bits:2
> nr_cpu_ids:2 nr_node_ids:1
> <6>[ 0.075154] percpu: Embedded 31 pages/cpu s96372 r0 d30604 u126976
> <7>[ 0.075652] pcpu-alloc: s96372 r0 d30604 u126976 alloc=31*4096
> <7>[ 0.075742] pcpu-alloc: [0] 0 [0] 1
> <5>[ 0.077147] Kernel command line: console=ttyS0,115200 rootwait
> root=/dev/sda debug verbose console_msg_format=syslog earlycon
> <5>[ 0.078730] Unknown kernel command line parameters \"verbose\",
> will be passed to user space.
> <6>[ 0.080340] Dentry cache hash table entries: 131072 (order: 7,
> 524288 bytes, linear)
> <6>[ 0.081003] Inode-cache hash table entries: 65536 (order: 6,
> 262144 bytes, linear)
> <6>[ 0.082840] Built 1 zonelists, mobility grouping on. Total pages: 521957
> <6>[ 0.088082] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
> <6>[ 0.088308] Initializing HighMem for node 0 (00036ffe:0007ffdf)
> <6>[ 0.454727] Initializing Movable for node 0 (00000000:00000000)
> <6>[ 0.459790] Checking if this processor honours the WP bit even
> in supervisor mode...Ok.
> <6>[ 0.460945] Memory: 2046372K/2096628K available (15688K kernel
> code, 2173K rwdata, 5772K rodata, 996K init, 780K bss, 50256K
> reserved, 0K cma-reserved, 1195908K highmem)
> <6>[ 0.466051] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> <6>[ 0.468048] ftrace: allocating 54241 entries in 106 pages
> <6>[ 0.511020] ftrace: allocated 106 pages with 4 groups
> <6>[ 0.514727] trace event string verifier disabled
> <6>[ 0.519501] Dynamic Preempt: voluntary
> <6>[ 0.522734] rcu: Preemptible hierarchical RCU implementation.
> <6>[ 0.522756] rcu: RCU event tracing is enabled.
> <6>[ 0.522785] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
> <6>[ 0.522916] Trampoline variant of Tasks RCU enabled.
> <6>[ 0.522927] Rude variant of Tasks RCU enabled.
> <6>[ 0.523017] rcu: RCU calculated value of scheduler-enlistment
> delay is 100 jiffies.
> <6>[ 0.523045] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> <6>[ 0.538503] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16
> <6>[ 0.543029] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> <6>[ 0.546346] kfence: initialized - using 2097152 bytes for 255
> objects at 0x(ptrval)-0x(ptrval)
> <6>[ 0.551378] Console: colour VGA+ 80x25
> <6>[ 0.553103] printk: console [ttyS0] enabled
> <6>[ 0.626471] ACPI: Core revision 20230331
> <6>[ 0.637882] APIC: Switch to symmetric I/O mode setup
> <6>[ 0.638348] Enabling APIC mode: Flat. Using 1 I/O APICs
> <6>[ 0.644118] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> <6>[ 0.650790] clocksource: tsc-early: mask: 0xffffffffffffffff
> max_cycles: 0x2b3e46fa93c, max_idle_ns: 440795237232 ns
> <6>[ 0.652683] Calibrating delay loop (skipped), value calculated
> using timer frequency.. 6000.00 BogoMIPS (lpj=3000002)
> <6>[ 0.654599] pid_max: default: 32768 minimum: 301
> <6>[ 0.656733] LSM: initializing lsm=capability,selinux,integrity
> <6>[ 0.657713] SELinux: Initializing.
> <6>[ 0.660732] Mount-cache hash table entries: 2048 (order: 1, 8192
> bytes, linear)
> <6>[ 0.661233] Mountpoint-cache hash table entries: 2048 (order: 1,
> 8192 bytes, linear)
> <6>[ 0.682197] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> <6>[ 0.682458] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> <6>[ 0.682670] Spectre V1 : Mitigation: usercopy/swapgs barriers
> and __user pointer sanitization
> <6>[ 0.684632] Spectre V2 : Mitigation: Retpolines
> <6>[ 0.684930] Spectre V2 : Spectre v2 / SpectreRSB mitigation:
> Filling RSB on context switch
> <6>[ 0.685539] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
> <4>[ 0.686934] L1TF: Kernel not compiled for PAE. No mitigation for L1TF
> <6>[ 0.688097] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
> <6>[ 0.688548] MMIO Stale Data: Unknown: No mitigations
> <0>[ 0.692458] __common_interrupt: 0.79 No irq handler for vector
> <0>[ 0.692458] __common_interrupt: 0.160 No irq handler for vector
> <1>[ 0.692458] BUG: unable to handle page fault for address: 000024c0
> <1>[ 0.692458] #PF: user read access in kernel mode
> <1>[ 0.692458] #PF: error_code(0x0004) - not-present page
> <6>[ 0.692458] *pde = 00000000
> <4>[ 0.692458] Oops: 0004 [#1] PREEMPT SMP
> <4>[ 0.692458] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.4.0-rc2-next-20230517 #1
> <4>[ 0.692458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.14.0-2 04/01/2014
> <4>[ 0.692458] EIP: 0x60
> <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
> <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
> <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
> <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
> <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
> <4>[ 0.692458] Call Trace:
> <4>[ 0.692458] ? vscnprintf+0x12/0x30
> <4>[ 0.692458] ? prb_read_valid+0x31/0x50
> <4>[ 0.692458] ? console_unlock+0x94/0xc0
> <4>[ 0.692458] ? vprintk_emit+0xf4/0x230
> <4>[ 0.692458] ? vprintk_emit+0xf4/0x230
> <4>[ 0.692458] ? vprintk_default+0x15/0x20
> <4>[ 0.692458] ? __is_insn_slot_addr+0x41/0x50
> <4>[ 0.692458] ? kernel_text_address+0x7c/0x90
> <4>[ 0.692458] ? __kernel_text_address+0x10/0x40
> <4>[ 0.692458] ? show_trace_log_lvl.17+0x1f5/0x2f0
> <4>[ 0.692458] ? __die_body+0x54/0x90
> <4>[ 0.692458] ? __die+0x70/0x80
> <4>[ 0.692458] ? page_fault_oops+0x2d3/0x310
> <4>[ 0.692458] ? kernelmode_fixup_or_oops+0xa5/0xc0
> <4>[ 0.692458] ? __bad_area_nosemaphore+0x59/0x1d0
> <4>[ 0.692458] ? bad_area_nosemaphore+0x12/0x20
> <4>[ 0.692458] ? do_user_addr_fault+0x156/0x400
> <4>[ 0.692458] ? irqentry_enter+0x32/0x38
> <4>[ 0.692458] ? trace_hardirqs_off_finish+0x2c/0x70
> <4>[ 0.692458] ? exc_page_fault+0x50/0x154
> <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
> <4>[ 0.692458] ? handle_exception+0x133/0x133
> <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
> <4>[ 0.692458] Modules linked in:
> <4>[ 0.692458] CR2: 00000000000024c0
> <4>[ 0.692458] ---[ end trace 0000000000000000 ]---
> <4>[ 0.692458] EIP: 0x60
> <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
> <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
> <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
> <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
> <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
> <0>[ 0.692458] Kernel panic - not syncing: Attempted to kill the idle task!
> <0>[ 0.692458] ---[ end Kernel panic - not syncing: Attempted to
> kill the idle task! ]---
> poweroff
>
>
> logs:
> ===
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230517/testrun/17029757/suite/boot/test/clang-nightly-lkftconfig/log
> - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2PtyoY5qEJrKsuIBPRVSIGK02m3
>
>
> Steps to reproduce intermittently,
> =============
> # To install tuxrun on your system globally:
> # sudo pip3 install -U tuxrun==0.42.0
> #
> # See https://tuxrun.org/ for complete documentation.
>
> tuxrun \
> --runtime podman \
> --device qemu-i386 \
> --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/bzImage
> \
> --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/modules.tar.xz
> \
> --rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2PeQhkn36JqPowQBZge0AFhBav6/images/intel-core2-32/lkft-tux-image-intel-core2-32-20230511145016.rootfs.ext4.gz
> \
> --parameters SKIPFILE=skipfile-lkft.yaml \
> --parameters SHARD_NUMBER=10 \
> --parameters SHARD_INDEX=2 \
> --image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f \
> --tests ltp-syscalls \
> --timeouts boot=15 ltp-syscalls=50
>
> --
> Linaro LKFT
> https://lkft.linaro.org



--
Thanks,
~Nick Desaulniers