Re: [CRASH][BISECTED] 6.4.1 crash in boot

From: Guenter Roeck
Date: Mon Jul 03 2023 - 00:38:58 EST


On 7/2/23 21:30, Kees Cook wrote:
On Mon, Jul 03, 2023 at 05:53:48AM +0200, Mirsad Goran Todorovac wrote:
On 7/3/23 05:26, Guenter Roeck wrote:
On 7/2/23 20:20, Kees Cook wrote:
On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
Hi,

After a new git pull, the kernel in the Torvalds tree with the default debug
config failed to boot with an error that occurs prior to mounting filesystems,
so there is no log save for the screenshot(s) here:

[1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/

Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):

# good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
# bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
.
.
.
# bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
# first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC

The system is Ubuntu 22.04; lshw output and config are given in the attachment.

Can you show early kernel log (something like dmesg)?

Anyway, I'm adding it to regzbot:

#regzbot ^introduced: 2d47c6956ab3c8
#regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening

I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
tree... it's only in Linus's ToT.


In ToT:

$ git describe 2d47c6956ab3
v6.4-rc2-1-g2d47c6956ab3

$ git describe --contains 2d47c6956ab3
next-20230616~2^2~51
$ git describe --contains --match 'v*' 2d47c6956ab3
fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'

"git describe" always shows the parent tree, which I guess was based on
v6.4-rc2.

Guenter
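For readers following the confusion above: plain "git describe" names a commit
after the nearest tag reachable from it (its history), while "--contains" looks
for a tag that has the commit in its history. A more direct way to ask "is this
commit part of a given release?" is an ancestry test. The following is only a
sketch; commit_in_ref is a hypothetical helper name, and it treats any git error
(such as an unknown ref) the same as "not contained".

```shell
#!/bin/sh
# Sketch: report whether COMMIT is reachable from (contained in) REF.
# commit_in_ref is a hypothetical helper name, not a git command.
commit_in_ref() {
    commit=$1
    ref=$2
    # "git merge-base --is-ancestor A B" exits 0 when A is an ancestor of B,
    # 1 when it is not, and with another status on errors. Errors are folded
    # into "not contained" here for simplicity.
    if git merge-base --is-ancestor "$commit" "$ref" 2>/dev/null; then
        echo "contained"
    else
        echo "not contained"
    fi
}
```

Run inside a kernel checkout, "commit_in_ref 2d47c6956ab3 v6.4" would answer
whether the bisected commit is part of the v6.4 tag.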


Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
as even being available, much less present. Something seems very wrong
with this report...

-Kees

Anyway, I have double checked and linux-image-6.4.0-rc2-crash boots while
linux-image-6.4.0-rc2-crash-00001-g2d47c6956ab3 freezes in early boot.

I don't understand what tree you're testing. 2d47c6956ab3 is only in
Linus's latest tree, which is not 6.4-rc2.


Maybe this?

$ git checkout -b testing 2d47c6956ab3
Updating files: 100% (15501/15501), done.
Switched to a new branch 'testing'
groeck@server:~/src/linux-staging$ git describe
v6.4-rc2-1-g2d47c6956ab3

Guenter

If you're testing Linus's tree, and you're bisecting to 2d47c6956ab3,
I don't understand why the .config you sent doesn't include
CONFIG_UBSAN_BOUNDS_STRICT (which was introduced by that commit) --
it should be visible whether or not it is selected.
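
As an illustration of the check being described: a .config usually records a set
symbol as "CONFIG_FOO=..." and a visible but disabled one as
"# CONFIG_FOO is not set", so a symbol that appears in neither form was not
written by that tree's Kconfig at all. A rough sketch follows;
config_symbol_state is a hypothetical helper name, and the three labels it
prints are my own.

```shell
#!/bin/sh
# Sketch: classify how SYMBOL appears in a kernel .config file.
# config_symbol_state is a hypothetical helper, not a kernel script.
config_symbol_state() {
    config_file=$1
    symbol=$2
    if grep -q "^${symbol}=" "$config_file"; then
        # Symbol has a value (y, m, or a string/number).
        echo "set"
    elif grep -q "^# ${symbol} is not set\$" "$config_file"; then
        # Symbol was written out by Kconfig, but is disabled.
        echo "disabled"
    else
        # Symbol does not appear in the file in either form.
        echo "absent"
    fi
}
```

For example, "config_symbol_state .config CONFIG_UBSAN_BOUNDS_STRICT" printing
"absent" would match what was observed in the posted config.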

Of course, on the next boot dmesg is overwritten ... I can provide
only screenshots of the first screen.

Without CONFIG_UBSAN_TRAP, I would not expect anything other than a
warning (i.e. boot would continue).

The only other thing I can think of that seems related (the backtrace
appears to show usb), might be this:
https://lore.kernel.org/lkml/20230629190900.never.787-kees@xxxxxxxxxx/
which won't appear until after v6.5-rc1.

The difference is only one commit.

It is a bit strange so I am available for any additional diagnostics.

Thanks! Can you send "grep UBSAN .config" output for the crashing kernel?

Are you booting on an EFI-capable machine? If you could configure pstore
to use the EFI-vars backend, you can capture the crash in EFI and
pstorefs will show it after the next boot. (If you're using systemd,
this all may already be happening -- check /var/lib/systemd/pstore/
or see [1] for more details.)

-Kees

[1] https://www.freedesktop.org/software/systemd/man/systemd-pstore.service.html
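
To check whether a crash record was captured after rebooting, something along
these lines may help. It is only a sketch: list_pstore_dumps is a hypothetical
helper name, /sys/fs/pstore is assumed to be where pstorefs is mounted, and the
assumption that dump files have names starting with "dmesg" may not hold for
every backend or systemd version.

```shell
#!/bin/sh
# Sketch: list candidate crash-dump files under the given directories.
# list_pstore_dumps is a hypothetical helper; the "dmesg*" name pattern
# is an assumption about how dump records are named.
list_pstore_dumps() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name 'dmesg*' 2>/dev/null
    done | sort
}
```

Typical use would be
"list_pstore_dumps /sys/fs/pstore /var/lib/systemd/pstore", most likely as root;
empty output means no matching record was found in either place.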