Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b system stopped booting

From: Bagas Sanjaya
Date: Thu Aug 31 2023 - 19:58:38 EST


On Fri, Sep 01, 2023 at 03:45:28AM +0500, Mikhail Gavrilov wrote:
> Hi,
> A new release cycle, and another regression.
> Yesterday, after another kernel update in Fedora Rawhide, the system stopped booting.
> Today, thanks to git bisect, I found that the culprit is this commit:
>
> ❯ git bisect bad
> a349d72fd9efc87c8fd1d16d3164752d84a7275b is the first bad commit
> commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
> Author: Hugh Dickins <hughd@xxxxxxxxxx>
> Date: Tue Jul 11 21:30:40 2023 -0700
>
> mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
>
> Patch series "mm: free retracted page table by RCU", v3.
>
> Some mmap_lock avoidance i.e. latency reduction. Initially just for the
> case of collapsing shmem or file pages to THPs: the usefulness of
> MADV_COLLAPSE on shmem is being limited by that mmap_write_lock it
> currently requires.
>
> Likely to be relied upon later in other contexts e.g. freeing of empty
> page tables (but that's not work I'm doing). mmap_write_lock avoidance
> when collapsing to anon THPs? Perhaps, but again that's not work I've
> done: a quick attempt was not as easy as the shmem/file case.
>
> These changes (though of course not these exact patches) have been in
> Google's data centre kernel for three years now: we do rely upon them.
>
>
> This patch (of 13):
>
> Before putting them to use (several commits later), add rcu_read_lock() to
> pte_offset_map(), and rcu_read_unlock() to pte_unmap(). Make this a
> separate commit, since it risks exposing imbalances: prior commits have
> fixed all the known imbalances, but we may find some have been missed.
>
> Link: https://lkml.kernel.org/r/7cd843a9-aa80-14f-5eb2-33427363c20@xxxxxxxxxx
> Link: https://lkml.kernel.org/r/d3b01da5-2a6-833c-6681-67a3e024a16f@xxxxxxxxxx
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> <long cc list omitted>...
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
> include/linux/pgtable.h | 4 ++--
> mm/pgtable-generic.c | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> It looks like the hang happens so early that nothing reaches the
> journal: when I boot into a working kernel and run "journalctl -b -1",
> I see the log of the kernel that was booted before the problematic
> one, not the log of the failed boot.
> Therefore, I apologize that I can't provide the kernel logs.
> I can only provide photos of the backtrace as it appears on my monitor:
> Here the boot is waiting: https://ibb.co/5xmm0BH
> And then I see the backtrace: https://ibb.co/TLLGFNP
>
> Unfortunately, I can't revert commit
> a349d72fd9efc87c8fd1d16d3164752d84a7275b to test more recent builds
> because the revert produces conflicts.
>
> My hardware: https://linux-hardware.org/?probe=dd5735f315
> I also attached kernel build config and full bisect log.
>

Thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: a349d72fd9efc8
#regzbot title: rcu_read_{lock,unlock}() causes unbootable system with backtrace
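
For readers unfamiliar with the kind of "imbalance" the quoted commit
message warns about, here is a minimal userspace sketch. It is only an
illustration of the class of problem, not a diagnosis of this report and
not kernel code: every model_* name, the nesting counter, and the integer
"page table" are hypothetical stand-ins. The only thing taken from the
commit message is the pairing itself: the map side takes the RCU read
lock and the unmap side drops it.

```c
#include <assert.h>

/* Hypothetical stand-in for the RCU read-side nesting depth. */
static int model_rcu_nesting;

/* Hypothetical "page table": plain ints, where 0 means "no entry". */
static int model_table[4] = {0, 7, 0, 9};

static void model_rcu_read_lock(void)
{
	model_rcu_nesting++;
}

static void model_rcu_read_unlock(void)
{
	/* Unlocking without a matching lock is the opposite imbalance. */
	assert(model_rcu_nesting > 0);
	model_rcu_nesting--;
}

/* Models the map side: enters the read-side section, returns the entry. */
static int *model_pte_offset_map(int *table, int idx)
{
	model_rcu_read_lock();
	return &table[idx];
}

/* Models the unmap side: leaves the read-side section. */
static void model_pte_unmap(int *pte)
{
	(void)pte;
	model_rcu_read_unlock();
}

/* Balanced: every map is followed by exactly one unmap. */
static int balanced_walk(int *table, int idx)
{
	int *pte = model_pte_offset_map(table, idx);
	int val = *pte;

	model_pte_unmap(pte);
	return val;
}

/* Imbalanced: the early return skips the unmap, so the lock leaks. */
static int leaky_walk(int *table, int idx)
{
	int *pte = model_pte_offset_map(table, idx);

	if (*pte == 0)
		return -1;	/* BUG: missing model_pte_unmap(pte) */

	int val = *pte;

	model_pte_unmap(pte);
	return val;
}
```

In this model, calling balanced_walk() leaves model_rcu_nesting at 0,
while calling leaky_walk() on an empty slot leaves it at 1. Before the
map/unmap pair carried any locking, an early return like the one in
leaky_walk() would have gone unnoticed, which is why the commit message
calls out the risk of exposing paths that had been missed.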

--
An old man doll... just what I always wanted! - Clara
