Re: [PATCH next] mm/khugepaged: fix conflicting mods to collapse_file()

From: Andrew Morton
Date: Mon Apr 24 2023 - 18:08:39 EST


On Sat, 22 Apr 2023 21:47:20 -0700 (PDT) Hugh Dickins <hughd@xxxxxxxxxx> wrote:

> Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> ahead of Jiaqi Yan's and David Stevens's commits
> 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> (all of which restructure collapse_file()) did not work out well.
>
> xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> while doing drop_caches: the file's xarray seems to have been corrupted,
> with find_get_entry() returning nonsense which makes no progress.
>
> Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> suggested that it's probably down to an errant xas_store(), which does not
> belong with the later changes (and nor does the positioning of warnings).
> The later changes look as if they fix the syzbot issue independently.
>
> Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> (xas_error) after the final xas_store() of the multi-index entry.
>

Sigh. Thanks. I thought I'd successfully sorted that mess out.