Re: [syzbot] [mm?] WARNING in unmap_page_range (2)

From: Peter Xu
Date: Thu Nov 16 2023 - 13:00:45 EST


On Thu, Nov 16, 2023 at 10:19:13AM +0100, David Hildenbrand wrote:
> On 15.11.23 23:00, Andrew Morton wrote:
> > On Wed, 15 Nov 2023 05:32:19 -0800 syzbot <syzbot+7ca4b2719dc742b8d0a4@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: ac347a0655db Merge tag 'arm64-fixes' of git://git.kernel.o..
> > > git tree: upstream
> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=15ff3057680000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=287570229f5c0a7c
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=7ca4b2719dc742b8d0a4
> > > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=162a25ff680000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13d62338e80000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/00e30e1a5133/disk-ac347a06.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/07c43bc37935/vmlinux-ac347a06.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/c6690c715398/bzImage-ac347a06.xz
> > >
> > > The issue was bisected to:
> > >
> > > commit 12f6b01a0bcbeeab8cc9305673314adb3adf80f7
> > > Author: Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx>
> > > Date: Mon Aug 21 14:15:15 2023 +0000
> > >
> > > fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag
> >
> > Thanks. The bisection is surprising, but the mentioned patch does
> > mess with pagemap.
> >
> > How about we add this?
> >
> > From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Subject: mm/memory.c:zap_pte_range() print bad swap entry
> > Date: Wed Nov 15 01:54:18 PM PST 2023
> >
> > We have a report of this WARN() triggering. Let's print the offending
> > swp_entry_t to help diagnosis.
> >
> > Link: https://lkml.kernel.org/r/000000000000b0e576060a30ee3b@xxxxxxxxxx
> > Cc: Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx>
> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > ---
> >
> > mm/memory.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > --- a/mm/memory.c~a
> > +++ a/mm/memory.c
> > @@ -1521,6 +1521,7 @@ static unsigned long zap_pte_range(struc
> > continue;
> > } else {
> > /* We should have covered all the swap entry types */
> > + pr_alert("unrecognized swap entry 0x%lx\n", entry.val);
> > WARN_ON_ONCE(1);
> > }
> > pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> > _
> >
>
> I'm curious if
>
> 1) make_uffd_wp_pte() won't end up overwriting existing pte markers, for
> example, if PTE_MARKER_POISONED is set. [unrelated to this bug]

It should be fine, as:

static void make_uffd_wp_pte(struct vm_area_struct *vma,
unsigned long addr, pte_t *pte)
{
pte_t ptent = ptep_get(pte);

#ifndef CONFIG_USERFAULTFD_

if (pte_present(ptent)) {
pte_t old_pte;

old_pte = ptep_modify_prot_start(vma, addr, pte);
ptent = pte_mkuffd_wp(ptent);
ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
} else if (is_swap_pte(ptent)) {
ptent = pte_swp_mkuffd_wp(ptent);
set_pte_at(vma->vm_mm, addr, pte, ptent);
} else { <----------------- this must be pte_none() already
set_pte_at(vma->vm_mm, addr, pte,
make_pte_marker(PTE_MARKER_UFFD_WP));
}
}

>
> 2) We get the error on arm64, which does *not* support uffd-wp. Do we
> maybe end up calling make_uffd_wp_pte() and place a pte marker, even
> though we don't have CONFIG_PTE_MARKER_UFFD_WP?
>
>
> static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
> {
> #ifdef CONFIG_PTE_MARKER_UFFD_WP
> return is_pte_marker_entry(entry) &&
> (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> #else
> return false;
> #endif
> }
>
> Will always return false without CONFIG_PTE_MARKER_UFFD_WP.
>
> But make_uffd_wp_pte() might just happily place an entry. Hm.
>
>
> The following might fix the problem:
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 51e0ec658457..ae1cf19918d3 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1830,8 +1830,10 @@ static void make_uffd_wp_pte(struct vm_area_struct
> *vma,
> ptent = pte_swp_mkuffd_wp(ptent);
> set_pte_at(vma->vm_mm, addr, pte, ptent);
> } else {
> +#ifdef CONFIG_PTE_MARKER_UFFD_WP
> set_pte_at(vma->vm_mm, addr, pte,
> make_pte_marker(PTE_MARKER_UFFD_WP));
> +#endif
> }
> }

I'd like to double check with Muhammad (as I didn't actually follow his
work in the latest versions.. quite a lot changed), but I _think_
fundamentally we missed something important in the fast path, and I think
it applies even to archs that support uffd..

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e91085d79926..3b81baabd22a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2171,7 +2171,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
return 0;
}

- if (!p->vec_out) {
+ if (!p->vec_out &&
+ (p->arg.flags & PM_SCAN_WP_MATCHING))
/* Fast path for performing exclusive WP */
for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
if (pte_uffd_wp(ptep_get(pte)))

There's yet another report in fs list that triggers other issues:

https://lore.kernel.org/all/000000000000773fa7060a31e2cc@xxxxxxxxxx/

I'll think over that and I plan to prepare a small patchset to fix all I
saw.

Thanks,

--
Peter Xu