Re: [mm] 408579cd62: WARNING:suspicious_RCU_usage

From: Liam R. Howlett
Date: Tue Jul 04 2023 - 11:28:02 EST


* Oliver Sang <oliver.sang@xxxxxxxxx> [230704 09:51]:
> hi, Linus,
>
> On Mon, Jul 03, 2023 at 07:29:48PM -0700, Linus Torvalds wrote:
> > On Mon, 3 Jul 2023 at 18:48, Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> > >
> > > with patch [1] applied, we found the warning is still not fixed.
> >
> > Hmm. I already committed that "fix" as obvious, since the main
> > difference in commit 408579cd627a ("mm: Update do_vmi_align_munmap()
> > return semantics") around that validate_mm() call was how it did that
> > mmap_read_unlock().
> >
> > > we also found some changes in the stack backtrace. It is now as below:
> > > (detailed dmesg is attached)
> > >
> > > [ 26.412372][ T1] stack backtrace:
> > > [ 26.412846][ T1] CPU: 0 PID: 1 Comm: systemd Not tainted 6.4.0-09908-gcb226fb1fb7a #1
> > > [ 26.413506][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 26.414326][ T1] Call Trace:
> > > [ 26.414605][ T1] <TASK>
> > > [ 26.414847][ T1] dump_stack_lvl+0x73/0xc0
> > > [ 26.415225][ T1] lockdep_rcu_suspicious+0x1b7/0x280
> > > [ 26.415669][ T1] mas_start+0x280/0x400
> > > [ 26.416037][ T1] mas_find+0x27a/0x400
> > > [ 26.416391][ T1] validate_mm+0x8b/0x2c0
> > > [ 26.416757][ T1] __se_sys_brk+0xa35/0xc00
> >
> > Ok, that is indeed a very different stack trace.
> >
> > So maybe the fix is a real fix, but the first complaint shut up
> > lockdep, so this is the *second* and unrelated complaint.
> >
> > And indeed: it turns out that do_vma_munmap() does this:
> >
> > ret = do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > validate_mm(mm);
> >
> > and so we have *another* validate_mm() that is now done outside the lock.
> >
> > That one is actually pretty pointless. We've *just* validated the mm
> > already inside do_vmi_align_munmap(), except we only did it in one of
> > the two return cases.
> >
> > So I think the fix is to just move that validate_mm() into the other
> > return case of do_vmi_align_munmap(), and remove it from the caller.
> >
> > IOW, something like the attached (NOTE! This is in *addition* to the
> > previous patch, which is the same as the one you quoted, just with
> > slightly different whitespace as commit ae80b4041984: "mm: validate
> > the mm before dropping the mmap lock").
>
> Thanks a lot for the guidance!
> I applied the patch below directly on top of ae80b4041984 and confirmed
> the WARNING is gone. Thanks
>

Thanks for testing this.

I can clean more of this up now that the mmap locking has been changed.
For instance, we can drop a number of checks before a write (and all
read cases, if any remain) since there is no alteration without the
write lock.

Thanks,
Liam
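
For readers following the thread, the fix Linus describes above (validate
inside do_vmi_align_munmap() on both return paths, and drop the check from
the caller) can be modelled with a small userspace sketch. Everything in it
is a hypothetical stand-in: toy_mm, toy_lock(), toy_validate() and the
toy_munmap_*() helpers are not kernel APIs, and the "lock" is only a flag.
The sketch also assumes the error path returns with the lock still held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_mm {
    bool locked;          /* models holding the mmap lock */
    int  unlocked_checks; /* validations that ran without the lock */
};

static void toy_lock(struct toy_mm *mm)
{
    mm->locked = true;
}

static void toy_unlock(struct toy_mm *mm)
{
    assert(mm->locked);   /* unlocking twice would be a bug in the model */
    mm->locked = false;
}

/* Models validate_mm(): walking the tree is only safe under the lock. */
static void toy_validate(struct toy_mm *mm)
{
    if (!mm->locked)
        mm->unlocked_checks++;  /* what lockdep would complain about */
}

/*
 * Models the callee after the fix: it validates on both return paths,
 * always before the lock can be dropped.
 */
static int toy_align_munmap(struct toy_mm *mm, bool fail, bool unlock)
{
    if (fail) {
        toy_validate(mm);   /* error path: lock assumed still held */
        return -1;
    }
    toy_validate(mm);       /* success path: validate first ... */
    if (unlock)
        toy_unlock(mm);     /* ... then optionally drop the lock */
    return 0;
}

/* Caller before the fix: validates after the callee may have unlocked. */
static int toy_munmap_old(struct toy_mm *mm, bool fail, bool unlock)
{
    int ret = toy_align_munmap(mm, fail, unlock);

    toy_validate(mm);       /* may run without the lock */
    return ret;
}

/* Caller after the fix: the redundant second validation is gone. */
static int toy_munmap_new(struct toy_mm *mm, bool fail, bool unlock)
{
    return toy_align_munmap(mm, fail, unlock);
}
```

In this model, toy_munmap_old() called with unlock set records one
validation outside the lock, while toy_munmap_new() records none on either
return path.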