Re: [BUG] trigger BUG_ON in mas_store_prealloc when low memory

From: Liam R. Howlett
Date: Tue Aug 08 2023 - 16:34:50 EST


* John Hsu (許永翰) <John.Hsu@xxxxxxxxxxxx> [230807 05:55]:
> On Wed, 2023-07-19 at 14:51 -0400, Liam R. Howlett wrote:

...

> > > As far as I know, the following is the rb_tree flow in 5.15.186:
> > >
> > > ...
> > > mmap_write_lock_killable(mm)
> > > ...
> > > do_mmap()
> > > ...
> > > mmap_region()
> > > ...
> > > vm_area_alloc(mm)
> > > ...
> > > mmap_write_unlock(mm)
> > >
> > > vm_area_alloc() is called while the mmap_lock is held.
> > > It seems that the rb_tree flow could sleep here.
> > > If I am missing anything, please tell me, thanks!
> >
> > Before the mmap_write_unlock(mm) in the above sequence, the
> > i_mmap_lock_write(), anon_vma_lock_write(), and/or the
> > flush_dcache_mmap_lock() may be taken. Check __vma_adjust().
> >
> > The insertion into the tree needs to hold some subset of these locks.
> > The rb-tree insert did not allocate within these locks, but the maple
> > tree would need to allocate within these locks to insert into the
> > tree.
> > This is why the preallocation exists and why it is necessary.
> >
>
> Yep, preallocation is necessary. anon_vma_lock_write() and
> flush_dcache_mmap_lock() hold the lock and manipulate the rb_tree. I
> think that there are no maple tree manipulations during the lock holding
> period. Is there any future work in this section?

__vma_adjust() does modify the maple tree during the lock holding
section through vma_mas_store() in 6.1. Prior to 6.1, there is no maple
tree.
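
As a rough userspace illustration of that ordering (this is not kernel
code; every toy_* name below is hypothetical and only mimics the shape
of a preallocate-then-store sequence): the allocation, which may fail or
sleep, happens before the lock is taken, and the store under the lock
only consumes what was reserved.

```c
/* Userspace sketch only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_SLOTS 4

struct toy_tree {
	bool locked;           /* stands in for a lock we must not allocate under */
	int *slot[TOY_SLOTS];  /* each stored entry owns one allocated node */
};

struct toy_state {
	int *reserved;         /* node allocated ahead of time, or NULL */
};

/* May fail, so it must run before the lock is taken. */
static int toy_preallocate(struct toy_tree *t, struct toy_state *s)
{
	assert(!t->locked);
	s->reserved = malloc(sizeof(*s->reserved));
	return s->reserved ? 0 : -1;
}

/* Runs under the lock: consumes the reserved node and never allocates. */
static void toy_store_prealloc(struct toy_tree *t, struct toy_state *s,
			       int idx, int val)
{
	assert(t->locked && s->reserved);
	assert(idx >= 0 && idx < TOY_SLOTS);
	*s->reserved = val;
	free(t->slot[idx]);    /* drop any entry being overwritten */
	t->slot[idx] = s->reserved;
	s->reserved = NULL;
}

static int toy_insert(struct toy_tree *t, int idx, int val)
{
	struct toy_state s = { .reserved = NULL };

	if (toy_preallocate(t, &s))
		return -1;     /* fail early, before any lock is held */

	t->locked = true;      /* "lock" */
	toy_store_prealloc(t, &s, idx, val);
	t->locked = false;     /* "unlock" */
	return 0;
}
```

A caller would zero-initialise a struct toy_tree and call
toy_insert(&tree, 1, 42); the only point where the insert can fail is
before the locked region.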

...

> > There are also config options to debug the tree operations, but they
> > do not detect the redundant write issues. Perhaps I can look at adding
> > support for detecting redundant writes, but that will not be
> > backported to a stable kernel.
> >
>
> The maple tree's extensive test cases ensure that its functions work
> well. But the redundant operations (alloc node, free node, tree
> manipulations) of maple_tree are not easy to detect (e.g. the case
> reported this time, and mas_preallocate() allocating redundant nodes
> for the worst case).
>
> A mechanism for detecting redundant writes may help developers find
> such problems more easily. I hope it can be established successfully!

When I went to add this, I found that I had already added it here [1].

This operation was not caught by MA_STATE_PREALLOC because there are two
writes before a mas_destroy(), so there may be nodes left which avoid
the warning. I'll look at improving this situation.
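
A toy model of how leftover nodes can hide a second write, under the
assumption that the check only fires when a write on a preallocated
state needs a node and none are reserved. The toy_* names and the exact
warning rule are illustrative assumptions, not the kernel code:

```c
/* Toy model only; the toy_* names and the warning rule are assumptions. */
#include <assert.h>
#include <stdbool.h>

struct toy_mas {
	int reserved;   /* nodes still held from the preallocation */
	bool prealloc;  /* set while the state is marked as preallocated */
	int warnings;   /* how many times the check fired */
};

/* Reserve enough nodes for the worst case of a single write. */
static void toy_preallocate(struct toy_mas *m, int worst_case)
{
	m->reserved = worst_case;
	m->prealloc = true;
}

/* A write consumes nodes; warn only if the reserve is already empty. */
static void toy_write(struct toy_mas *m, int needed)
{
	assert(needed >= 0);
	if (m->prealloc && needed > 0 && m->reserved == 0)
		m->warnings++;
	m->reserved = m->reserved > needed ? m->reserved - needed : 0;
}

/* Release whatever is left and clear the preallocated marker. */
static void toy_destroy(struct toy_mas *m)
{
	m->reserved = 0;
	m->prealloc = false;
}

/* Two writes against one preallocation, then a single destroy. */
static int toy_two_writes(int worst_case, int need_each)
{
	struct toy_mas m = { 0 };

	toy_preallocate(&m, worst_case);
	toy_write(&m, need_each);
	toy_write(&m, need_each);  /* the redundant second write */
	toy_destroy(&m);
	return m.warnings;
}
```

In this model, toy_two_writes(3, 1) reports no warning because the
worst-case reserve still has nodes left for the second write, while
toy_two_writes(1, 1) reports one.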

Thanks,
Liam


[1] https://lore.kernel.org/linux-mm/20220722160546.1478722-2-Liam.Howlett@xxxxxxxxxx/