Re: [syzbot] [mm?] possible deadlock in shmem_uncharge (2)

From: Carlos Maiolino
Date: Mon Jul 24 2023 - 08:00:26 EST


Hi Hugh.

On Sat, Jul 22, 2023 at 09:37:35PM -0700, Hugh Dickins wrote:
> On Tue, 18 Jul 2023, Carlos Maiolino wrote:
> > On Tue, Jul 18, 2023 at 11:22:59AM -0700, Hugh Dickins wrote:
> > > On Tue, 18 Jul 2023, syzbot wrote:
> > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > >
> > > Yes, this doesn't require any syzbot trickery, it showed up as soon as
> > > I tried a shmem quota linux-next with lockdep and shmem huge last week.
> > >
> > > There's some other things wrong with the accounting there (in the non-
> > > quota case anyway): I have been working up a patch to fix them, but need
> > > to consider what must go in quickly, and what should wait until later.
> > >
> > > Carlos, in brief: don't worry about this syzbot report, I'm on it (but
> > > there's a risk that any patch I send may turn out to break your quotas).
> >
> > Ok, thanks Hugh. I had the next version ready to go when I saw those syzkaller
> > reports.
> >
> > I will send the new series tomorrow morning then.
> > Could you please keep me in the loop? I'm interested to see what's going on
> > there.
>
> I never saw a new series on Wednesday, just a new series on Monday which
> I had ignored, knowing that another was coming. I was waiting for your
> new series before sending my patch to that, but perhaps you were waiting
> for my patch before sending your new series?

TL;DR: I spoke with Andrew last week, and we agreed it would be better to wait
for you to submit the patches before I send the new version. I thought I had
cc'ed you on that email thread, but it seems I didn't; my apologies.

>
> Then when I went to check my patch on next-20230721, found that your
> series has been dropped from mm-unstable meanwhile. Maybe Andrew
> dropped it because of the lockdep reports (I see now there was another),
> or maybe he dropped it because of new series coming too thick and fast,
> or maybe he dropped it because of the awkward merge commit which Stephen
> Rothwell had to write, to reconcile the vfs and mm trees. (But I was
> surprised not to have seen any mm-commits mail on the dropping.)
>
> So the patch below is against next-20230714, and will hopefully still
> apply in your next posting: please include it there (since it modifies
> THP source, I think it's best kept as a separate patch in your series).
>
> But when you repost, if you are still going for the next release, the
> more you can do to minimize that merge commit the better. In particular,
> the reindentations of shmem_get_inode() and other functions in
> "shmem: make shmem_get_inode() return ERR_PTR instead of NULL"
> don't help review, and could well be left until some later cleanup -
> but I don't know how much that would actually eliminate the clashes.

I'm not at work today, so I can't take a deep look at it right now, but
I plan to rebase the series on top of linux-next and include your patch in
my series, if that's what I understood from a quick read of your email.

I'll try to work on it first thing tomorrow, thanks!

Carlos


>
> (And "Add default quota limit mount options" needs to say "shmem: ...";
> I'd have preferred "tmpfs: ...", but the others say shmem, so okay.)
>
> [PATCH] shmem: fix quota lock nesting in huge hole handling
>
> i_pages lock nests inside i_lock, but shmem_charge() and shmem_uncharge()
> were being called from THP splitting or collapsing while i_pages lock was
> held, and now go on to call dquot_alloc_block_nodirty() which takes
> i_lock to update i_blocks.
>
> We may well want to take i_lock out of this path later, in the non-quota
> case even if it's left in the quota case (or perhaps use i_lock instead
> of shmem's info->lock throughout); but don't get into that at this time.
>
> Move the shmem_charge() and shmem_uncharge() calls out from under i_pages
> lock, accounting the full batch of holes in a single call.
>
> Still pass the pages argument to shmem_uncharge(), but it happens now to
> be unused: shmem_recalc_inode() is designed to account for clean pages
> freed behind shmem's back, so it gets the accounting right by itself;
> then the later call to shmem_inode_unacct_blocks() led to imbalance
> (that WARN_ON(inode->i_blocks) in shmem_evict_inode()).
>
> Reported-by: syzbot+38ca19393fb3344f57e6@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lore.kernel.org/lkml/0000000000008e62f40600bfe080@xxxxxxxxxx/
> Reported-by: syzbot+440ff8cca06ee7a1d4db@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lore.kernel.org/lkml/00000000000076a7840600bfb6e8@xxxxxxxxxx/
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
>  mm/huge_memory.c |  6 ++++--
>  mm/khugepaged.c  | 13 +++++++------
>  mm/shmem.c       | 19 +++++++++----------
>  3 files changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 762be2f4244c..01f6838329a1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2521,7 +2521,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  	struct address_space *swap_cache = NULL;
>  	unsigned long offset = 0;
>  	unsigned int nr = thp_nr_pages(head);
> -	int i;
> +	int i, nr_dropped = 0;
> 
>  	/* complete memcg works before add pages to LRU */
>  	split_page_memcg(head, nr);
> @@ -2546,7 +2546,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  			struct folio *tail = page_folio(head + i);
> 
>  			if (shmem_mapping(head->mapping))
> -				shmem_uncharge(head->mapping->host, 1);
> +				nr_dropped++;
>  			else if (folio_test_clear_dirty(tail))
>  				folio_account_cleaned(tail,
>  					inode_to_wb(folio->mapping->host));
> @@ -2583,6 +2583,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  	}
>  	local_irq_enable();
> 
> +	if (nr_dropped)
> +		shmem_uncharge(head->mapping->host, nr_dropped);
>  	remap_page(folio, nr);
> 
>  	if (PageSwapCache(head)) {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6bad69c0e4bd..366ee467fb83 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1828,10 +1828,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  					goto xa_locked;
>  				}
>  			}
> -			if (!shmem_charge(mapping->host, 1)) {
> -				result = SCAN_FAIL;
> -				goto xa_locked;
> -			}
>  			nr_none++;
>  			continue;
>  		}
> @@ -2017,8 +2013,13 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  	 */
>  	try_to_unmap_flush();
> 
> -	if (result != SCAN_SUCCEED)
> +	if (result == SCAN_SUCCEED && nr_none &&
> +	    !shmem_charge(mapping->host, nr_none))
> +		result = SCAN_FAIL;
> +	if (result != SCAN_SUCCEED) {
> +		nr_none = 0;
>  		goto rollback;
> +	}
> 
>  	/*
>  	 * The old pages are locked, so they won't change anymore.
> @@ -2157,8 +2158,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  	if (nr_none) {
>  		xas_lock_irq(&xas);
>  		mapping->nrpages -= nr_none;
> -		shmem_uncharge(mapping->host, nr_none);
>  		xas_unlock_irq(&xas);
> +		shmem_uncharge(mapping->host, nr_none);
>  	}
> 
>  	list_for_each_entry_safe(page, tmp, &pagelist, lru) {
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 161592a8d3fe..521a6302dc37 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -424,18 +424,20 @@ static void shmem_recalc_inode(struct inode *inode)
>  bool shmem_charge(struct inode *inode, long pages)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	unsigned long flags;
> +	struct address_space *mapping = inode->i_mapping;
> 
>  	if (shmem_inode_acct_block(inode, pages))
>  		return false;
> 
>  	/* nrpages adjustment first, then shmem_recalc_inode() when balanced */
> -	inode->i_mapping->nrpages += pages;
> +	xa_lock_irq(&mapping->i_pages);
> +	mapping->nrpages += pages;
> +	xa_unlock_irq(&mapping->i_pages);
> 
> -	spin_lock_irqsave(&info->lock, flags);
> +	spin_lock_irq(&info->lock);
>  	info->alloced += pages;
>  	shmem_recalc_inode(inode);
> -	spin_unlock_irqrestore(&info->lock, flags);
> +	spin_unlock_irq(&info->lock);
> 
>  	return true;
>  }
> @@ -443,16 +445,13 @@ bool shmem_charge(struct inode *inode, long pages)
>  void shmem_uncharge(struct inode *inode, long pages)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	unsigned long flags;
> 
>  	/* nrpages adjustment done by __filemap_remove_folio() or caller */
> 
> -	spin_lock_irqsave(&info->lock, flags);
> -	info->alloced -= pages;
> +	spin_lock_irq(&info->lock);
>  	shmem_recalc_inode(inode);
> -	spin_unlock_irqrestore(&info->lock, flags);
> -
> -	shmem_inode_unacct_blocks(inode, pages);
> +	/* which has called shmem_inode_unacct_blocks() if necessary */
> +	spin_unlock_irq(&info->lock);
>  }
> 
>  /*
> --
> 2.35.3
>
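
PS: just to check my own understanding of the lock ordering the patch fixes,
here is a minimal userspace sketch of the pattern as I read it; it is an
assumption on my part, not kernel code. Plain pthread mutexes stand in for
the i_pages lock and the inode's i_lock, and the function names only mirror
the kernel ones. The point it illustrates: with quotas, shmem_uncharge()
ends up taking i_lock via the dquot accounting, so calling it per page while
the i_pages lock is held acquires the locks in the inverted order, whereas
the patched shape counts the dropped holes under the i_pages lock and makes
one batched call after that lock is released.

/*
 * Userspace sketch only: pthread mutexes stand in for the kernel's
 * i_pages lock and inode i_lock; the function names are illustrative.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t i_pages_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;

/* With quotas, uncharging ends up taking i_lock (dquot accounting). */
static void shmem_uncharge_sketch(int pages)
{
	pthread_mutex_lock(&i_lock);
	printf("uncharge %d page(s) under i_lock\n", pages);
	pthread_mutex_unlock(&i_lock);
}

/*
 * Old shape: one uncharge per hole while the i_pages lock is held,
 * so i_lock is acquired inside the i_pages lock -- the inverted order.
 */
static void split_huge_page_old(int nr_beyond_eof)
{
	pthread_mutex_lock(&i_pages_lock);
	for (int i = 0; i < nr_beyond_eof; i++)
		shmem_uncharge_sketch(1);
	pthread_mutex_unlock(&i_pages_lock);
}

/*
 * Patched shape: count the holes under the lock, then make a single
 * batched uncharge call after the i_pages lock has been released.
 */
static void split_huge_page_new(int nr_beyond_eof)
{
	int nr_dropped = 0;

	pthread_mutex_lock(&i_pages_lock);
	for (int i = 0; i < nr_beyond_eof; i++)
		nr_dropped++;
	pthread_mutex_unlock(&i_pages_lock);

	if (nr_dropped)
		shmem_uncharge_sketch(nr_dropped);
}

int main(void)
{
	split_huge_page_old(2);	/* lock order: i_pages -> i_lock */
	split_huge_page_new(2);	/* i_lock taken only after i_pages is dropped */
	return 0;
}

If I read the khugepaged hunks the same way, that is also why nr_none is now
charged in one go after the scan and uncharged on rollback only after the
xas lock has been dropped.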