Re: blocking vs. non-blocking mmu notifiers

From: Michal Hocko
Date: Wed Mar 23 2022 - 12:49:50 EST


On Wed 23-03-22 13:31:46, Jason Gunthorpe wrote:
> On Wed, Mar 23, 2022 at 10:45:30AM +0100, Michal Hocko wrote:
> > [Let me add more people to the CC list - I am not really sure who is the
> > most familiar with all the tricks that mmu notifiers might do]
> >
> > On Wed 23-03-22 09:43:59, Juergen Gross wrote:
> > > Hi,
> > >
> > > during analysis of a customer's problem on a 4.12 based kernel
> > > (deadlock due to a blocking mmu notifier in a Xen driver) I came
> > > across upstream patches 93065ac753e4 ("mm, oom: distinguish
> > > blockable mode for mmu notifiers") et al.
> > >
> > > The backtrace of the blocked tasks was typically something like:
> > >
> > > #0 [ffffc9004222f228] __schedule at ffffffff817223e2
> > > #1 [ffffc9004222f2b8] schedule at ffffffff81722a02
> > > #2 [ffffc9004222f2c8] schedule_preempt_disabled at ffffffff81722d0a
> > > #3 [ffffc9004222f2d0] __mutex_lock at ffffffff81724104
> > > #4 [ffffc9004222f360] mn_invl_range_start at ffffffffc01fd398 [xen_gntdev]
> > > #5 [ffffc9004222f398] __mmu_notifier_invalidate_page at ffffffff8123375a
> > > #6 [ffffc9004222f3c0] try_to_unmap_one at ffffffff812112cb
> > > #7 [ffffc9004222f478] rmap_walk_file at ffffffff812105cd
> > > #8 [ffffc9004222f4d0] try_to_unmap at ffffffff81212450
> > > #9 [ffffc9004222f508] shrink_page_list at ffffffff811e0755
> > > #10 [ffffc9004222f5c8] shrink_inactive_list at ffffffff811e13cf
> > > #11 [ffffc9004222f6a8] shrink_node_memcg at ffffffff811e241f
> > > #12 [ffffc9004222f790] shrink_node at ffffffff811e29c5
> > > #13 [ffffc9004222f808] do_try_to_free_pages at ffffffff811e2ee1
> > > #14 [ffffc9004222f868] try_to_free_pages at ffffffff811e3248
> > > #15 [ffffc9004222f8e8] __alloc_pages_slowpath at ffffffff81262c37
> > > #16 [ffffc9004222f9f0] __alloc_pages_nodemask at ffffffff8121afc1
> > > #17 [ffffc9004222fa48] alloc_pages_current at ffffffff8122f350
> > > #18 [ffffc9004222fa78] __get_free_pages at ffffffff8121685a
> > > #19 [ffffc9004222fa80] __pollwait at ffffffff8127e795
> > > #20 [ffffc9004222faa8] evtchn_poll at ffffffffc00e802b [xen_evtchn]
> > > #21 [ffffc9004222fab8] do_sys_poll at ffffffff8127f953
> > > #22 [ffffc9004222fec8] sys_ppoll at ffffffff81280478
> > > #23 [ffffc9004222ff30] do_syscall_64 at ffffffff81004954
> > > #24 [ffffc9004222ff50] entry_SYSCALL_64_after_hwframe at ffffffff818000b6
> > >
> > > It was found that the notifier of the Xen gntdev driver was using a
> > > mutex resulting in the deadlock.
>
> The bug here is that, prior to commit a81461b0546c ("xen/gntdev: update
> to new mmu_notifier semantic"), mn_invl_range_start(), which takes a
> mutex, was wired up to invalidate_page, which is defined to run in an
> atomic context.

Yeah, we have already identified that, but quickly realized that the
whole mmu notifier overhaul which this fix depends on would be a no-go
for backporting to our older code base. So we are trying to find our way
around that.
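
Just for reference, what 93065ac753e4 added is a way for a notifier to
refuse to run when it is not allowed to sleep. A minimal sketch with
made-up names (example_priv etc.), using the 4.19-era signature; newer
kernels pass a struct mmu_notifier_range and check
mmu_notifier_range_blockable() instead:

static int example_invalidate_range_start(struct mmu_notifier *mn,
					  struct mm_struct *mm,
					  unsigned long start,
					  unsigned long end,
					  bool blockable)
{
	/* example_priv is a made-up driver private struct embedding mn */
	struct example_priv *priv = container_of(mn, struct example_priv, mn);

	/*
	 * In blocking mode the callback may sleep on the mutex as before.
	 * In non-blocking mode (reclaim/oom path) it must not sleep, so
	 * only try-lock and ask the caller to retry later.
	 */
	if (blockable)
		mutex_lock(&priv->lock);
	else if (!mutex_trylock(&priv->lock))
		return -EAGAIN;

	/* ... tear down whatever is mapped in [start, end) ... */

	mutex_unlock(&priv->lock);
	return 0;
}

which is essentially the conversion the gntdev notifier got upstream.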

> > > Michal Hocko suggested that backporting above mentioned patch might
> > > help, as the mmu notifier call is happening in atomic context.
>
> IIRC "blocking" was not supposed to be about atomic context or not, but
> more about time to completion/guarenteed completion as it is used from
> a memory reclaim path.

Yeah, when I was introducing that I was mostly concerned about the oom
path. But it is essentially impossible to tell what assumptions the
different mmu notifiers operate under.
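
The caller I had in mind is the oom reaper, which invokes the notifiers
in non-blocking mode and simply backs off if any of them cannot make
progress without sleeping. Roughly (a sketch only, the exact helpers
and signatures differ between kernel versions):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm, start, end);
	if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
		/* a notifier returned -EAGAIN, retry this mm later */
		tlb_finish_mmu(&tlb, start, end);
		return false;
	}
	unmap_page_range(&tlb, vma, start, end, NULL);
	mmu_notifier_invalidate_range_end(mm, start, end);
	tlb_finish_mmu(&tlb, start, end);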

>
> > Just to be more explicit. The current upstream code calls
> > mmu_notifier_invalidate_range while the page table locks are held.
> > Are there any notifiers which could sleep in those?
>
> There should not be, that would be a bug that lockdep would find.

OK, this is reassuring. Thanks!
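
For the archives, the calling convention I was asking about looks
roughly like this (simplified, not the literal try_to_unmap_one() code,
assuming the usual locals mm, vma, pmd, address from the rmap walk):

	pte_t *pte;
	spinlock_t *ptl;

	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
	ptep_clear_flush(vma, address, pte);
	/* runs under the page table spinlock, the callback must not sleep */
	mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);
	pte_unmap_unlock(pte, ptl);
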
--
Michal Hocko
SUSE Labs