Re: [PATCH] mm/hmm: replace hmm_update with mmu_notifier_range

From: Michal Hocko
Date: Wed Jul 24 2019 - 14:59:16 EST


On Wed 24-07-19 20:56:17, Michal Hocko wrote:
> On Wed 24-07-19 15:08:37, Jason Gunthorpe wrote:
> > On Wed, Jul 24, 2019 at 07:58:58PM +0200, Michal Hocko wrote:
> [...]
> > > Maybe new users have started relying on new semantics in the meantime;
> > > back then, none of the notifiers had started any work before an EAGAIN
> > > bailout in non-blocking mode. Most of them simply did a trylock early in
> > > the process and bailed out, so there was nothing for the range_end
> > > callback to do.
> >
> > Single notifiers are not the problem. I tried to make this clear in
> > the commit message, but let's be more explicit.
> >
> > We have *two* notifiers registered with the mm, A and B:
> >
> > A invalidate_range_start: (never blocks)
> >     spin_lock()
> >     counter++
> >     spin_unlock()
> >
> > A invalidate_range_end:
> >     spin_lock()
> >     counter--
> >     spin_unlock()
> >
> > And this one:
> >
> > B invalidate_range_start: (may block)
> >     if (!mutex_trylock())
> >         return -EAGAIN;
> >     counter++
> >     mutex_unlock()
> >
> > B invalidate_range_end:
> >     spin_lock()
> >     counter--
> >     spin_unlock()
> >
> > So now the oom path does:
> >
> > invalidate_range_start_non_blocking:
> >     for each mn:
> >         a->invalidate_range_start
> >         b->invalidate_range_start
> >             rc = EAGAIN
> >
> > Now we SKIP A's invalidate_range_end even though A had no idea this
> > would happen and has state that needs to be unwound. A is broken.
> >
> > B survived just fine.
> >
> > A and B each work fine *alone*; combined, they fail.
>
> But that requires that they share some state, right?
>
> > At the time the commit landed, KVM was an example of A and RDMA
> > ODP an example of B.
>
> Could you point me to where those two share state, please? KVM seems to
> be using kvm->mmu_notifier_count, but I do not know where to look for
> RDMA...

Scratch that. ELONGDAY... I can see your point: it is the all-or-nothing
semantic that doesn't really work here. Looking back at your patch, it
seems reasonable, but I am not sure what the behavior is supposed to be
for notifiers that failed.
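A minimal userspace sketch of the two-notifier case above, plus one possible answer to the question about failed notifiers. Every name here (`struct notifier`, `a_start`, `b_start`, `counter_end`, `start_all_or_nothing`, `start_paired`) is hypothetical and is not the kernel's mmu_notifier API. The "paired" policy is an assumption about the intended semantics, not code taken from the patch: a failing notifier gets no range_end, while every notifier that already succeeded is unwound.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one registered notifier (not the kernel API). */
struct notifier {
	int (*range_start)(struct notifier *n, bool blockable);
	void (*range_end)(struct notifier *n);
	int counter;		/* invalidations this notifier believes are active */
	bool lock_busy;		/* B only: simulates a contended mutex */
};

/* A: never sleeps, so it always succeeds (spin_lock/unlock elided). */
static int a_start(struct notifier *n, bool blockable)
{
	(void)blockable;
	n->counter++;
	return 0;
}

/* B: would sleep on its mutex, so in non-blocking mode it bails out early. */
static int b_start(struct notifier *n, bool blockable)
{
	if (!blockable && n->lock_busy)
		return -EAGAIN;	/* fails before touching any state */
	n->counter++;		/* blocking mode: assume the mutex was eventually taken */
	return 0;
}

static void counter_end(struct notifier *n)
{
	assert(n->counter > 0);	/* an unpaired range_end would underflow */
	n->counter--;
}

/*
 * All-or-nothing: stop at the first failure. The caller treats any error
 * as "no range_end needed", so a notifier that already succeeded (A) is
 * never unwound.
 */
static int start_all_or_nothing(struct notifier **mns, size_t nr, bool blockable)
{
	for (size_t i = 0; i < nr; i++) {
		int rc = mns[i]->range_start(mns[i], blockable);
		if (rc)
			return rc;
	}
	return 0;
}

/*
 * Paired variant (assumed policy): on failure, call range_end for every
 * notifier whose range_start succeeded, in reverse order. The notifier
 * that failed gets no range_end; it is expected to clean up before
 * returning its error.
 */
static int start_paired(struct notifier **mns, size_t nr, bool blockable)
{
	for (size_t i = 0; i < nr; i++) {
		int rc = mns[i]->range_start(mns[i], blockable);
		if (rc) {
			while (i-- > 0)	/* unwind mns[i-1] .. mns[0] */
				mns[i]->range_end(mns[i]);
			return rc;
		}
	}
	return 0;
}
```

Suppose A is registered as `{ a_start, counter_end }` and B as `{ b_start, counter_end, .lock_busy = true }`, in that order. A non-blocking `start_all_or_nothing()` then returns -EAGAIN and leaves A's counter stuck at 1, while B is untouched. `start_paired()` returns the same error but brings A back to 0. The reverse unwind order is an arbitrary choice in this sketch.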
--
Michal Hocko
SUSE Labs