Re: dm ioctl: Restore __GFP_HIGH in copy_params()

From: Mike Snitzer
Date: Mon May 22 2017 - 14:04:26 EST


On Mon, May 22 2017 at 11:03am -0400,
Michal Hocko <mhocko@xxxxxxxxxx> wrote:

> On Mon 22-05-17 10:52:44, Mikulas Patocka wrote:
> >
> >
> > On Mon, 22 May 2017, Michal Hocko wrote:
> [...]
> > > I am not sure I understand. OOM killer is invoked for _all_ allocations
> > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> > > OOM killer is not disabled (oom_killer_disable) and that only happens
> > > from the PM suspend path which makes sure that no userspace is active at
> > > the time. AFAIU this is a userspace triggered path and so the latter
> > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> > > Relying on a portion of memory reserves to prevent deadlock seems
> > > fundamentally broken to me.
> > >
> >
> > The lvm2 was designed this way - it is broken, but there is not much that
> > can be done about it - fixing this would mean a major rewrite. The only
> > thing we can do about it is to lower the deadlock probability with
> > __GFP_HIGH (or PF_MEMALLOC that was used some time ago).

Yes, lvm2 was originally designed to have access to memory reserves
to ensure forward progress. But if the mm subsystem has improved to
allow for the required progress without lvm2 trying to stake a claim on
those reserves then we'll gladly avoid (ab)using them.

> But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.

OK, but will it be serviced immediately? Not failing isn't useful if it
never completes.

> If you need non-failing semantics then just make it clear by adding
> __GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce
> resource and there are users which might really need it from atomic
> contexts.

While adding the __GFP_NOFAIL flag would serve to document expectations,
I'm left unconvinced that the memory allocator will _not fail_ for an
order-0 page -- as Mikulas said, most ioctls don't need more than 4K.
(Apologies if you've already covered _why_ we can have confidence in the
mm subsystem's ability to ensure forward progress for these allocations).

Mike