Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail

From: Dave Chinner
Date: Wed Dec 11 2013 - 20:08:04 EST


On Tue, Dec 10, 2013 at 03:39:09PM -0800, Andrew Morton wrote:
> On Tue, 10 Dec 2013 15:20:17 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> > On Mon, 9 Dec 2013, Andrew Morton wrote:
> >
> > > > __GFP_NOFAIL specifies that the page allocator cannot fail to return
> > > > memory. Allocators that call it may not even check for NULL upon
> > > > returning.
> > > >
> > > > It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> > > > actually return NULL. More interestingly, processes that are doing
> > > > direct reclaim and have PF_MEMALLOC set may also return NULL for any
> > > > __GFP_NOFAIL allocation.
> > >
> > > __GFP_NOFAIL is a nasty thing and making it pretend to work even better
> > > is heading in the wrong direction, surely? It would be saner to just
> > > disallow these even-sillier combinations. Can we fix up the current
> > > callers then stick a WARN_ON() in there?
> > >
> >
> > Heh, it's difficult to remove __GFP_NOFAIL when new users get added:
> > 84235de394d9 ("fs: buffer: move allocation failure loop into the
> > allocator") added a new user
>
> That wasn't reeeeealy a new user - it was "convert an existing
> open-coded retry-for-ever loop". Which is what __GFP_NOFAIL is for.
>
> I don't think I've ever seen anyone actually fix one of these things
> (by teaching the caller to handle ENOMEM), so it obviously isn't
> working...

Right, because most of the loops are deep within filesystem
transaction code where the only thing to do with a memory allocation
failure is to abort the transaction, shut down the filesystem and
deny user access (i.e. DoS the system) because the filesystem is
inconsistent in memory and the only way it can be recovered is by
tossing everything in memory away and recovering the last valid
on-disk state from the journal, i.e. umount, mount.
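
To make the two options concrete, here is a minimal userspace sketch
in C. It is not kernel code and does not use the real page allocator:
toy_fs, flaky_alloc(), alloc_retry_forever() and alloc_or_shutdown()
are all hypothetical names invented for illustration, and malloc()
stands in for the allocator. It only assumes that allocation failures
are transient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filesystem's in-memory state. */
struct toy_fs {
	bool shut_down;	/* set once the fs refuses further access */
};

/* Number of upcoming allocations that will fail (simulated pressure). */
static int failures_left;

/* Hypothetical allocator: fails while failures_left > 0, then succeeds. */
static void *flaky_alloc(size_t size)
{
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Option 1: retry until the allocation succeeds. This is the userspace
 * analogue of an open-coded retry loop, which is the behaviour that
 * __GFP_NOFAIL is meant to provide. The caller never sees NULL.
 */
static void *alloc_retry_forever(size_t size, int *attempts)
{
	void *p;

	assert(size > 0);
	do {
		(*attempts)++;
		p = flaky_alloc(size);
	} while (p == NULL);
	return p;
}

/*
 * Option 2: "handle" the failure. Mid-transaction, the only safe
 * response in this sketch is to shut the filesystem down, so a single
 * transient failure takes the whole filesystem offline.
 */
static void *alloc_or_shutdown(struct toy_fs *fs, size_t size)
{
	void *p = flaky_alloc(size);

	if (p == NULL)
		fs->shut_down = true;
	return p;
}
```

With three simulated failures, option 1 simply takes four attempts and
carries on; with one simulated failure, option 2 leaves the toy
filesystem shut down until it is "remounted".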

IOWs, the "fix" is far worse than the current behaviour and so there is
absolutely no motivation for the people who own these __GFP_NOFAIL
allocations to fix them. Indeed, when you consider that the amount of
work to fix the filesystems to robustly handle ENOMEM is a *massive*
undertaking that adds significant overhead and complexity to each
filesystem, the cost/benefit analysis comes down so far on the side
of "just use __GFP_NOFAIL" that doing anything else is sheer lunacy.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx