Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

From: Michal Hocko
Date: Fri Apr 29 2016 - 08:12:29 EST

On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > [Trim the CC list]
> > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > [...]
> > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > lockdep gets very unhappy about the same functions being called with
> > > different reclaim contexts. e.g. directory block mapping might
> > > occur from readdir (no transaction context) or within transactions
> > > (create/unlink). Hence paths like this are tagged with GFP_NOFS to
> > > stop lockdep emitting false positive warnings....
> >
> > As I already said in another email, I have tried reverting the above
> > commit and running some fs workloads, but didn't manage to hit any
> > lockdep splats (after I fixed my bug in patch 1.2). I have also tried
> > to find the reports that led to this commit, without much success;
> > everything I found is from much earlier or later. Do you happen to
> > remember which loads triggered them, what they looked like, or have an
> > idea what to try to reproduce them? So far I have been running heavy
> > parallel fs_mark and kernbench inside a tiny virtual machine, so both
> > triggered direct reclaim all the time.
>
> Most of those issues were reported by users and not reproducible by
> any obvious means.

I would really appreciate a reference to some of those (my google-fu has
failed me) or at least the pattern of those splats: was it
"inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
or a different class of report?

> They may have been fixed since, but I'm sceptical
> of that because, generally speaking, developer testing only catches
> the obvious lockdep issues. i.e. it's users that report all the
> really twisty issues, and they are generally not reproducible except
> under their production workloads...
>
> IOWs, the absence of reports in your testing does not mean there
> isn't a problem, and that is one of the biggest problems with
> lockdep annotations - we have no way of ever knowing if they are
> still necessary or not without exposing users to regressions and
> potential deadlocks.....

I understand your points here, but if we are sure that those lockdep
reports are just false positives then we should rather provide an API to
silence lockdep for those paths than abuse GFP_NOFS, which a) hurts
overall reclaim health and b) works around a non-existent problem in
configurations with lockdep disabled, which are the vast majority.

Thanks!
--
Michal Hocko
SUSE Labs