Re: [PATCH] vfs: dcache: cond_resched in shrink_dentry_list

From: Dave Chinner
Date: Mon Mar 25 2013 - 22:40:42 EST


On Mon, Mar 25, 2013 at 05:39:13PM -0700, Greg Thelen wrote:
> On Mon, Mar 25 2013, Dave Chinner wrote:
> > On Mon, Mar 25, 2013 at 10:22:31AM -0700, Greg Thelen wrote:
> >> Call cond_resched() from shrink_dentry_list() to preserve
> >> shrink_dcache_parent() interactivity.
> >>
> >> void shrink_dcache_parent(struct dentry * parent)
> >> {
> >> while ((found = select_parent(parent, &dispose)) != 0)
> >> shrink_dentry_list(&dispose);
> >> }
> >>
> >> select_parent() populates the dispose list with dentries which
> >> shrink_dentry_list() then deletes. select_parent() carefully uses
> >> need_resched() to avoid doing too much work at once. But neither
> >> shrink_dcache_parent() nor its called functions call cond_resched().
> >> So once need_resched() is set select_parent() will return single
> >> dentry dispose list which is then deleted by shrink_dentry_list().
> >> This is inefficient when there are a lot of dentry to process. This
> >> can cause softlockup and hurts interactivity on non preemptable
> >> kernels.
> >
> > Hi Greg,
> >
> > I can see how this coul dcause problems, but isn't the problem then
> > that shrink_dcache_parent()/select_parent() itself is mishandling
> > the need for rescheduling rather than being a problem with
> > the shrink_dentry_list() implementation? i.e. select_parent() is
> > aborting batching based on a need for rescheduling, but then not
> > doing that itself and assuming that someone else will do the
> > reschedule for it?
> >
> > Perhaps this is a better approach:
> >
> > - while ((found = select_parent(parent, &dispose)) != 0)
> > + while ((found = select_parent(parent, &dispose)) != 0) {
> > shrink_dentry_list(&dispose);
> > + cond_resched();
> > + }
> >
> > With this, select_parent() stops batching when a resched is needed,
> > we dispose of the list as a single batch and only then resched if it
> > was needed before we go and grab the next batch. That should fix the
> > "small batch" problem without the potential for changing the
> > shrink_dentry_list() behaviour adversely for other users....
>
> I considered only modifying shrink_dcache_parent() as you show above.
> Either approach fixes the problem I've seen. My initial approach adds
> cond_resched() deeper into shrink_dentry_list() because I thought that
> there might a secondary benefit: shrink_dentry_list() would be willing
> to give up the processor when working on a huge number of dentry. This
> could improve interactivity during shrinker and umount. I don't feel
> strongly on this and would be willing to test and post the
> add-cond_resched-to-shrink_dcache_parent approach.

The shrinker has interactivity problems because of the global
dcache_lru_lock, not because of ithe size of the list passed to
shrink_dentry_list(). The amount of work that shrink_dentry_list()
does here is already bound by the shrinker batch size. Hence in the
absence of the RT folk complaining about significant holdoffs I
don't think there is an interactivity problem through the shrinker
path.

As for the unmount path - shrink_dcache_for_umount_subtree() - that
doesn't use shrink_dentry_list() and so would need it's own internal
calls to cond_resched(). Perhaps it's shrink_dcache_sb() that you
are concerned about? Either way, And there are lots more similar
issues in the unmount path such as evict_inodes(), so unless you are
going to give every possible path through unmount/remount/bdev
invalidation the same treatment then changing shrink_dentry_list()
won't significantly improve the interactivity of the system
situation in these paths...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/