Re: [RFC] freeing unlinked file indefinitely delayed

From: Ben Myers
Date: Mon Jul 13 2015 - 20:54:37 EST


Hey Al,

On Mon, Jul 13, 2015 at 08:56:46PM +0100, Al Viro wrote:
> On Mon, Jul 13, 2015 at 01:17:51PM -0500, Ben Myers wrote:
> > > For one thing, this patch does *not* check for i_nlink at all.
> >
> > I agree that not checking i_nlink has the advantage of brevity.
> > Anyone who is using dentry->d_fsdata with an open_by_handle workload (if
> > there are any) will be affected.
>
> Translate, please. What does d_fsdata have to do with anything above?

If my understanding of your patch is correct, any disconnected dentry will be
killed immediately at dput() instead of being put on the dentry LRU. I have
some code that I think may rely on the old behavior with regard to
DCACHE_DISCONNECTED: the disconnected dentry sits on the LRU, there are lots of
open_by_handle calls, and nothing keeps a reference on the inode between
open_by_handle/close except that dentry's ref. Maybe there are better ways to
do that and I just need to look for a workaround.
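
Roughly, the workload looks like this from userspace (a minimal sketch: error
handling omitted, the mount path is illustrative, and open_by_handle_at()
needs CAP_DAC_READ_SEARCH):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static void handle_workload(const char *path)
{
        struct file_handle *fh;
        int mount_id, mount_fd, fd, i;

        fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
        fh->handle_bytes = MAX_HANDLE_SZ;
        name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0);
        mount_fd = open("/mnt", O_RDONLY);      /* illustrative mount point */

        /* Many open/close cycles; between them nothing but the cached
         * (disconnected) dentry pins the in-core inode. */
        for (i = 0; i < 1000; i++) {
                fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
                /* ... use fd ... */
                close(fd);
        }
        free(fh);
}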

I wonder if there are others who will be affected by this change in behavior,
in particular whether there are filesystems that need to keep internal state
in d_fsdata across open_by_handle calls. They won't be able to use d_fsdata
effectively for this workload anymore.
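
Hypothetically, such a filesystem's fh_to_dentry might stash state along
these lines (the examplefs_* names are made up; d_obtain_alias() and the
export_operations signature are the real VFS interfaces):

/* Hypothetical sketch: a filesystem caching per-dentry state in
 * d_fsdata across open_by_handle calls.  If the disconnected dentry
 * is killed at every dput(), this state is rebuilt on every open
 * instead of being found in the dcache. */
static struct dentry *examplefs_fh_to_dentry(struct super_block *sb,
                                             struct fid *fid,
                                             int fh_len, int fh_type)
{
        struct inode *inode = examplefs_iget(sb, fid->i32.ino); /* made up */
        struct dentry *dentry = d_obtain_alias(inode);

        if (!IS_ERR(dentry) && !dentry->d_fsdata)
                dentry->d_fsdata = examplefs_alloc_state(inode); /* made up */
        return dentry;
}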

> > > For another, there's no such thing as a 'filesystem's internal lock' for
> > > i_nlink protection - that's handled by i_mutex... And what does
> > > iget() have to do with any of that?
> >
> > i_mutex is good enough only for local filesystems.
> > Network/clustered/distributed filesystems need to take an internal lock
> > to provide exclusion between this host's ->unlink and a ->link on another
> > host. That's where I'm coming from with iget().
> >
> > Maybe plumbing ->unlink with another argument to return i_nlink is
> > something to consider? A helper for the few filesystems that need to do
> > this might be good enough in the near term.
>
> ????
>
> a) iget() had been gone since way back
> b) it never had been called by VFS - it's a filesystem's responsibility
> c) again, what the hell does iget() or its replacements have to do with
> dentry eviction? It does *NOT* affect dentry refcount. Never had.

I misspoke with regard to iget. It is the locking for i_nlink that I am
concerned about.

> d) checks for _inode_ retention in icache are done by filesystem code, which
> is certainly free to use its locks. Incidentally, for normal filesystems
> no locks are needed at all - everything that changes ->i_nlink is holding
> a reference to the in-core inode, so in a situation where its refcount is zero
> and ->i_lock is held (providing an exclusion with icache lookups), ->i_nlink
> is guaranteed to be stable.

Thanks for that explanation; that's what I needed. For filesystems that have a
distributed lock manager spanning multiple hosts, i_nlink is not stable on an
individual host even with i_lock held: you would also need to hold whatever
finer-grained lock the filesystem is protecting i_nlink with. So the VFS
_can't_ know when i_nlink has gone to zero; only the filesystem can. That's
what I was trying to get at.
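
Concretely, the default icache check (paraphrased from fs/inode.c, where
iput_final() calls ->drop_inode with ->i_lock already held) is safe for a
local filesystem for exactly the reason you give:

/* Paraphrased from fs/inode.c: iput_final() invokes ->drop_inode
 * (default: generic_drop_inode) with inode->i_lock held, so on a
 * local filesystem ->i_nlink cannot change under it. */
int generic_drop_inode(struct inode *inode)
{
        return !inode->i_nlink || inode_unhashed(inode);
}

A clustered filesystem can't trust that test without also holding its own
cluster lock, and only the filesystem knows how to take that.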

> e) why would VFS possibly want to know if there are links remaining after
> successful ->unlink()?

The VFS wouldn't. The argument goes that it would be nice to keep the old
behavior and still cache disconnected dentries, leaving it to the filesystems
that require it to destroy the anonymous dentries in ->unlink once they know
that i_nlink has gone to zero, having taken whatever internal locks they need
to protect i_nlink from other hosts. That's where a helper to prune
disconnected dentries would come in handy.
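
A minimal sketch of what I have in mind, with made-up examplefs_* names for
the filesystem pieces (d_prune_aliases() is the existing VFS helper that
unhashes and drops unreferenced aliases):

/* Hypothetical sketch: a clustered filesystem's ->unlink that evicts
 * cached disconnected dentries only once it knows, under its own
 * cluster lock, that the link count really went to zero. */
static int examplefs_unlink(struct inode *dir, struct dentry *dentry)
{
        struct inode *inode = d_inode(dentry);
        int err;

        examplefs_cluster_lock(inode);          /* made-up DLM lock */
        err = examplefs_do_unlink(dir, dentry); /* made up */
        if (!err && inode->i_nlink == 0)
                d_prune_aliases(inode);         /* real VFS helper */
        examplefs_cluster_unlock(inode);
        return err;
}

That would keep the disconnected-dentry caching for everyone else, while the
filesystems that know better get to evict eagerly.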

> I'm sorry, but you are not making any sense...

I'm sorry for the confusion; it took me a little while to digest this. The
change in caching behavior could affect more than NFS: any filesystem that
expects disconnected dentries to remain cached is affected.

Thanks,
Ben