Re: [GIT PULL] Detaching mounts on unlink for 3.15

From: Al Viro
Date: Sun Apr 20 2014 - 01:41:27 EST


On Thu, Apr 17, 2014 at 11:12:03PM +0100, Al Viro wrote:

> I'd probably turn mntput_no_expire() into something like
> 	static struct mount *__mntput(struct mount *m)
> that would return NULL if nothing needs to be killed and return m
> if m really needs killing. Leaving the caller to decide what to do
> with that puppy. We have, as it is, exactly two callers - exit path
> in sys_umount() and mntput(). So we add two more functions:
> 	static void kill_mnt_async(struct mount *m)
> and
> 	static void kill_mnt_sync(struct mount *m)
> both being no-op on NULL. Then in sys_umount() and mntput() we do
> 	kill_mnt_async(__mntput(mnt));
> and in mntput_sync() - kill_mnt_sync(__mntput(mnt));
> For that matter, kill_mnt_sync() (basically, your variant with completions)
> can be folded into mntput_sync().

Actually, all kern_unmount() callers are doing that from fairly shallow
stack depth and all simple_release_fs() ones are dealing with rather
trivial ->kill_sb(). So mntput_sync() is overkill; all we need is
	if (mnt->mnt_flags & MNT_INTERNAL) {
		cleanup_mnt(mnt);
		return;
	}
	<do task_work_add or schedule_delayed_work song and dance>
right at the end of mntput_no_expire(). OK, now I have something that
looks like a complete solution. The last missing bit is to move all
filp_close() of acct->file to a kernel thread, and have them done via
__fput_sync() there. Then auto-close (done from cleanup_mnt()) will
consist of shutting down all affected acct and waiting for that kernel
thread to run through everything currently in its queue. That'll take
care of waiting until an acct(NULL) done by somebody else gets through
closing the file and through the corresponding mntput(). And *those*
mntput() calls can also be synchronous - they are on clones of the one we
haven't finished shutting down yet, so both dput() and deactivate_super()
will bugger off immediately. So we just mark those instead-of-mnt_pin()
clones as MNT_INTERNAL. Voila. After that, the ->mnt_pinned crap dies,
acct auto-close ought to be race-free, and we get the actual fs shutdown
guaranteed to be on shallow stack, without extra context switches, etc.
in the normal case.
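
A rough userspace model of that tail of mntput_no_expire(), for
illustration only. The assumptions are loud ones: mntput_tail(),
run_deferred(), the cleaned and next fields and the deferred list are
all hypothetical names made up for this sketch, the flag value is
picked for the sketch, and a plain linked list stands in for the
task_work_add / schedule_delayed_work machinery.

```c
#include <assert.h>
#include <stddef.h>

#define MNT_INTERNAL 0x4000	/* bit value chosen for this sketch */

/* Toy stand-in for struct mount (hypothetical fields). */
struct mount {
	int mnt_flags;
	int cleaned;		/* set once cleanup_mnt() has run */
	struct mount *next;	/* link in the deferred list */
};

/* Stand-in for work queued via task_work or delayed work. */
static struct mount *deferred;

static void cleanup_mnt(struct mount *mnt)
{
	mnt->cleaned = 1;	/* real code does the actual teardown */
}

/* Models the end of mntput_no_expire(), once the last ref is gone. */
static void mntput_tail(struct mount *mnt)
{
	if (mnt->mnt_flags & MNT_INTERNAL) {
		cleanup_mnt(mnt);	/* internal mount: do it right here */
		return;
	}
	mnt->next = deferred;		/* everything else gets deferred */
	deferred = mnt;
}

/* Models the deferred context: clean up whatever has been queued. */
static int run_deferred(void)
{
	int n = 0;

	while (deferred) {
		struct mount *mnt = deferred;

		deferred = mnt->next;
		mnt->next = NULL;
		cleanup_mnt(mnt);
		n++;
	}
	return n;	/* number of mounts cleaned up in this pass */
}
```

In this model an MNT_INTERNAL mount is cleaned up before mntput_tail()
returns, while any other mount stays untouched until run_deferred()
gets to it.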

Let's see if that survives testing...