Re: dcache_readdir NULL inode oops

From: Al Viro
Date: Fri Nov 30 2018 - 11:09:05 EST


On Fri, Nov 30, 2018 at 09:16:49AM -0600, Eric W. Biederman wrote:
> >> > +	inode_lock(parent->d_inode);
> >> >  	dentry->d_fsdata = NULL;
> >> >  	drop_nlink(dentry->d_inode);
> >> >  	d_delete(dentry);
> >> > +	inode_unlock(parent->d_inode);
> >> > +
> >> >  	dput(dentry);	/* d_alloc_name() in devpts_pty_new() */
> >> >  }
> >
> > This feels right but getting some feedback from others would be good.
>
> This is going to be special at least because we are not coming through
> the normal unlink path and we are manipulating the dcache.
>
> This looks plausible. If this is what's going on, then we have had this
> bug for a very long time. I will see if I can make some time.
>
> It looks like in the general case everything is serialized by the
> devpts_mutex. I wonder if just changing the order of operations
> here would be enough.
>
> AKA: drop_nlink, d_delete, then dentry->d_fsdata. Ugh, d_fsdata is not
> implicated, so that won't help here.

It certainly won't. The thing is, this
		if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
			      d_inode(next)->i_ino, dt_type(d_inode(next))))
in dcache_readdir() obviously can block, so all we can hold over it is
blocking locks. Which we do - specifically, ->i_rwsem on our directory.
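
To make the locking rule concrete, here is a rough userspace model, not
kernel code. Everything named model_* is made up for illustration, and a
plain pthread mutex stands in for ->i_rwsem (so shared vs. exclusive is not
modelled). The reader holds the lock across the emit callback, which may
sleep; the remover has to take the same lock before turning the entry
negative, otherwise the reader could dereference a NULL inode.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model only; none of these are the real kernel structures. */
struct model_inode { unsigned long ino; };

struct model_dentry {
	const char *name;
	struct model_inode *inode;	/* NULL once the entry has gone negative */
	struct model_dentry *next;
};

struct model_dir {
	pthread_mutex_t lock;		/* stands in for the directory's ->i_rwsem */
	struct model_dentry *children;
};

/* Hypothetical emit callback: accepts every entry, like a roomy user buffer. */
static int model_emit_ok(const char *name, unsigned long ino)
{
	(void)name;
	(void)ino;
	return 1;
}

/* Reader: the lock is held across emit(), which is allowed to sleep. */
static int model_readdir(struct model_dir *dir,
			 int (*emit)(const char *, unsigned long))
{
	int emitted = 0;

	pthread_mutex_lock(&dir->lock);
	for (struct model_dentry *d = dir->children; d; d = d->next) {
		if (!d->inode)		/* negative entry: nothing to report */
			continue;
		/* d->inode cannot become NULL here: the remover needs the lock. */
		if (!emit(d->name, d->inode->ino))
			break;
		emitted++;
	}
	pthread_mutex_unlock(&dir->lock);
	return emitted;
}

/* Remover: takes the same lock before turning the entry negative. */
static void model_remove(struct model_dir *dir, struct model_dentry *victim)
{
	pthread_mutex_lock(&dir->lock);
	victim->inode = NULL;
	pthread_mutex_unlock(&dir->lock);
}

/* Builds a two-entry directory, removes one entry, returns what is left. */
int model_demo(void)
{
	struct model_inode i0 = { 3 }, i1 = { 4 };
	struct model_dentry e1 = { "1", &i1, NULL };
	struct model_dentry e0 = { "0", &i0, &e1 };
	struct model_dir dir;
	int left;

	pthread_mutex_init(&dir.lock, NULL);
	dir.children = &e0;

	assert(model_readdir(&dir, model_emit_ok) == 2);
	model_remove(&dir, &e1);
	left = model_readdir(&dir, model_emit_ok);

	pthread_mutex_destroy(&dir.lock);
	return left;
}
```

Drop the lock/unlock pair from model_remove() and run it concurrently with
model_readdir(), and the store to victim->inode can land between the NULL
check and the d->inode->ino load; that is the shape of the oops in this model.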

It's actually worse than missing inode_lock() - consider the effects
of mount --bind /mnt/foo /dev/pts/42. What happens when that thing
goes away? Right, a lost mount...

I'll resurrect the "kernel-internal rm -rf done right" series and
post it; devpts is not the only place suffering from such problems (binfmt_misc,
etc.)