Re: [PATCH 9/9] rename(): avoid a deadlock in the case of parents having no common ancestor

From: Jan Kara
Date: Thu Nov 23 2023 - 23:18:54 EST


On Wed 22-11-23 19:36:52, Al Viro wrote:
> ... and fix the directory locking documentation and proof of correctness.
> Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the
> case where we really don't want it is splicing the root of disconnected
> tree to somewhere.
>
> In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an
> ancestor of Y" only if X and Y are already in the same tree. Otherwise
> it can go from false to true, and one can construct a deadlock on that.
>
> Make lock_two_directories() report an error in such case and update the
> callers of lock_rename()/lock_rename_child() to handle such errors.
>
> And yes, such conditions are not impossible to create ;-/
>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

Looks good to me. Just one nit below but whether you decide to address it
or not, feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

> +// p1 != p2, both are on the same filesystem, ->s_vfs_rename_mutex is held
> static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
> {
> - struct dentry *p;
> + struct dentry *p = p1, *q = p2, *r;
>
> - p = d_ancestor(p2, p1);
> - if (p) {
> + while ((r = p->d_parent) != p2 && r != p)
> + p = r;
> + if (r == p2) {
> + // p is a child of p2 and an ancestor of p1 or p1 itself
> inode_lock_nested(p2->d_inode, I_MUTEX_PARENT);
> inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2);
> return p;
> }
> -
> - p = d_ancestor(p1, p2);
> - inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
> - inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
> - return p;
> + // p is the root of connected component that contains p1
> + // p2 does not occur on the path from p to p1
> + while ((r = q->d_parent) != p1 && r != p && r != q)
> + q = r;
> + if (r == p1) {
> + // q is a child of p1 and an ancestor of p2 or p2 itself
> + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
> + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
> + return q;
> + } else if (likely(r == p)) {
> + // both p2 and p1 are descendents of p
> + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
> + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
> + return NULL;
> + } else { // no common ancestor at the time we'd been called
> + mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);

It would look more natural to me if s_vfs_rename_mutex got dropped in the
callers (lock_rename(), lock_rename_child()) which have acquired the lock
instead of here. I agree it results in a bit more boiler plate code though.

> + return ERR_PTR(-EXDEV);
> + }
> }

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR