Re: [GIT PULL] tracing/eventfs: Fixes for v6.7-rc2

From: Steven Rostedt
Date: Sun Nov 26 2023 - 10:11:23 EST


On Sun, 26 Nov 2023 10:04:57 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

Ignore Mathieu's email address above. It was just a glitch in my scripts ;-)

-- Steve

>
> Linus,
>
> eventfs fixes:
>
> - With the usage of simple_recursive_remove() recommended by Al Viro,
> the code should not be calling "d_invalidate()" itself. Doing so
> is causing crashes. The code was calling d_invalidate() on the race
> of trying to look up a file while the parent was being deleted.
> This was detected, and the added dentry was having d_invalidate() called
> on it, but the deletion of the directory was also calling d_invalidate()
> on that same dentry.
>
> - A fix to not free the eventfs_inode (ei) until the last dput() was called
> on its ei->dentry made the ei->dentry exist even after it was marked
> for free by setting the ei->is_freed. But code elsewhere still was
> checking if ei->dentry was NULL if ei->is_freed is set and would
> trigger WARN_ON if that was the case. That's no longer true and there
> should not be any warnings when it is true.
>
> - Use GFP_NOFS for allocations done under eventfs_mutex.
> The eventfs_mutex can be taken on file system reclaim, make sure
> that allocations done under that mutex do not trigger file system
> reclaim.
>
> - Clean up code by moving the taking of inode_lock out of the helper
> functions and into where they are needed, and not use the
> parameter to know to take it or not. It must always be held but
> some callers of the helper function have it taken when they were
> called.
>
> - Warn if the inode_lock is not held in the helper functions.
>
> - Warn if eventfs_start_creating() is called without a parent.
> As eventfs is underneath tracefs, all files created will have
> a parent (the top one will have a tracefs parent).
>
> Tracing update;
>
> - Add Mathieu Desnoyers as an official reviewer of the tracing sub system.
>
>
> Please pull the latest trace-v6.7-rc2 tree, which can be found at:
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
> trace-v6.7-rc2
>
> Tag SHA1: 94eabb90d0195e49d78e7714be7b239284ee1e96
> Head SHA1: 76d9eafff4484547ed9e606c8227ac9799a9f2da
>
>
> Mathieu Desnoyers (1):
> MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
>
> Steven Rostedt (Google) (6):
> eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
> eventfs: Do not invalidate dentry in create_file/dir_dentry()
> eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
> eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
> eventfs: Do not allow NULL parent to eventfs_start_creating()
> eventfs: Make sure that parent->d_inode is locked in creating files/dirs
>
> ----
> MAINTAINERS | 1 +
> fs/tracefs/event_inode.c | 65 +++++++++++++++++++-----------------------------
> fs/tracefs/inode.c | 13 +++-------
> 3 files changed, 31 insertions(+), 48 deletions(-)
> ---------------------------
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ea790149af79..a2d4ef4d90f6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22078,6 +22078,7 @@ F: drivers/watchdog/tqmx86_wdt.c
> TRACING
> M: Steven Rostedt <rostedt@xxxxxxxxxxx>
> M: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> +R: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> L: linux-kernel@xxxxxxxxxxxxxxx
> L: linux-trace-kernel@xxxxxxxxxxxxxxx
> S: Maintained
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index f8a594a50ae6..0b90869fd805 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -27,16 +27,16 @@
> /*
> * eventfs_mutex protects the eventfs_inode (ei) dentry. Any access
> * to the ei->dentry must be done under this mutex and after checking
> - * if ei->is_freed is not set. The ei->dentry is released under the
> - * mutex at the same time ei->is_freed is set. If ei->is_freed is set
> - * then the ei->dentry is invalid.
> + * if ei->is_freed is not set. When ei->is_freed is set, the dentry
> + * is on its way to being freed after the last dput() is made on it.
> */
> static DEFINE_MUTEX(eventfs_mutex);
>
> /*
> * The eventfs_inode (ei) itself is protected by SRCU. It is released from
> * its parent's list and will have is_freed set (under eventfs_mutex).
> - * After the SRCU grace period is over, the ei may be freed.
> + * After the SRCU grace period is over and the last dput() is called
> + * the ei is freed.
> */
> DEFINE_STATIC_SRCU(eventfs_srcu);
>
> @@ -95,7 +95,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
> if (!(dentry->d_inode->i_mode & S_IFDIR)) {
> if (!ei->entry_attrs) {
> ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries,
> - GFP_KERNEL);
> + GFP_NOFS);
> if (!ei->entry_attrs) {
> ret = -ENOMEM;
> goto out;
> @@ -326,7 +326,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
> struct eventfs_attr *attr = NULL;
> struct dentry **e_dentry = &ei->d_children[idx];
> struct dentry *dentry;
> - bool invalidate = false;
> +
> + WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
>
> mutex_lock(&eventfs_mutex);
> if (ei->is_freed) {
> @@ -348,15 +349,8 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
>
> mutex_unlock(&eventfs_mutex);
>
> - /* The lookup already has the parent->d_inode locked */
> - if (!lookup)
> - inode_lock(parent->d_inode);
> -
> dentry = create_file(name, mode, attr, parent, data, fops);
>
> - if (!lookup)
> - inode_unlock(parent->d_inode);
> -
> mutex_lock(&eventfs_mutex);
>
> if (IS_ERR_OR_NULL(dentry)) {
> @@ -365,12 +359,14 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
> * created the dentry for this e_dentry. In which case
> * use that one.
> *
> - * Note, with the mutex held, the e_dentry cannot have content
> - * and the ei->is_freed be true at the same time.
> + * If ei->is_freed is set, the e_dentry is currently on its
> + * way to being freed, don't return it. If e_dentry is NULL
> + * it means it was already freed.
> */
> - dentry = *e_dentry;
> - if (WARN_ON_ONCE(dentry && ei->is_freed))
> + if (ei->is_freed)
> dentry = NULL;
> + else
> + dentry = *e_dentry;
> /* The lookup does not need to up the dentry refcount */
> if (dentry && !lookup)
> dget(dentry);
> @@ -387,17 +383,14 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
> * Otherwise it means two dentries exist with the same name.
> */
> WARN_ON_ONCE(!ei->is_freed);
> - invalidate = true;
> + dentry = NULL;
> }
> mutex_unlock(&eventfs_mutex);
>
> - if (invalidate)
> - d_invalidate(dentry);
> -
> - if (lookup || invalidate)
> + if (lookup)
> dput(dentry);
>
> - return invalidate ? NULL : dentry;
> + return dentry;
> }
>
> /**
> @@ -437,9 +430,10 @@ static struct dentry *
> create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
> struct dentry *parent, bool lookup)
> {
> - bool invalidate = false;
> struct dentry *dentry = NULL;
>
> + WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
> +
> mutex_lock(&eventfs_mutex);
> if (pei->is_freed || ei->is_freed) {
> mutex_unlock(&eventfs_mutex);
> @@ -456,15 +450,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
> }
> mutex_unlock(&eventfs_mutex);
>
> - /* The lookup already has the parent->d_inode locked */
> - if (!lookup)
> - inode_lock(parent->d_inode);
> -
> dentry = create_dir(ei, parent);
>
> - if (!lookup)
> - inode_unlock(parent->d_inode);
> -
> mutex_lock(&eventfs_mutex);
>
> if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) {
> @@ -473,8 +460,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
> * created the dentry for this e_dentry. In which case
> * use that one.
> *
> - * Note, with the mutex held, the e_dentry cannot have content
> - * and the ei->is_freed be true at the same time.
> + * If ei->is_freed is set, the e_dentry is currently on its
> + * way to being freed.
> */
> dentry = ei->dentry;
> if (dentry && !lookup)
> @@ -493,16 +480,14 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
> * Otherwise it means two dentries exist with the same name.
> */
> WARN_ON_ONCE(!ei->is_freed);
> - invalidate = true;
> + dentry = NULL;
> }
> mutex_unlock(&eventfs_mutex);
> - if (invalidate)
> - d_invalidate(dentry);
>
> - if (lookup || invalidate)
> + if (lookup)
> dput(dentry);
>
> - return invalidate ? NULL : dentry;
> + return dentry;
> }
>
> /**
> @@ -632,7 +617,7 @@ static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt)
> {
> struct dentry **tmp;
>
> - tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_KERNEL);
> + tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS);
> if (!tmp)
> return -1;
> tmp[cnt] = d;
> @@ -698,6 +683,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> return -ENOMEM;
> }
>
> + inode_lock(parent->d_inode);
> list_for_each_entry_srcu(ei_child, &ei->children, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> d = create_dir_dentry(ei, ei_child, parent, false);
> @@ -730,6 +716,7 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> cnt++;
> }
> }
> + inode_unlock(parent->d_inode);
> srcu_read_unlock(&eventfs_srcu, idx);
> ret = dcache_dir_open(inode, file);
>
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index 5b54948514fe..ae648deed019 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -509,20 +509,15 @@ struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
> struct dentry *dentry;
> int error;
>
> + /* Must always have a parent. */
> + if (WARN_ON_ONCE(!parent))
> + return ERR_PTR(-EINVAL);
> +
> error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
> &tracefs_mount_count);
> if (error)
> return ERR_PTR(error);
>
> - /*
> - * If the parent is not specified, we create it in the root.
> - * We need the root dentry to do this, which is in the super
> - * block. A pointer to that is in the struct vfsmount that we
> - * have around.
> - */
> - if (!parent)
> - parent = tracefs_mount->mnt_root;
> -
> if (unlikely(IS_DEADDIR(parent->d_inode)))
> dentry = ERR_PTR(-ENOENT);
> else