Re: [RFC][PATCH] fs: optimize inotify/fsnotify code for unwatched files

From: Jan Kara
Date: Tue Jun 23 2015 - 11:17:49 EST


On Fri 19-06-15 14:50:25, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> I have a _tiny_ microbenchmark that sits in a loop and writes
> single bytes to a file. Writing one byte to a tmpfs file is
> around 2x slower than reading one byte from a file, which is a
> _bit_ more than I expected. This is a dumb benchmark, but I think
> it's hard to deny that write() is a hot path and we should avoid
> unnecessary overhead there.
>
> I did a 'perf record' of 30-second samples of read and write.
> The top item in a diffprofile is srcu_read_lock() from
> fsnotify(). There are active inotify fd's from systemd, but
> nothing is actually listening to the file or its part of
> the filesystem.
>
> I *think* we can avoid taking the srcu_read_lock() for the
> common case where there are no actual marks on the file
> being modified *or* the vfsmount.
>
> The *_fsnotify_mask is an aggregate of each of the masks from
> each mark. If we have nothing set in the masks at all then there
> are no marks and no need to do anything with 'ignored masks'
> since none exist. This keeps us from having to do the costly
> srcu_read_lock() for a check which is very cheap.
>
> This patch gave a 10.8% speedup in writes/second on my test.
...
> diff -puN fs/notify/fsnotify.c~optimize-fsnotify fs/notify/fsnotify.c
> --- a/fs/notify/fsnotify.c~optimize-fsnotify 2015-06-19 13:29:53.117283581 -0700
> +++ b/fs/notify/fsnotify.c 2015-06-19 13:29:53.123283853 -0700
> @@ -213,6 +213,16 @@ int fsnotify(struct inode *to_tell, __u3
> !(test_mask & to_tell->i_fsnotify_mask) &&
> !(mnt && test_mask & mnt->mnt_fsnotify_mask))
> return 0;
> + /*
> + * Optimization: The *_fsnotify_mask is an aggregate of each of the
> + * masks from each mark. If we have nothing set in the masks at
> + * all then there are no marks and no need to do anything with
> + * 'ignored masks' since none exist. This keeps us from having to
> + * do the costly srcu_read_lock() for a check which is very cheap.
> + */
> + if (!to_tell->i_fsnotify_mask &&
> + (!mnt || !mnt->mnt_fsnotify_mask))
> + return 0;

But this changes userspace-visible behavior. A mark can have an ignored
mask set without any of the notification masks set, and that ignored mask
is expected to be cleared on the first IN_MODIFY event. So the test just
above your check already deals with all the cases we can easily detect.
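
To make the problem concrete, here is a small userspace model of the two
checks. It is only a sketch: the EV_* bits, the model_* names and the
skip_if_no_mask switch are made up for illustration and are not kernel
code. It assumes every ignored mask is cleared by a modify event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in event bits for this model only; not the kernel's definitions. */
#define EV_MODIFY 0x1u
#define EV_OPEN   0x2u

struct model_mark {
    uint32_t mask;          /* events this mark wants reported */
    uint32_t ignored_mask;  /* events this mark wants suppressed */
    struct model_mark *next;
};

struct model_object {
    uint32_t aggregate_mask;   /* OR of every mark->mask, like *_fsnotify_mask */
    struct model_mark *marks;
};

/* Attach a mark and fold its notification mask into the aggregate. */
void model_add_mark(struct model_object *obj, struct model_mark *m)
{
    assert(obj && m);
    m->next = obj->marks;
    obj->marks = m;
    obj->aggregate_mask |= m->mask;
}

/*
 * Deliver an event and return the number of marks visited.
 * With skip_if_no_mask set, this mimics the proposed early return.
 */
int model_notify(struct model_object *obj, uint32_t event,
                 bool skip_if_no_mask)
{
    int visited = 0;

    /* Existing check: only non-modify events may bail out this early. */
    if (!(event & EV_MODIFY) && !(event & obj->aggregate_mask))
        return 0;

    /* Proposed check: bail out whenever no notification bit is set. */
    if (skip_if_no_mask && !obj->aggregate_mask)
        return 0;

    for (struct model_mark *m = obj->marks; m; m = m->next) {
        if (event & EV_MODIFY)
            m->ignored_mask = 0;   /* a modify event resets the ignored mask */
        visited++;
    }
    return visited;
}
```

With a single mark that has mask == 0 and ignored_mask == EV_OPEN, the
aggregate mask stays 0. A modify event delivered with skip_if_no_mask set
never reaches the mark, so its ignored mask survives; without the extra
check the mark is visited and the ignored mask is cleared.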

That being said, we could further refine things by storing a flag in the
inode & struct mount telling whether any of the attached marks has an
ignored_mask that needs clearing, and only traverse the list of marks if
such an ignored_mask is set. That would avoid the traversal not only when
nobody is watching but also when nobody is watching for the IN_MODIFY
event.
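
A rough userspace sketch of the flag idea follows. The has_ignored field
and all flag_* names are hypothetical, and the model treats every
nonzero ignored mask as one that needs clearing on modify, which is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in event bits for this model only; not the kernel's definitions. */
#define EV_MODIFY 0x1u
#define EV_OPEN   0x2u

struct flag_mark {
    uint32_t mask;
    uint32_t ignored_mask;
    struct flag_mark *next;
};

struct flag_object {
    uint32_t aggregate_mask;   /* OR of every mark->mask */
    bool has_ignored;          /* hypothetical: some mark has ignored_mask != 0 */
    struct flag_mark *marks;
};

/* Attach a mark, updating both the aggregate mask and the flag. */
void flag_add_mark(struct flag_object *obj, struct flag_mark *m)
{
    assert(obj && m);
    m->next = obj->marks;
    obj->marks = m;
    obj->aggregate_mask |= m->mask;
    if (m->ignored_mask)
        obj->has_ignored = true;
}

/* Deliver an event and return the number of marks visited. */
int flag_notify(struct flag_object *obj, uint32_t event)
{
    int visited = 0;
    /* A modify event only forces a walk if there is something to clear. */
    bool must_clear = (event & EV_MODIFY) && obj->has_ignored;

    if (!(event & obj->aggregate_mask) && !must_clear)
        return 0;

    for (struct flag_mark *m = obj->marks; m; m = m->next) {
        if (must_clear)
            m->ignored_mask = 0;
        visited++;
    }
    if (must_clear)
        obj->has_ignored = false;   /* nothing left to clear */
    return visited;
}
```

In this model a modify event on an object whose only mark watches
EV_OPEN returns without walking the list, while the same event walks the
list exactly once after a mark with an ignored mask has been attached.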

Finally, we could do something even without the flag. We can have a look at
to_tell->i_fsnotify_marks.first and mnt->mnt_fsnotify_marks.first even
without SRCU to check whether they are both NULL. If they are, we know we
can safely skip the SRCU thing. If they are != NULL, we grab SRCU read lock
and refetch the pointers. This should be correct as we don't dereference
any pointer we fetched outside of the SRCU critical section. Am I right?
And this optimizes the case when nobody is watching without unwanted
side-effects.
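
Roughly, the pattern would look like the userspace sketch below. It uses
C11 atomics in place of the kernel's accessors, and peek_read_lock() /
peek_read_unlock() are counting stand-ins for srcu_read_lock() /
srcu_read_unlock(); all peek_* names are made up for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct peek_mark {
    struct peek_mark *next;
};

struct peek_head {
    _Atomic(struct peek_mark *) first;
};

/* Counts how often the (modelled) read lock is taken. */
int peek_lock_count;

/* Stand-ins for srcu_read_lock()/srcu_read_unlock(); they only count. */
int peek_read_lock(void)
{
    return peek_lock_count++;
}

void peek_read_unlock(int idx)
{
    (void)idx;
}

/* Returns the number of marks visited; mnt may be NULL. */
int peek_notify(struct peek_head *inode, struct peek_head *mnt)
{
    int visited = 0;

    assert(inode);

    /* Unlocked peek: the pointers are compared with NULL, never followed. */
    if (!atomic_load_explicit(&inode->first, memory_order_relaxed) &&
        (!mnt || !atomic_load_explicit(&mnt->first, memory_order_relaxed)))
        return 0;

    int idx = peek_read_lock();

    /* Refetch inside the read-side section before following any pointer. */
    struct peek_mark *im =
        atomic_load_explicit(&inode->first, memory_order_acquire);
    struct peek_mark *mm = mnt ?
        atomic_load_explicit(&mnt->first, memory_order_acquire) : NULL;

    for (; im; im = im->next)
        visited++;
    for (; mm; mm = mm->next)
        visited++;

    peek_read_unlock(idx);
    return visited;
}
```

When both list heads are empty, peek_notify() returns without touching
the lock counter. As soon as either head is non-NULL, the lock is taken
once and the heads are read again before the lists are walked.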

Honza
>
> idx = srcu_read_lock(&fsnotify_mark_srcu);
>
> _
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR