Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success

From: Jan Kara
Date: Wed Jun 28 2023 - 06:17:45 EST


On Wed 28-06-23 09:33:43, Amir Goldstein wrote:
> On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska
> <nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The current behaviour caused an asymmetry where some write APIs
> > (write, sendfile) would notify the written-to/read-from objects,
> > but splice wouldn't.
> >
> > This affected userspace which uses inotify, most notably coreutils
> > tail -f, to monitor pipes.
> > If the pipe buffer had been filled by a splice-family function:
> > * tail wouldn't know and thus wouldn't service the pipe, and
> > * all writes to the pipe would block because it's full,
> > thus service was denied.
> > (For the particular case of tail -f this could be worked around
> > with ---disable-inotify.)
> >
>
> Is my understanding of the tail code wrong?
> My understanding was that tail_forever_inotify() is not called for
> pipes, or is it being called when tailing a mixed collection of pipes
> and regular files? If there are subtleties like those you need to
> mention them, otherwise people will not be able to reproduce the
> problem that you are describing.

Well, on my openSUSE 15.4 at least, tail -f does use inotify on FIFOs and
indeed when data is spliced to the FIFO, tail doesn't notice.
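
A minimal sketch of a reproducer for this, assuming a Linux system with
inotify and a writable /tmp. The function name, the FIFO path and the 200 ms
timeout are made up for illustration and come from neither tail nor the
patch. It watches a FIFO for IN_MODIFY, splices six bytes into it from an
anonymous pipe, and reports whether an event arrived. The result should
depend on whether the running kernel has this series applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if an inotify event was queued for the
 * FIFO after splice(), 0 if none arrived within the timeout, -1 on a
 * setup error.
 */
int splice_notifies_fifo(const char *fifo_path)
{
	struct pollfd pfd;
	int src[2] = { -1, -1 };
	int fifo_fd = -1, ifd = -1, n, result = -1;

	unlink(fifo_path);
	if (mkfifo(fifo_path, 0600) < 0)
		return -1;

	/* On Linux, O_RDWR on a FIFO does not block and keeps a reader open. */
	fifo_fd = open(fifo_path, O_RDWR | O_NONBLOCK);
	ifd = inotify_init1(IN_NONBLOCK);
	if (fifo_fd < 0 || ifd < 0)
		goto out;
	/* Only IN_MODIFY is watched, so any queued event is a modify event. */
	if (inotify_add_watch(ifd, fifo_path, IN_MODIFY) < 0)
		goto out;

	/* Fill an unwatched anonymous pipe to use as the splice source. */
	if (pipe(src) < 0 || write(src[1], "hello\n", 6) != 6)
		goto out;

	/* Move the data into the watched FIFO without calling write(). */
	if (splice(src[0], NULL, fifo_fd, NULL, 6, 0) != 6)
		goto out;

	pfd.fd = ifd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 200);	/* arbitrary 200 ms timeout */
	if (n < 0)
		goto out;
	result = (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
out:
	if (src[0] >= 0)
		close(src[0]);
	if (src[1] >= 0)
		close(src[1]);
	if (fifo_fd >= 0)
		close(fifo_fd);
	if (ifd >= 0)
		close(ifd);
	unlink(fifo_path);
	return result;
}

#ifdef REPRO_MAIN
int main(void)
{
	int r = splice_notifies_fifo("/tmp/splice-inotify-repro.fifo");

	if (r < 0) {
		perror("setup");
		return 2;
	}
	printf("IN_MODIFY after splice: %s\n", r ? "yes" : "no");
	return 0;
}
#endif
```

Build with something like "cc -DREPRO_MAIN repro.c" to get a standalone
binary. Replacing the splice() call with a plain write() to fifo_fd gives
the comparison case the changelog describes.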

> I need to warn you about something regarding this patch -
> often there are colliding interests among different kernel users -
> fsnotify use cases quite often collide with the interest of users tracking
> performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes
> specifically have been the source of several performance regression reports
> in the past and have driven optimizations like:
>
> 71d734103edf ("fsnotify: Rearrange fast path to minimise overhead
> when there is no watcher")
> e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
>
> The moral of this story is: even if your patches are accepted by fsnotify
> reviewers, once they are staged for merging they will be subject to
> performance regression tests and I can tell you with certainty that
> performance regression will not be tolerated for the tail -f use case.
> I will push your v4 patches to a branch in my github, to let the kernel
> test bots run the performance regressions on it whenever they get to it.
>
> Moreover, if coreutils changes tail -f to start setting inotify watches
> on anonymous pipes (my understanding is that it currently does not?),
> then any tail -f on anonymous pipe can cripple the "no marks on sb"
> performance optimization for all anonymous pipes and that would be
> a *very* unfortunate outcome.

Do you mean the "s_fsnotify_connectors" check? Yeah, a fsnotify watch on
any pipe inode is going to somewhat slow down the fsnotify calls for any
pipe. OTOH I don't expect inotify watches on pipe inodes to be common and
it is not like the overhead is huge. Also nobody really prevents you from
placing a watch on a pipe inode now with similar consequences; this patch
only makes it actually work with splice. So I'm not worried about the
performance impact. At least until somebody comes up with a realistic
complaint ;-).

> I think we need to add a rule to fanotify_events_supported() to ban
> sb/mount marks on SB_KERNMOUNT and backport this
> fix to LTS kernels (I will look into it) and then we can fine tune
> the s_fsnotify_connectors optimization in fsnotify_parent() for
> the SB_KERNMOUNT special case.

Yeah, probably makes sense.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR