Re: [PATCH v4 1/3] splice: always fsnotify_access(in), fsnotify_modify(out) on success

From: Amir Goldstein
Date: Wed Jun 28 2023 - 04:41:33 EST


On Tue, Jun 27, 2023 at 11:50 PM Ahelenia Ziemiańska
<nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote:
>
> The current behaviour caused an asymmetry where some write APIs
> (write, sendfile) would notify the written-to/read-from objects,
> but splice wouldn't.
>
> This affected userspace which uses inotify, most notably coreutils
> tail -f, to monitor pipes.
> If the pipe buffer had been filled by a splice-family function:
> * tail wouldn't know and thus wouldn't service the pipe, and
> * all writes to the pipe would block because it's full,
> thus service was denied.
> (For the particular case of tail -f this could be worked around
> with ---disable-inotify.)
>

Is my understanding of the tail code wrong?
My understanding was that tail_forever_inotify() is not called for
pipes, or is it being called when tailing a mixed collection of pipes
and regular files? If there are subtleties like those you need to
mention them , otherwise people will not be able to reproduce the
problem that you are describing.

I need to warn you about something regarding this patch -
often there are colliding interests among different kernel users -
fsnotify use cases quite often collide with the interest of users tracking
performance regressions and IN_ACCESS/IN_MODIFY on anonymous pipes
specifically have been the source of several performance regression reports
in the past and have driven optimizations like:

71d734103edf ("fsnotify: Rearrange fast path to minimise overhead
when there is no watcher")
e43de7f0862b ("fsnotify: optimize the case of no marks of any type")

The moral of this story is: even if your patches are accepted by fsnotify
reviewers, once they are staged for merging they will be subject to
performance regression tests and I can tell you with certainty that
performance regression will not be tolerated for the tail -f use case.
I will push your v4 patches to a branch in my github, to let the kernel
test bots run the performance regressions on it whenever they get to it.

Moreover, if coreutils will change tail -f to start setting inotify watches
on anonymous pipes (my understanding is that currently does not?),
then any tail -f on anonymous pipe can cripple the "no marks on sb"
performance optimization for all anonymous pipes and that would be
a *very* unfortunate outcome.

I think we need to add a rule to fanotify_events_supported() to ban
sb/mount marks on SB_KERNMOUNT and backport this
fix to LTS kernels (I will look into it) and then we can fine tune
the s_fsnotify_connectors optimization in fsnotify_parent() for
the SB_KERNMOUNT special case.
This may be able to save your patch for the faith of NACKed
for performance regression.

> Generate modify out before access in to let inotify merge the
> modify out events in thr ipipe case.

This comment is not clear and does not belong in this context,
but it very much belongs near the code in question.

Please wait to collect more feedback and specifically
to hear what Jan has to say about this hack before posting v5!

FYI, we are now in the beginning of the 6.5 "merge window",
which means that maintainers may be less responsive for the
next two weeks to non-critical patches as this one, which are
not targeted for the 6.5 kernel release.

Thanks,
Amir.