Re: [PATCH v3 00/11] Performance fixes for 9p filesystem

From: Christian Schoenebeck
Date: Sat Feb 04 2023 - 08:41:12 EST


On Friday, February 3, 2023 8:12:14 PM CET Eric Van Hensbergen wrote:
> Hi Christian, thanks for the feedback -- will dig in and see if I can
> find what's gone south here. Clearly my approach to writeback without
> writeback_fid didn't cover all the corner cases and that's the cause of
> the fault. Can I get a better idea of how to reproduce - you booted
> with a root 9p file system, and then tried to build...what?

KDE, which builds numerous packages, multi-threaded by default. In the past we
had 9p issues which triggered only after hours of compiling; in this case,
however, I don't think you need to build anything fancy, because it already
fails at the very beginning of any build process, right when detecting a
compiler.

May I ask what kind of scenario you have tested so far? It was not a multi-
threaded context, right? Large chunk or small chunk I/O?

> Performance degradation is interesting, runs counter to the
> unit-testing and benchmarking I did, but I didn't do something as
> logical as a build to check -- need to tease apart whether this is a
> read problem, a write problem...or both. My intuition is that it's on
> the write side, but as part of going through the code I made the cache
> code a lot more pessimistic so it's possible I inadvertently killed an
> optimistic optimization.

I have not yet investigated the individual I/O errors or their cause, but my
feeling is that it could also be related to fid vs. writeback_fid. I saw you
dropped a fix we made there last year, but I haven't checked yet whether your
changes would handle it correctly in another way.

> Finally, just to clarify, the panic you had at the end happened with
> readahead? Seems interesting because clearly it thought it was
> writing back something that it shouldn't have been writing back (since
> writeback caches weren't enabled). I'm thinking something was marked
> as dirty even though the underlying system just wrote-through the
> change and so the writeback isn't actually required. This may also be
> an indicator of the performance issue if we are actually writing
> through the data in addition to an unnecessary write-back (which I
> also worry is writing back bad data in the second case).

It was not a kernel panic. It's a warning that appears right after boot, but
the system continues to run. So that warning is printed before starting the
actual build process. And yes, the warning is printed with "readahead".

> Can you give me an idea of what the other misbehaviors were?

There were all sorts of misbehaviour at the application level, e.g. no
command history being available from the shell (arrow up/down), things hanging
in the shell for a long time, error messages. And after the writeahead test the
build directory was screwed, i.e. even after rebooting with a regular kernel
things no longer built correctly, so I had to restore a snapshot.

Best regards,
Christian Schoenebeck