Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

From: Dave Chinner
Date: Thu Jun 22 2023 - 05:09:11 EST


On Wed, Jun 21, 2023 at 12:54:21AM -0700, Eric Biggers wrote:
> On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
> > On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> > So exactly what is syzbot complaining about here? There's no kernel
> > issue here at all.
> >
> > Also, I cannot tell syzbot "don't ever report this as a bug again",
> > so the syzbot developers are going to have to triage and fix this
> > syzbot problem themselves so it doesn't keep getting reported to
> > us...
>
> I think the problem here was that XFS logged a message beginning with
> "WARNING:", followed by a stack trace. In the log that looks like a warning
> generated by the WARN_ON() macro, which is meant for reporting recoverable
> kernel bugs. It's difficult for any program to understand the log in cases like
> this. This is why include/asm-generic/bug.h contains the following comment:
>
> * Do not include "BUG"/"WARNING" in format strings manually to make these
> * conditions distinguishable from kernel issues.

Nice.

Syzbot author doesn't like log messages using certain keywords
because it's hard for syzbot to work out what went wrong.

Gets new rule added to kernel in a comment in some header file that
almost nobody doing kernel development work ever looks at.

Nothing was added to the coding style rules or checkpatch so nobody
is likely to accidentally trip over this new rule that nobody has
been told about.

Syzbot maintainer also fails to do an audit of the kernel to remove
all existing "WARNING" keywords from existing log messages so leaves
landmines for subsystems to have to handle at some time in the
future.

Five years later, syzbot trips over a log message containing WARNING
in it that was in code introduced before the rule was "introduced".
Subsystem maintainers are blamed for not knowing the rule existed.

Result: *yet again* we are being told that our only option is
to *change code that is not broken* just to *shut up some fucking
bot* we have no control over and could happily live without.

> If you have a constructive suggestion of how all programs that
> parse the kernel log can identify real warnings reliably without
> getting confused by cases like this, I'm sure that would be
> appreciated. It would need to be documented and then the guidance
> in bug.h could then be removed. But until then, the above is the
> current guidance.

That is so not the problem here, Eric.

--
Dave Chinner
david@xxxxxxxxxxxxx