Re: BUG_ON() in workingset_node_shadows_dec() triggers

From: Dave Chinner
Date: Wed Oct 05 2016 - 22:00:07 EST


On Tue, Oct 04, 2016 at 08:29:00PM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 7:43 PM, Paul Gortmaker
> <paul.gortmaker@xxxxxxxxxxxxx> wrote:
> >
> > A couple years ago Ingo had an idea to kill BUG_ON abuse that I made
> > a 1st pass at. Back then it seemed nobody cared. Maybe that has since
> > changed?
> >
> > https://lkml.org/lkml/2014/4/30/359
>
> So we actually have the checkpatch warning already:
>
> # avoid BUG() or BUG_ON()
> if ($line =~ /\b(?:BUG|BUG_ON)\b/) {
> my $msg_type = \&WARN;
> $msg_type = \&CHK if ($file);
> &{$msg_type}("AVOID_BUG",
> "Avoid crashing the kernel - try
> using WARN_ON & recovery code rather than BUG() or BUG_ON()\n" .
> $herecurr);
> }
>
> but it doesn't trigger on VM_BUG_ON().
>
> And I'm not convinced about replacing things with BUG_ON_AND_HALT(),
> it simply doesn't fix the existing issue we have: people use BUG_ON(),
> and worse, _when_ they use BUG_ON(), they use it instead of error
> handling, so the code _around_ the BUG_ON() tends to then very much
> depend on what the BUG_ON() checks.
>
> This is actually one way that VM_BUG_ON() is better: it's very much by
> design something that can be compiled away, so at least hopefully
> nobody thinks of it as a security measure. So we could just say that
> we will treat VM_BUG_ON() as a WARN_ON_ONCE(), and just not kill the
> machine.

In XFS, we use ASSERT() (could be XFS_BUG_ON() for all
that the name matters) but we only define that to BUG_ON if
CONFIG_XFS_DEBUG=y.

For "production debug" kernels we have CONFIG_XFS_WARN=y, which
turns ASSERT() into WARN_ON(). We get the warnings, but none of the
crashiness that are desirable in a development context. This is what
distro debug kernels should be using, as it also ensures
we don't build in the real debug code that does things that would
affect prodution systems adversely, like randomly take different
allocator paths to ensure we get code coverage of all the allocator
algorithms...

i.e. production kernels ship with neither set, the debug kernel
ships with CONFIG_XFS_WARN=y, and we do all our development with
CONFIG_XFS_DEBUG=y.

I think this case falls into the "production debug" classification;
we want a warning, but we don't want the system to be taken down....

> But apart from the checkpatch thing, it's actually a pretty big change.

Yeah, that's why we added CONFIG_XFS_WARN=y to do this - it was a 20
line change to add XFS_CONFIG_WARN instead of having to audit and
modify ~1800 call sites to do something differently. And because we
know that ASSERT() is not present in all kernels, it isn't ever used
as a replacement for error handling. Perhaps that's the simplest
solution here as well....

Just my 2c worth.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx