Re: [PATCH] debug: Deprecate BUG_ON() use in new code, introduce CRASH_ON()

From: Ingo Molnar
Date: Mon Jun 08 2015 - 05:05:22 EST



* Alexander Holler <holler@xxxxxxxxxxxxx> wrote:

> Am 08.06.2015 um 10:08 schrieb Richard Weinberger:
> >On Mon, Jun 8, 2015 at 9:40 AM, Alexander Holler <holler@xxxxxxxxxxxxx> wrote:
> >>Am 08.06.2015 um 09:12 schrieb Ingo Molnar:
> >>>
> >>>
> >>>* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>>Stop with the random BUG_ON() additions.
> >>>
> >>>
> >>>Yeah, so I propose the attached patch which attempts to resist new
> >>>BUG_ON() additions.
> >>
> >>
> >>This reminded me of a flame I once received from a maintainer because I
> >>wanted to avoid a disastrous memory corruption by using a BUG_ON().
> >
> >Reference?
>
> https://lkml.org/lkml/2013/5/17/254
>
> To explain: The bug had already existed for several releases, and the memory
> corruption was so disastrous that it even led to hard resets of systems here
> without any oops. And fixing it took several more releases (another year).
>
> And in the above-mentioned case, with the kernel config settings I use(d),
> only the misbehaving thread was killed by the BUG_ON() (which I proposed)
> before it had the chance to corrupt memory.

Firstly, the changelog of the patch that Greg rejected said nothing about any of
that reasoning, so at minimum it's a deficient changelog.

Secondly and more importantly, instead of doing a BUG_ON() you could have done:

	if (WARN_ON_ONCE(port->itty))
		return;

This would probably have prevented the tty-related memory corruption just as
well, at the cost of a (small and infrequent) memory leak.

I.e. instead of crashing the machine, you need to try to find the least
destructive approach if a bug is detected.
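
To make the pattern concrete outside the kernel tree, here is a minimal
userspace sketch. Everything in it is an assumption for illustration: the
WARN_ON_ONCE() macro is a simplified stand-in and not the kernel's
implementation, and struct port and port_attach() are hypothetical names. It
also returns an error code instead of a bare return, so the caller can see that
the inconsistent state was detected. The macro relies on GCC/Clang statement
expressions.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified userspace stand-in for WARN_ON_ONCE(): print a warning the
 * first time the condition is true at this call site, and evaluate to the
 * (boolean) condition so it can be used inside an if ().
 */
#define WARN_ON_ONCE(cond) ({					\
	static bool warned_once_;				\
	bool ret_ = !!(cond);					\
	if (ret_ && !warned_once_) {				\
		warned_once_ = true;				\
		fprintf(stderr, "WARNING at %s:%d: %s\n",	\
			__FILE__, __LINE__, #cond);		\
	}							\
	ret_;							\
})

/* Hypothetical port object: itty is the attached tty, or NULL if none. */
struct port {
	void *itty;
};

/*
 * Attach a tty to the port. If the port is unexpectedly still in use,
 * warn once and bail out without touching the existing state, rather
 * than taking the whole program down.
 *
 * Returns 0 on success, -1 if the inconsistent state was detected.
 */
int port_attach(struct port *port, void *tty)
{
	if (WARN_ON_ONCE(port->itty))
		return -1;

	port->itty = tty;
	return 0;
}
```

The second attach on a busy port is refused and logged exactly once; the
caller keeps running with the original state intact, which is the "least
destructive" outcome, at the cost of whatever the refused caller leaks.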

I am pretty certain that Greg would have applied such a patch in the blink of
an eye.

> Maybe someone could clarify what Greg meant by "something _really_ bad",
> because in my humble opinion there are few things worse than memory
> corruption (e.g. by wrong pointers, use after free or similar) if it
> happens inside the kernel. The consequences are almost always
> unpredictable, and therefore I would, and likely always will, prefer a
> controlled shutdown, reset or similar over leaving a system running with
> corrupted memory. Regardless of what any maintainer says.

So a justified BUG_ON() would be something during early boot for example, where a
grave inconsistency is detected that we know will make the kernel unable to work
much further.

We have only a few such cases: not finding a root filesystem, or detecting an x86
kernel image with instructions in it that are incompatible with the CPU it is
running on. We can do nothing to improve the situation, so we try to print
something useful and stop-crash the box.

Thanks,

Ingo