Re: BUG: workqueue lockup (2)

From: Dmitry Vyukov
Date: Thu Dec 21 2017 - 05:19:39 EST


On Wed, Dec 20, 2017 at 11:55 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> Dmitry Vyukov wrote:
>> On Tue, Dec 19, 2017 at 3:27 PM, Tetsuo Handa
>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>> > syzbot wrote:
>> >>
>> >> syzkaller has found reproducer for the following crash on
>> >> f3b5ad89de16f5d42e8ad36fbdf85f705c1ae051
>> >
>> > "BUG: workqueue lockup" is not a crash.
>>
>> Hi Tetsuo,
>>
>> What is the proper name for all of these collectively?
>
> I think that things which lead to a kernel panic when /proc/sys/kernel/panic_on_oops
> is set to 1 are called an "oops" (or a "kerneloops").
>
> Speaking of "BUG: workqueue lockup", this is not an "oops". This message was
> added by 82607adcf9cdf40f ("workqueue: implement lockup detector"), and
> this message does not always indicate a fatal problem. This message can be
> printed when the system is really out of CPU and memory. As far as I tested,
> I think that the workqueue was not able to run on a specific CPU due to a soft
> lockup bug.


There are also warnings, which normally don't panic unless
panic_on_warn is set. There are also cases where we suddenly lose a
machine and have no idea what happened to it. And there are cases where we
are kind of connected and nothing bad is printed on the console, but the
machine is still inoperable.
The only collective name I can think of is "bug". We could change it to
"bug". Otherwise, since there are multiple names, I don't think it's
worth spending more time on this.