Re: [resend][bug] low-probability console lockups since 5.19

From: Thorsten Leemhuis
Date: Thu Sep 29 2022 - 05:06:15 EST


Hi Conor

On 28.09.22 18:55, Conor Dooley wrote:
> On Fri, Sep 23, 2022 at 05:24:17PM +0100, Conor Dooley wrote:
>>
>> Been bisecting a bug that is causing a boot failure in my CI & have
>> ended up here.. The bug in question is a low(ish) probability lock up
>> of the serial console, I would estimate about 1-in-5 chance on the
>> boards I could actually trigger it on which it has taken me so long
>> to realise that this was an actual problem. Thinking back on it, there
>> were other failures that I would retroactively attribute to this
>> problem too, but I had earlycon disabled
> [...]
> #regzbot introduced: 5831788afb17b89c5b531fb60cbd798613ccbb63 ^
> Hopefully I did this correctly...

Yes, you did, thanks for this. I had already been watching this thread
manually and was a bit unsure what to do with it.

> I picked that commit as that's where things start going haywire.

There is one thing I wonder after skimming this thread: was there maybe
some other change elsewhere in the kernel, made between the introduction
and the revert of the printk console kthreads patches, that is the real
culprit here and makes existing, older races easier to hit? But I guess
that would be very hard to find in the end, and it's easier to fix the
problem in the console driver... :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

#regzbot backburner: tricky situation that might take some time to get
resolved