Re: printk.time causes rare kernel boot hangs

From: Linux regression tracking #update (Thorsten Leemhuis)
Date: Sun Jun 18 2023 - 06:26:22 EST


On 13.06.23 16:07, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
>
> On 13.06.23 15:41, Richard W.M. Jones wrote:
>> [Being tracked in this bug which contains much more detail:
>> https://gitlab.com/qemu-project/qemu/-/issues/1696 ]
>>
>> Recent kernels occasionally hang when booted on qemu. It usually takes
>> hundreds or thousands of boots to see the hang, whereas I was able to do
>> 292,612 [sic] successful boots before the problematic commit.
>>
>> A reproducer (you'll probably need to use Fedora) is:
>>
>> $ while guestfish -a /dev/null -v run >& /tmp/log; do echo -n . ; done
>>
>> You will need to leave it running for probably several hours, and
>> examine the /tmp/log file at the end.
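
The loop above keeps going only while the command succeeds, so a hung boot
may simply stall it. A minimal sketch of a bounded variant follows, assuming
a per-run timeout is an acceptable way to flag a hang. The helper name
run_until_failure, its parameters, and the 60 second timeout are hypothetical
and not part of the original report; the stand-in command would be replaced
by the guestfish invocation from the reproducer.

```python
import os
import subprocess
import sys
import tempfile


def run_until_failure(cmd, timeout_s, max_runs, log_path):
    """Run cmd repeatedly; stop at the first timeout or non-zero exit.

    Returns (completed_runs, reason), where reason is one of
    "timeout", "failed", or "max_runs".
    """
    for completed in range(max_runs):
        # Overwrite the log on every run, like `>& /tmp/log` in the shell
        # loop, so the file always holds the output of the latest attempt.
        with open(log_path, "wb") as log:
            try:
                result = subprocess.run(
                    cmd,
                    stdout=log,
                    stderr=subprocess.STDOUT,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                # subprocess.run kills the direct child on timeout; any
                # processes it spawned may need separate cleanup.
                return completed, "timeout"
        if result.returncode != 0:
            return completed, "failed"
    return max_runs, "max_runs"


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For the reproducer,
    # substitute: ["guestfish", "-a", "/dev/null", "-v", "run"]
    cmd = [sys.executable, "-c", "pass"]
    log_path = os.path.join(tempfile.gettempdir(), "repro.log")
    # The 60 second timeout is an assumed value, not taken from the report.
    print(run_until_failure(cmd, timeout_s=60, max_runs=3, log_path=log_path))
```

When the helper returns "timeout", the log file holds the output of the run
that stalled, which is the file the report suggests examining.
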
>>
>> I tracked this down to the following commit:
>>
>> commit f31dcb152a3d0816e2f1deab4e64572336da197d
>> Author: Aaron Thompson <dev@xxxxxxxxxx>
>> Date:   Thu Apr 13 17:50:12 2023 +0000
>>
>>     sched/clock: Fix local_clock() before sched_clock_init()
>>
>>     Have local_clock() return sched_clock() if sched_clock_init() has not
>>     yet run. sched_clock_cpu() has this check but it was not included in the
>>     new noinstr implementation of local_clock().
>>
>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f31dcb152a3d0816e2f1deab4e64572336da197d)
>>
>> Reverting this commit fixes the problem.
>>
>> I don't know _why_ this commit is wrong, but can we revert it, as it
>> causes serious problems with libguestfs hanging randomly?
>>
>> Or if there's anything you want me to try out then let me know,
>> because I can reproduce the problem locally quite easily.
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced f31dcb152a3d0816e2f1deab4e64572336da197d
> #regzbot title sched/clock: printk.time causes rare kernel boot hangs
> #regzbot ignore-activity

#regzbot fix: tick/common: Align tick period during sched_timer setup
#regzbot monitor:
https://lore.kernel.org/all/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@xxxxxxxxxx/
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.