Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking

From: Marek Szyprowski
Date: Fri Apr 29 2022 - 10:08:16 EST


Hi John,

On 27.04.2022 18:15, John Ogness wrote:
> On 2022-04-27, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>> Here is the full serial console log:
>>
>> https://pastebin.com/E5CDH88L
> Here are a few ideas from me:
>
> 1. For next-20220427 the printk-threaded series was slightly changed. I
> do not expect it to work any different, but I would prefer we are
> debugging the current version. If possible, could you move to
> next-20220427?

I've moved to next-20220429. Nothing changed compared to next-20220427.


> 2. I noticed you boot with the kernel boot arguments "earlycon" and
> "no_console_suspend". Could you try booting without this? I expect this
> will make no difference.

Well, nothing changed.


> 3. It looks like the problem happens quite late in the boot process. I
> expect it is due to some userspace process that is running that is
> interacting with printk (either /dev/kmsg or /proc/kmsg) and is causing
> problems. If you boot with init=/bin/sh then I expect the system is
> running fine. (You don't have much of a system running, but it should
> not hang.) We need to isolate which userspace process is triggering the
> issue.

The same issue happens if I boot with init=/bin/bash


> 4. Have you tried issuing magic sysrq commands on the serial line? (For
> example, sending a break signal and then the letter 't' or sending a
> break signal and then the letter 'c'?) That might trigger various dumps
> so that we can see the system state.
>
> 5. You are not running a VT console, so the graphics driver should not
> be affecting the printk subsystem at all. I expect your autologin is
> also starting various services and programs. If you disable the
> automatic login and instead manually login (perhaps as another user) can
> you manually start those services one at a time to see at what point the
> system hangs?
>
> Thanks for your help with this!

I found something really interesting. When the lockup happens, I'm still
able to log in via ssh and trigger any magic sysrq action via
/proc/sysrq-trigger (triggering it from the UART console via a break
doesn't work).

It turned out that the UART console is somehow blocked, but it still
receives and buffers all the input. For example, after issuing
"echo >/proc/sysrq-trigger" from the ssh console, the UART console was
updated and I saw the magic sysrq banner, followed by all the commands I
had blindly typed into the UART console! However, this doesn't unblock
the console.

Here is the output of the 't' magic sysrq request:

https://pastebin.com/fjbRuy4f
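
For anyone who wants to reproduce such a dump from an ssh session, here
is a minimal sketch of the idea. It assumes a root shell and a kernel
built with magic sysrq support. The helper name sysrq_trigger and the
SYSRQ_TRIGGER override variable are hypothetical and exist only for
illustration; they are not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Sketch: write a magic sysrq key to the trigger file from a remote shell.
# SYSRQ_TRIGGER is a hypothetical override so the helper can be dry-run
# against a scratch file; the default is the real procfs entry.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# sysrq_trigger KEY: send a single sysrq command key (e.g. 't' dumps tasks).
sysrq_trigger() {
    key="$1"
    # Reject anything that is not exactly one character.
    if [ "${#key}" -ne 1 ]; then
        echo "usage: sysrq_trigger <single key>" >&2
        return 1
    fi
    # Write the key without a trailing newline.
    printf '%s' "$key" > "$SYSRQ_TRIGGER"
}
```

With the helper sourced, "sysrq_trigger t" requests the task dump. The
result goes to the kernel log, so it can normally also be read with
dmesg from the same ssh session.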

If you have any more suggestions on what to check, let me know.

This issue must be somehow related to the way the UART driver works on
the Amlogic Meson boards. The other boards in my test farm, which are
based on different SoCs (Exynos, QCOM, BCM) and use the same userspace
and configuration, work fine with those patches.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland