Re: Boot stall regression from "printk for 5.19" merge

From: Petr Mladek
Date: Mon Jun 20 2022 - 07:44:24 EST


On Mon 2022-06-20 11:29:36, Marek Behún wrote:
> On Mon, 20 Jun 2022 00:29:16 +0206
> John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
> > On 2022-06-19, Marek Behún <kabel@xxxxxxxxxx> wrote:
> > > causes a regression on arm64 (Marvell CN9130-CRB board) where the
> > > system boot freezes in most cases (and is unusable until restarted by
> > > watchdog), or, in some cases boots, but the console output gets mangled
> > > for a while (the serial console spits garbage characters).
>
> attaching bootlogs and config.

This is the log when the system booted:

> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
> [ 0.000000] Linux version 5.19.0-rc2-00410-g9776fe0f424b (kabel@dellmb) (aarch64-unknown-linux-gnu-gcc (Gentoo Hardened 10.3.1_p20211126 p0) 10.3.1 20211126, GNU ld (Gentoo 2.37_p1 p2) 2.37) #491 SMP Mon Jun 20 11:00:54 CEST 2022
> [ 0.000000] Machine model: Marvell Armada CN9130-CRB-B
> [ 0.000000] earlycon: uart8250 at MMIO32 0x00000000f0512000 (options '')
> [ 0.000000] printk: bootconsole [uart8250] enabled

Early console enabled.

> [ 0.000000] NUMA: No NUMA configuration found
[...]
> [ 0.062565] rcu: Hierarchical SRCU implementation.
> [ 0.062589] printk: bootconsole [uart8250] printing thread started

The early console started being handled by the kthread.

> [ 0.073843] smp: Bringing up secondary CPUs ...
> [ 0.074238] Detected PIPT I-cache on CPU1
[...]
> [ 1.067359] io scheduler kyber registered
> [ 1.120214] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
> [ 1.120577] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
> [ 1.137980] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
> [ 1.166562] printk:[ console [ttyS0] printing thread started
> [ 1.166564] printk: console [ttyS0] enabled

A second console was added using the properly initialized serial port.
It should use the same physical port as the early console.

Both the early console and the proper console driver have their own
kthread.

> 1.166486] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 22, base_baud = 12500000) is a 16550A

The line is malformed. I wonder if both the early console and the
proper console used the same port in parallel.
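
As a rough aid for spotting this kind of damage in a captured log, here
is a minimal sketch. It assumes the "[ seconds.microseconds] message"
prefix seen in the excerpts above; the regex and the check_log() helper
are illustrative and not part of any kernel tool. It flags lines with a
broken prefix and lines whose timestamp goes backwards.

```python
import re

# Prefix as it appears in the quoted excerpts: "[ <sec>.<usec>] message".
# This pattern is an assumption based on this mail, not a printk format spec.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d{6})\] \S")


def check_log(lines):
    """Return (index, reason) pairs for lines that look damaged."""
    problems = []
    last_ts = None
    for i, line in enumerate(lines):
        m = LINE_RE.match(line)
        if m is None:
            problems.append((i, "malformed prefix"))
            continue
        ts = float(m.group(1))
        if last_ts is not None and ts < last_ts:
            problems.append((i, "timestamp goes backwards"))
        last_ts = ts
    return problems


# Lines copied from the working boot log, with the "> " quoting removed.
sample = [
    "[ 1.137980] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver",
    "[ 1.166562] printk:[ console [ttyS0] printing thread started",
    "[ 1.166564] printk: console [ttyS0] enabled",
    "1.166486] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 22, base_baud = 12500000) is a 16550A",
    "[ 1.166567] printk: bootconsole [uart8250] disabled",
]

for index, reason in check_log(sample):
    print(index, reason)
```

Note that this only looks at the prefix. The stray "[" inside the
"printing thread started" message is not detected by it.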

> [ 1.166567] printk: bootconsole [uart8250] disabled
> [ 1.185422] printk: bootconsole [uart8250] printing thread stopped

The early console was disabled. Only the properly initialized serial
console is used. All should be fine now.


> [ 1.188773] brd: module loaded
> [ 1.190567] loop: module loaded
[...]
> [ 5.316958] Freeing unused kernel memory: 2752K
> [ 5.364349] Run /sbin/init as init process

And I did not catch any further problem.

So, it looks like the con->write() code is not correctly serialized
between the early and the normal console.
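
To illustrate what "serialized" means here, a userspace toy model (not
kernel code) follows. FakePort, write_line() and run() are hypothetical
names made up for this sketch. Two threads stand in for the two console
kthreads and push characters into one shared port. With the lock held
for the whole line, every line arrives intact. Without it, characters
from the two writers can interleave.

```python
import re
import threading


class FakePort:
    """Stand-in for one UART shared by two console drivers (hypothetical)."""

    def __init__(self):
        self.chars = []
        self.lock = threading.Lock()

    def putc(self, ch):
        self.chars.append(ch)


def write_line(port, line, serialized):
    """Emit one line character by character, like a polled UART write."""
    if serialized:
        # Hold the lock for the whole line so the other writer must wait.
        with port.lock:
            for ch in line:
                port.putc(ch)
    else:
        for ch in line:
            port.putc(ch)


def run(serialized, count=200):
    """Let an 'early' and a 'normal' writer share one port; return the lines."""
    port = FakePort()

    def worker(name):
        for i in range(count):
            write_line(port, f"{name} message {i}\n", serialized)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("early", "normal")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(port.chars).splitlines()


def intact(lines):
    """True if every line is a complete, unmixed message."""
    pattern = re.compile(r"(early|normal) message \d+")
    return all(pattern.fullmatch(line) is not None for line in lines)


print("serialized, all lines intact:", intact(run(serialized=True)))
print("unserialized, all lines intact:", intact(run(serialized=False)))
```

The unserialized run may or may not show damage on a given execution,
since that depends on thread scheduling. Only the serialized case is
guaranteed to stay intact.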


Now, let's see the last lines of failing logs:


> [ 1.071214] io scheduler kyber registered
> [ 1.124272] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
> [

> [ 1.067314] io scheduler kyber registered
> [ 1.120226] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
> [ 1.120603] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
> [ 1.137975] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
> [ 1.138248] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver
> [ 1.

> [ 1.067214] io scheduler kyber registered
> [ 1.120098] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
> [ 1.120466] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
> [ 1.137871] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
> [ 1.138160] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver
> [

All three logs end in the middle of a line. If you compare them with the
"working" log, they end 1-3 lines before the normal console was added.

The console output might be delayed because of the threads. Most
likely, the output ended when both the early and the normal console
driver started to use the same port.
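
A quick way to put numbers on this comparison is sketched below. It
takes the last complete timestamp of each failing log and compares it
with the time the ttyS0 printing thread started in the working boot
(1.166562). The labels "log 1" to "log 3" and the helper name are made
up for this sketch, and timestamps from different boots are only
roughly comparable.

```python
import re

TS_RE = re.compile(r"^\[\s*(\d+\.\d{6})\]")


def last_complete_timestamp(lines):
    """Timestamp of the last line that still has a full "[ s.us]" prefix."""
    stamps = [float(m.group(1)) for m in map(TS_RE.match, lines) if m]
    return stamps[-1] if stamps else None


# Working log: "[ 1.166562] printk:[ console [ttyS0] printing thread started"
CONSOLE_THREAD_STARTED = 1.166562

# Last two lines of each failing log as quoted above ("> " removed).
failing_tails = {
    "log 1": [
        "[ 1.124272] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver",
        "[",
    ],
    "log 2": [
        "[ 1.138248] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver",
        "[ 1.",
    ],
    "log 3": [
        "[ 1.138160] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver",
        "[",
    ],
}

for name, tail in failing_tails.items():
    last = last_complete_timestamp(tail)
    gap_ms = (CONSOLE_THREAD_STARTED - last) * 1000
    print(f"{name}: last complete line at {last:.6f}, "
          f"{gap_ms:.1f} ms before the ttyS0 thread start in the working boot")
```

In all three cases the last complete line carries a timestamp earlier
than the ttyS0 thread start seen in the working boot.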

I am going to check the driver...

Best Regards,
Petr