Re: [PATCH] init: Don't proxy console= to earlycon

From: Raul Rangel
Date: Fri Jul 14 2023 - 17:43:02 EST


On Fri, Jul 14, 2023 at 12:35 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> First, I am sorry I sent the first mail too early by mistake.
> (Friday evening effect).

No worries.

>
> On Fri 2023-07-14 11:21:09, Raul Rangel wrote:
> > On Fri, Jul 14, 2023 at 10:38 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
> > >
> > > On Mon 2023-07-10 09:30:19, Raul Rangel wrote:
> > > > On Sun, Jul 9, 2023 at 8:43 PM Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 7/9/23 18:15, Mario Limonciello wrote:
> > > > > > On 7/9/23 18:46, Randy Dunlap wrote:
> > > > > >>
> > > > > >>
> > > > > >> On 7/7/23 18:17, Raul E Rangel wrote:
> > > > > >>> Right now we are proxying the `console=XXX` command line args to the
> > > > > >>> param_setup_earlycon. This is done because the following are
> > > > > >>> equivalent:
> > > > > >>>
> > > > > >>> console=uart[8250],mmio,<addr>[,options]
> > > > > >>> earlycon=uart[8250],mmio,<addr>[,options]
> > > > > >>>
> > > > > >>> In addition, when `earlycon=` or just `earlycon` is specified on the
> > > > > >>> command line, we look at the SPCR table or the DT to extract the device
> > > > > >>> options.
> > > > > >>>
> > > > > > >>> When `console=` is specified on the command line, its intention is to
> > > > > >>> disable the console. Right now since we are proxying the `console=`
> > > > > >>
> > > > > >> How do you figure this (its intention is to disable the console)?
> > > > > >
> > > >
> > > > https://www.kernel.org/doc/html/v6.1/admin-guide/kernel-parameters.html
> > > > says the following:
> > > > console=
> > > > { null | "" }
> > > > Use to disable console output, i.e., to have kernel
> > > > console messages discarded.
> > > > This must be the only console= parameter used on the
> > > > kernel command line.
> > > >
> > > > earlycon= [KNL] Output early console device and options.
> > > >
> > > > When used with no options, the early console is
> > > > determined by stdout-path property in device tree's
> > > > chosen node or the ACPI SPCR table if supported by
> > > > the platform.
> > >
> > > Sigh, I wasn't aware of this when we discussed the console= handling.
> >
> > It took a bit of digging to figure out what the actual intention was :)
> >
> > >
> > > > The reason this bug showed up is that ChromeOS has set `console=` for a
> > > > very long time:
> > > > https://chromium.googlesource.com/chromiumos/platform/crosutils/+/main/build_kernel_image.sh#282
> > > > I'm not sure of the exact history, but AFAIK, we don't have the ttyX devices.
> > > >
> > > > Coreboot recently added support for the ACPI SPCR table. In
> > > > combination with the `console=` arg, this means we are now seeing
> > > > earlycon enabled when it shouldn't be.
> > >
> > > But this happens only when both "earlycon" and "console=" parameters
> > > are used together. Do I get it correctly?
> >
> > The bug shows up when an SPCR table is present and the `console=`
> > parameter is set. No need to specify `earlycon` on the command line.
>
> Strange, see below.
>
> > > This combination is ambiguous on its own. Why would anyone add
> > > "earlycon" parameter and wanted to keep it disabled?
> >
> > This is not the case I'm hitting. I'm honestly not sure what the
> > behavior should be in the `earlycon console=` case?
> >
> > >
> > > > > >>> diff --git a/init/main.c b/init/main.c
> > > > > >>> index aa21add5f7c54..f72bf644910c1 100644
> > > > > >>> --- a/init/main.c
> > > > > >>> +++ b/init/main.c
> > > > > >>> @@ -738,8 +738,7 @@ static int __init do_early_param(char *param, char *val,
> > > > > >>> for (p = __setup_start; p < __setup_end; p++) {
> > > > > >>> if ((p->early && parameq(param, p->str)) ||
> > > > > >>> (strcmp(param, "console") == 0 &&
> > > > > >>> - strcmp(p->str, "earlycon") == 0)
> > > > > >>> - ) {
> > > > > >>> + strcmp(p->str, "earlycon") == 0 && val && val[0])) {
> > > > > >>> if (p->setup_func(val) != 0)
> > > > > >>> pr_warn("Malformed early option '%s'\n", param);
> > > > > >>> }
>
> My understanding is that this code in do_early_param() allows to call
> param_setup_earlycon() with the @val defined via console=val.
> It reduces cut&paste on the kernel command line.

Exactly

>
> It should never enable an early console when "earlycon" is not defined
> on the command line. Otherwise, console=uart[8250],mmio,<addr>[,options]
> would always enable earlycon as well.
>

Yep, this is what my patch fixes.

> If the "earlycon" is not defined on the command line then
> we should never call param_setup_earlycon() in the first place.
>
> Or the behavior is even more crazy than I thought.

This contradicts your first point. We need to call
`param_setup_earlycon` so it can handle `console=uart,mmio,XXXX`.
That's why this block of code is here. IMO it's very confusing
behavior that `earlycon=uart,mmio,XXXX` and `console=uart,mmio,XXXX`
are the same thing.

The reason my patch checks for a NULL or empty val is because
`param_setup_earlycon` has a special case to handle the
`earlycon`/`earlycon=` case:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-6.1/drivers/tty/serial/earlycon.c#223

```
/* Just 'earlycon' is a valid param for devicetree and ACPI SPCR. */
if (!buf || !buf[0]) {
	if (IS_ENABLED(CONFIG_ACPI_SPCR_TABLE)) {
		earlycon_acpi_spcr_enable = true;
		return 0;
	} else if (!buf) {
		return early_init_dt_scan_chosen_stdout();
	}
}
```

Before my patch, `console=` would call `param_setup_earlycon`, which
would set the `earlycon_acpi_spcr_enable` flag to true. This is
probably also the code that handled the bare `console` parameter.
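
To make the before/after concrete, here is a minimal userspace sketch of
the proxy branch of the condition in `do_early_param()`. It is not kernel
code: `proxies_to_earlycon_old` and `proxies_to_earlycon_new` are
hypothetical helper names, and the `strcmp(p->str, "earlycon")` half of
the real condition is assumed to already match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the console -> earlycon proxy check before the patch. */
static bool proxies_to_earlycon_old(const char *param, const char *val)
{
	(void)val; /* the old condition never looked at the value */
	return strcmp(param, "console") == 0;
}

/* Hypothetical helper: the same check with the patch applied. */
static bool proxies_to_earlycon_new(const char *param, const char *val)
{
	/* Only proxy when console= carries a non-empty value. */
	return strcmp(param, "console") == 0 && val && val[0];
}
```

With an empty or NULL value the old helper still returns true, which is
the path that reached the `!buf || !buf[0]` special case quoted above. The
new helper returns false there, while `console=uart,mmio,XXXX` is still
proxied.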

> > >
> > > + "console" enables the default console which might be overridden
> > > by ACPI SPCR and devicetree
> >
> > That's what this patch fixes. You need to specify `earlycon` in order
> > to get the ACPI SPCR or DT console.
>
> It sounds strange. earlycon is needed only for debugging. While
> ACPI SPCR or DT should define the preferred console by the platform.
>
> There are three levels of preference:
>
> + console= parameter defines the user preferred. It overrides
> everything.
>
> + ACPI SPCR or DT should define the preferred console by
> platform. It will be used when there is no user preference.
>
> + Kernel registers the first initialized console with tty driver
> when there is no preferred console by the user, ACPI SPCR, or DT.
>
> As I said, I would expect that early console is enabled only when
> earlycon parameter is defined on the command line.
>
> In each case, it seems that acpi_parse_spcr() and of_console_check()
> call add_preferred_console() even when earlycon is not defined
> on the commandline.
>

Currently, whether the SPCR is used for the default console is a
per-architecture policy. ARM64 enables it, while x86 keeps it
disabled:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-6.1/arch/arm64/kernel/acpi.c#229
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-6.1/arch/x86/kernel/acpi/boot.c#1748
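
As a rough illustration of that split, here is a toy userspace model. It
assumes, from my reading of the two files linked above, that both
architectures call `acpi_parse_spcr()` with `earlycon_acpi_spcr_enable` as
the first argument and differ only in the second (console) argument. The
struct and the `model_*` names are hypothetical and exist only for this
sketch.

```c
#include <stdbool.h>

/* Hypothetical toy model of what the SPCR parser is asked to set up. */
struct spcr_result {
	bool earlycon; /* SPCR UART used as the early console */
	bool console;  /* SPCR UART added as a preferred console */
};

/* Stand-in for acpi_parse_spcr(enable_earlycon, enable_console). */
static struct spcr_result model_parse_spcr(bool enable_earlycon,
					   bool enable_console)
{
	struct spcr_result r = { enable_earlycon, enable_console };
	return r;
}

/* Assumed arm64 policy: SPCR console enabled by default. */
static struct spcr_result model_arm64(bool earlycon_acpi_spcr_enable)
{
	return model_parse_spcr(earlycon_acpi_spcr_enable, true);
}

/* Assumed x86 policy: SPCR console not enabled by default. */
static struct spcr_result model_x86(bool earlycon_acpi_spcr_enable)
{
	return model_parse_spcr(earlycon_acpi_spcr_enable, false);
}
```

In this model the early console depends only on the
`earlycon_acpi_spcr_enable` flag on both architectures, which is why an
empty `console=` flipping that flag is enough to turn the earlycon on.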

I'm honestly not a fan of enabling the SPCR console by default. It
will slow down booting since things now need to get written to the
UART. It's useful for debugging problems before the real console
driver can register (ttyS0, etc). It would honestly be nice if kernel
OOPS/panics got written to the SPCR UART. The earlycon also has
problems in that the ACPI power resources can't be specified so we can
run into problems when the ttyS0 ACPI power resources get enumerated
and turned off because there isn't a driver registered yet. This can
cause the earlycon to die/hang since it just got powered off. But I
digress :)

> > I don't see the `console` (without the =) documented:
> > https://www.kernel.org/doc/html/v6.1/admin-guide/kernel-parameters.html.
> > I'm guessing this is an undocumented "feature" that snuck in while the
> > `earlycon` stuff was being added.
>
> Honestly, I do not see where the "console" without '=' is handled.
> console_setup() does not check if the @str parameter is NULL.
>
>
> Anyway, the behavior already is complicated. But it might still
> make some sense when:
>
> + "earlycon" parameter would try to call param_setup_earlycon()
> with @val from "console=val" parameter. It reduces cut&paste.
>
> + "console=" causes that "ttynull" driver gets preferred. Which might
> cause that no console driver gets registered at all. [*]
>
> But seems to be yet another level of craziness when "console" or
> "console=" would affect whether the early console will try
> to be defined via ACPI SPCR or not.
>
> I believe that this patch solves the problem. But it looks
> like a workaround which makes the logic even more tricky/hacky.
>
>
> IMHO, the right fix is to make sure that param_setup_earlycon()
> should get called only when "earlycon" is defined on the commandline.

We can do that, but it will remove the `console=uart,mmio,XXXX`
handling. IMO that's the correct thing to do, but I suspect there are
a lot of people that depend on it.

>
> Best Regards,
> Petr