Re: PIC probing code from e179f6914152 failing

From: Thomas Gleixner
Date: Wed Oct 18 2023 - 18:50:31 EST


On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
> and GPIO controller (IRQ 7) weren't working properly on two separate
> Lenovo machines. The issues are unique to Linux.
>
> In digging through them, they happen because Lenovo didn't set up the
> PIC in the BIOS.
> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
> that the BIOS sets up the PIC and uses that assertion to let Linux set
> it up.
>
> One of the reporters confirmed that reverting e179f6914152 fixes the
> issue. Keyboard and GPIO controller both work properly.
>
> I wanted to ask if we can please revert that and come up with a
> different solution for kexec with HyperV.
> Can guests instead perhaps detect in early boot code they're running in
> an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?

No. This detection mechanism prevents PIC usage also in other
scenarios.

It's perfectly valid code and the assumption that you can read back what
you wrote to the master IMR is entirely correct. If that's not the case
then there is no PIC, the BIOS has disabled some parts of the legacy
block or did not initialize it.

Letting the kernel blindly assume that there is always a PIC is just
wrong. Worse, it puts the burden on everyone else to sprinkle
"legacy_pic = null_pic;" all over the place with dubious
conditions. That's exactly what the commit in question avoided.

So no, we are not going back there.

We could obviously change the probe() function to issue a full PIC
initialization sequence before reading a known written value
back. That's basically what the revert causes to happen via
legacy_pic->init().

But I fundamentally hate to do that because forcing the init sequence
just to make presumably broken code which has some dubious dependencies
on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
the PIC is actually not needed at all. For anything modern where all
legacy interrupts are routed to the IO/APIC the PIC does not make any
sense whatsoever.

We rather go and fix the real underlying problem.

The kernel can handle the legacy interrupts perfectly fine through the
IOAPIC. There is no hard requirement for the PIC at all except for
really old systems which had the timer interrupt wired to the PIC and
therefore required to route the timer interrupt through the PIC instead
of the IO/APIC or did not provide routing entries for the IO/APIC. See
the horrible hackery in check_timer() and the grossly misnomed
init_IO_APIC_traps().

I just took a random machine, forced the NULL PIC and added
'apic=verbose' to the kernel command line and magically all the legacy
interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
preallocated legacy interrupts.

apic 0 pin 0 not connected
IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 ActiveLow:0)
IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
...

which is identical to the output with PIC enabled. That debug message is
emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
management code.

Now /proc/interrupts:

CPU0 CPU1 CPU2 CPU3
1: 0 56 0 0 IO-APIC 1-edge i8042
4: 442 0 0 0 IO-APIC 4-edge ttyS0
8: 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 0 0 122 0 IO-APIC 12-edge i8042

Keyboard and serial are working, see?

The only interrupt which does not work is interrupt 0 because nothing
allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
solvable problem.

That machine does not even need the timer interrupt because it has a
usable APIC and TSC deadline timer, so no APIC timer calibration
required. The same is true for CPUs which do not have a TSC deadline
timer, but enumerate the APIC frequency via CPUID or MSRs.

But that brings up an interesting question. How are those affected
machines even reaching a state where the user notices that just the
keyboard and the GPIO are not working? Why?

The CPUID/MSR APIC frequency enumeration is Intel specific and
everything else depends on a working timer interrupt to calibrate the
APIC timer frequency. So AMD CPUs require the timer interrupt to
work. The only explanation why this "works" in that null PIC case is
that the PIT/HPET interrupt is actually wired to pin 0, but that's
something to be determined...

Can I please get the following information from an affected machine:

1) dmesg with 'apic=verbose' on the command line
2) /proc/interrupts
3) /sys/kernel/debug/irq/irqs/{0..15}

#3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.

Two versions of that please:

1) Unpatched kernel
2) Patched kernel

Thanks,

tglx