Re: PIC probing code from e179f6914152 failing

From: Mario Limonciello
Date: Fri Oct 20 2023 - 13:13:33 EST


On 10/20/2023 10:16, Hans de Goede wrote:
Hi Mario,

On 10/19/23 23:20, Mario Limonciello wrote:
On 10/18/2023 17:50, Thomas Gleixner wrote:

<snip>

But that brings up an interesting question. How are those affected
machines even reaching a state where the user notices that just the
keyboard and the GPIO are not working? Why?

So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to try to discover the IRQ to use.

This calls acpi_irq_get() which isn't implemented on x86 (hardcodes -EINVAL).
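
To make that failure path concrete, here is a minimal user-space sketch of it. It is not the kernel's code: `model_resource`, `model_acpi_irq_get()` and `model_platform_get_irq()` are hypothetical stand-ins, the `generic_gsi` argument stands in for CONFIG_ACPI_GENERIC_GSI, and only the ACPI branch of platform_get_irq_optional() is modeled. The flag value is the one IORESOURCE_DISABLED has in include/linux/ioport.h.

```c
#include <errno.h>
#include <stddef.h>

/* Same value as IORESOURCE_DISABLED in include/linux/ioport.h. */
#define MODEL_IORESOURCE_DISABLED 0x10000000UL

/* Hypothetical, simplified stand-in for struct resource. */
struct model_resource {
	unsigned long start;	/* IRQ number once resolved */
	unsigned long flags;
};

/*
 * Stand-in for acpi_irq_get(). Without CONFIG_ACPI_GENERIC_GSI (the x86
 * case) the kernel only provides a stub that returns -EINVAL.
 */
static int model_acpi_irq_get(int generic_gsi, struct model_resource *r)
{
	if (!generic_gsi)
		return -EINVAL;
	/* Assumption for this model: a real lookup would fill in the resource. */
	r->flags &= ~MODEL_IORESOURCE_DISABLED;
	return 0;
}

/* Models only the ACPI branch of platform_get_irq_optional(). */
static int model_platform_get_irq(int generic_gsi, int has_acpi_companion,
				  struct model_resource *r)
{
	if (!r)
		return -ENXIO;
	if (has_acpi_companion && (r->flags & MODEL_IORESOURCE_DISABLED)) {
		int ret = model_acpi_irq_get(generic_gsi, r);

		if (ret)
			return ret;	/* -EINVAL on x86: probe fails here */
	}
	return (int)r->start;
}
```

In this model, a resource that still carries the DISABLED flag yields -EINVAL when `generic_gsi` is 0, while a resource without that flag returns its IRQ number unchanged.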

I can "work around it" by:

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 76bfcba25003..2b4b436c65d8 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num)
        }

        r = platform_get_resource(dev, IORESOURCE_IRQ, num);
-       if (has_acpi_companion(&dev->dev)) {
+       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
+            has_acpi_companion(&dev->dev)) {
                if (r && r->flags & IORESOURCE_DISABLED) {
                        ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
                        if (ret)

but the resource that is returned from the next hunk has the resource flags set wrong in the NULL pic case:

NULL case:
r: AMDI0030:00 flags: 0x30000418
PIC case:
r: AMDI0030:00 flags: 0x418

IOW the NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET set.
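
For reference, the two flag values can be decoded with a small user-space helper. This is only a sketch: `decode_flags()` and the `flag_names` table are hypothetical names, and the bit values are copied from include/linux/ioport.h (only the bits relevant to these two values are listed).

```c
#include <stddef.h>
#include <string.h>

/* Bit values as defined in include/linux/ioport.h. */
#define IORESOURCE_IRQ_LOWLEVEL   (1UL << 3)
#define IORESOURCE_IRQ_SHAREABLE  (1UL << 4)
#define IORESOURCE_IRQ            0x00000400UL
#define IORESOURCE_DISABLED       0x10000000UL
#define IORESOURCE_UNSET          0x20000000UL

struct flag_name {
	unsigned long bit;
	const char *name;
};

/* Hypothetical lookup table; short names are used for readability. */
static const struct flag_name flag_names[] = {
	{ IORESOURCE_IRQ,           "IRQ" },
	{ IORESOURCE_IRQ_LOWLEVEL,  "LOWLEVEL" },
	{ IORESOURCE_IRQ_SHAREABLE, "SHAREABLE" },
	{ IORESOURCE_DISABLED,      "DISABLED" },
	{ IORESOURCE_UNSET,         "UNSET" },
};

/*
 * Write the space-separated names of all known bits set in flags into buf
 * (len must be at least 1). Returns the bits that were not recognized.
 */
static unsigned long decode_flags(unsigned long flags, char *buf, size_t len)
{
	unsigned long rest = flags;
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(flags & flag_names[i].bit))
			continue;
		if (buf[0])
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, flag_names[i].name, len - strlen(buf) - 1);
		rest &= ~flag_names[i].bit;
	}
	return rest;
}
```

Decoding 0x418 gives IRQ, LOWLEVEL and SHAREABLE; decoding 0x30000418 gives the same three plus DISABLED and UNSET, with no unrecognized bits left over in either case.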

This then means that, later on, the GPIO controller interrupts are not actually working.
For example the attn pin for my I2C touchpad doesn't work.

Right, the issue is that with the legacy-pic path disabled /
with nr_legacy_irqs() returning 0 then there is no mapping
added for the Legacy ISA IRQs, which causes this problem.

My hack to set nr_legacy_irqs to 16 also for the NULL PIC from:
https://bugzilla.kernel.org/show_bug.cgi?id=218003

Does cause the Legacy ISA IRQ mappings to get added and makes
the GPIO controller actually work, as can be seen from:

https://bugzilla.kernel.org/attachment.cgi?id=305241&action=edit

Which is a dmesg with that hack and it does NOT have this error:

[ 0.276113] amd_gpio AMDI0030:00: error -EINVAL: IRQ index 0 not found
[ 0.278464] amd_gpio: probe of AMDI0030:00 failed with error -22

and the reporter also reports the touchpad works with this patch.

As Thomas already said the legacy PIC really is not necessary,
but what is still necessary on these laptops with the legacy PIC
not initialized is to have the Legacy ISA IRQ mappings added
by the kernel itself since these are missing from the MADT
(if I have my ACPI/IOAPIC terminology correct).

They're not missing, the problem is that the ioapic code doesn't
let it get updated because of what I see as an extra nr_legacy_irqs()
check.

The series I posted I believe fixes this issue.


This quick hack (which is the one from the working dmesg)
does this:

--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -394,7 +394,7 @@ static int legacy_pic_probe(void)
}
struct legacy_pic null_legacy_pic = {
- .nr_legacy_irqs = 0,
+ .nr_legacy_irqs = NR_IRQS_LEGACY,
.chip = &dummy_irq_chip,
.mask = legacy_pic_uint_noop,
.unmask = legacy_pic_uint_noop,

But I believe this will break things when there are actually
non legacy ISA IRQs / GSI-s using GSI numbers < NR_IRQS_LEGACY

Thomas, I'm not at all familiar with this area of the kernel,
but would checking if the MADT defines any non ISA GSIs under
16 and if NOT use nr_legacy_irqs = NR_IRQS_LEGACY for the
NULL PIC be an option?

Or maybe some sort of DMI (sys_vendor == Lenovo) quirk to
set nr_legacy_irqs = NR_IRQS_LEGACY for the NULL PIC ?


I'd prefer we don't do this.
As tglx pointed out there is an underlying bug and we shouldn't paper over it with quirks.

My guess as to why he doesn't see this issue on his system is that the default preconfigured IOAPIC mappings (polarity and triggering) happen to match the values that would have been programmed from _CRS.

That's not the case here.