Re: [PATCH v4] gpio: Return EPROBE_DEFER if gc->to_irq is NULL

From: Shreeya Patel
Date: Fri Feb 11 2022 - 05:03:58 EST



On 11/02/22 6:56 am, Gabriel Krisman Bertazi wrote:
Bartosz Golaszewski <brgl@xxxxxxxx> writes:

My email address changed in September, that's why I didn't see the
email you sent in November to my old one.
Hi Bart,

Thanks for the prompt reply, and sorry for using the wrong email address.

gpiod_to_irq() can be used in contexts other than driver probing; I'm
worried existing users would not know how to handle it. Also, how come
you can get the GPIO descriptor from the provider but its interrupts
are not yet set up?
I'm definitely missing some context here, as it's been quite a while.
Shreeya, feel free to pitch in. :)


Existing users will probably still receive -ENXIO in cases where to_irq
is not set and was never intended to be set.
We are trying to solve a race that happens frequently when I2C is
built-in and pinctrl-amd is built as a module. There is no dependency
between I2C and pinctrl-amd, so while pinctrl-amd is still setting up
the gc irq members in gpiochip_add_irqchip(), I2C can call
gpiod_to_irq(), which returns -ENXIO because gc->to_irq is still NULL.
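
To make the proposed behaviour concrete, here is a minimal userspace
model of the lookup decision, not the actual gpiolib code.
struct model_chip, model_gpiod_to_irq(), model_to_irq() and the
irqchip_pending flag are all hypothetical; the flag stands in for
whatever state would tell gpiolib that an irqchip is being registered
but .to_irq() is not set yet. EPROBE_DEFER is kernel-internal, so the
model defines it locally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Kernel-internal errno, not exported to userspace; 517 matches include/linux/errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical, heavily simplified stand-in for struct gpio_chip. */
struct model_chip {
	int (*to_irq)(struct model_chip *gc, unsigned int offset);
	int irqchip_pending;	/* hypothetical: irqchip registration has started */
	int irq_base;		/* only used by the example model_to_irq() below */
};

/* Model of the lookup: defer instead of failing while IRQ setup is in progress. */
static int model_gpiod_to_irq(struct model_chip *gc, unsigned int offset)
{
	assert(gc != NULL);

	if (gc->to_irq)
		return gc->to_irq(gc, offset);
	if (gc->irqchip_pending)
		return -EPROBE_DEFER;	/* consumer's probe gets retried later */
	return -ENXIO;			/* chip genuinely provides no interrupts */
}

/* Example .to_irq() callback: maps a line offset to a fixed IRQ range. */
static int model_to_irq(struct model_chip *gc, unsigned int offset)
{
	return gc->irq_base + (int)offset;
}
```

With irqchip_pending left at 0, a chip that never provides interrupts
still gets -ENXIO, which is the point about existing users made above.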


There have also been cases where gc->to_irq is set successfully but
other members, such as the domain used in .to_irq(), have not yet been
initialized by gpiochip_add_irqchip(), which ultimately leads to a NULL
pointer dereference, as Gabriel mentioned. I am working on a fix that
uses a mutex to prevent the gc irq members from being accessed until
they have all been completely initialized.
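
A rough userspace sketch of the mutex idea, assuming a single lock that
covers all irq members and a hypothetical ready flag that is set only
once they are all filled in. Every name here is made up for
illustration, and a pthread mutex stands in for a kernel mutex; the
sketch does not address which locking primitive gpiolib could actually
use on this path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, defined locally for the model */
#endif

/* Hypothetical model: all irq members must be consistent before any lookup. */
struct model_irq_state {
	pthread_mutex_t lock;	/* serializes setup against lookups */
	bool ready;		/* set only after every irq member is filled in */
	int *domain;		/* stand-in for the irq domain used by .to_irq() */
	int (*to_irq)(struct model_irq_state *s, unsigned int offset);
};

static void model_irq_state_init(struct model_irq_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->ready = false;
	s->domain = NULL;
	s->to_irq = NULL;
}

/* Example .to_irq(): the real one would oops if domain were still NULL. */
static int model_domain_to_irq(struct model_irq_state *s, unsigned int offset)
{
	assert(s->domain != NULL);
	return *s->domain + (int)offset;
}

/* Provider side: fill in every member under the lock and mark ready last. */
static void model_add_irqchip(struct model_irq_state *s, int *domain)
{
	pthread_mutex_lock(&s->lock);
	s->domain = domain;
	s->to_irq = model_domain_to_irq;
	s->ready = true;
	pthread_mutex_unlock(&s->lock);
}

/* Consumer side: never call .to_irq() on a half-initialized state. */
static int model_lookup_irq(struct model_irq_state *s, unsigned int offset)
{
	int ret;

	pthread_mutex_lock(&s->lock);
	if (!s->ready)
		ret = -EPROBE_DEFER;
	else
		ret = s->to_irq(s, offset);
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

If another thread calls model_lookup_irq() concurrently with
model_add_irqchip(), it sees either -EPROBE_DEFER or the fully
initialized state, never a NULL domain.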

I2C calls gpiod_to_irq() through the following call chain:

kernel: Call Trace:
kernel:  gpiod_to_irq.cold+0x49/0x8f
kernel:  acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel:  i2c_acpi_get_irq+0xc0/0xd0
kernel:  i2c_device_probe+0x28a/0x2a0
kernel:  really_probe+0xf2/0x460
kernel:  driver_probe_device+0xe8/0x160

and pinctrl-amd makes the gc visible through gpiochip_add_data_with_key().


Thanks,
Shreeya Patel


This is one of the races we saw in gpiochip_add_irqchip(), depending on
the probe order. The gc is already visible while still partially
initialized if pinctrl-amd hasn't been probed yet. Another device being
probed can either hit -ENXIO here, if to_irq is not yet initialized, or
enter .to_irq() and oops. Shreeya's patch works around the first issue
but is not a solution for the second.
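
The two windows can be illustrated with a deterministic, single-threaded
model that applies the provider's setup one step at a time and runs a
consumer lookup in between. The field names, the step order and the
lookup_result values are hypothetical; LOOKUP_OOPS marks the point where
a real .to_irq() would dereference the NULL domain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a consumer lookup in this model. */
enum lookup_result { LOOKUP_OK, LOOKUP_ENXIO, LOOKUP_DEFER, LOOKUP_OOPS };

/* Hypothetical chip state, filled in one step at a time. */
struct race_chip {
	int irqchip_pending;	/* provider has started adding its irqchip */
	int has_to_irq;		/* gc->to_irq has been assigned */
	const int *domain;	/* irq domain consumed by .to_irq() */
};

/* Provider setup steps, in the order assumed by this model (not necessarily gpiolib's). */
static void provider_step(struct race_chip *gc, int step, const int *domain)
{
	assert(step >= 0 && step <= 2);

	switch (step) {
	case 0: gc->irqchip_pending = 1; break;
	case 1: gc->has_to_irq = 1; break;
	case 2: gc->domain = domain; break;
	}
}

/* Consumer lookup, including the "defer if to_irq is missing" check. */
static enum lookup_result consumer_lookup(const struct race_chip *gc)
{
	if (!gc->has_to_irq)
		return gc->irqchip_pending ? LOOKUP_DEFER : LOOKUP_ENXIO;
	if (gc->domain == NULL)
		return LOOKUP_OOPS;	/* the real .to_irq() would dereference NULL here */
	return LOOKUP_OK;
}
```

Walking the steps in order gives -ENXIO, then a deferral (the case
Shreeya's patch works around), then the oops window once to_irq is set
but the domain is not, and finally a valid lookup. Before step 0 the
model cannot tell a chip that is about to add an irqchip from one that
never will, so it still reports -ENXIO.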

There is another patch that has been flying around to address the Oops.

https://lkml.org/lkml/2021/11/8/900

She's been working on a proper solution for that one, which might
actually address this too and replace the current patch. Maybe you
could help us get to a proper solution there? I'm quite unfamiliar with
this code myself :)