Re: selftests: gpio: crash on arm64

From: Bartosz Golaszewski
Date: Tue Apr 11 2023 - 09:11:53 EST


On Tue, Apr 11, 2023 at 10:57 AM Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
>
> On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju
> <naresh.kamboju@xxxxxxxxxx> wrote:
> (...)
> > Anders performed bisection on this problem.
> > The bisection has been pointing to this commit log:
> > first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7]
> > gpiolib: ensure that fwnode is properly set
>
> I don't think this is the real issue.
>
> (...)
> > # 2. Module load error tests
> > # 2.1 gpio overflow
> (...)
> > [ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683
> > [ 88.907899] __kmem_cache_free+0x2a4/0x2e0
> > [ 88.912024] kfree+0xc0/0x1a0
> > [ 88.915015] software_node_release+0xdc/0x108
> > [ 88.919402] kobject_put+0xb0/0x220
> > [ 88.922919] software_node_notify_remove+0x98/0xe8
> > [ 88.927741] device_del+0x184/0x380
> > [ 88.931259] platform_device_del.part.0+0x24/0xa8
> > [ 88.935995] platform_device_unregister+0x30/0x50
>
> I think the refcount is wrong on the fwnode.
>
> The chip is allocated with devm_gpiochip_add_data() which will not call
> gpiochip_remove() until all references are removed by calling
> devm_gpio_chip_release().
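
For illustration only, the devres pattern described above can be modelled
in plain userspace C. This is a sketch, not kernel code: every name in it
(fake_device, fake_chip, fake_devm_chip_add and so on) is hypothetical and
made up for the example. It only shows the general idea that the cleanup is
queued as a release action at registration time and runs when the owning
device's managed resources are torn down, so if that teardown never happens
the chip stays registered.

```c
/* Userspace model of devres-style cleanup. NOT kernel code.
 * All identifiers are hypothetical and exist only in this sketch. */
#include <assert.h>
#include <stdio.h>

#define MAX_ACTIONS 8

/* A "device" that owns a small stack of release actions. */
struct fake_device {
	void (*action[MAX_ACTIONS])(void *data);
	void *data[MAX_ACTIONS];
	int nr_actions;
};

/* Stand-in for a gpio chip; 'registered' tracks whether it was removed. */
struct fake_chip {
	const char *label;
	int registered;
};

/* Queue a release action on the device; returns 0 on success, -1 if full. */
static int fake_devm_add_action(struct fake_device *dev,
				void (*fn)(void *), void *data)
{
	if (dev->nr_actions >= MAX_ACTIONS)
		return -1;
	dev->action[dev->nr_actions] = fn;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Release action: the only place where the chip gets unregistered. */
static void fake_chip_release(void *data)
{
	struct fake_chip *gc = data;

	printf("chip %s removed by release action\n", gc->label);
	gc->registered = 0;
}

/* Register the chip and tie its removal to the device's lifetime. */
static int fake_devm_chip_add(struct fake_device *dev, struct fake_chip *gc)
{
	gc->registered = 1;
	return fake_devm_add_action(dev, fake_chip_release, gc);
}

/* Tear down the device: run all queued actions, newest first. */
static void fake_device_release_all(struct fake_device *dev)
{
	while (dev->nr_actions > 0) {
		int i = --dev->nr_actions;

		dev->action[i](dev->data[i]);
	}
}
```

In this model the chip stays registered for as long as
fake_device_release_all() is not called, which is the situation the
debug print is meant to detect.
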
>
> Add a pr_info() to devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c
> and see if the callback is even called. I think this could be the
> problem: if that isn't cleaned up, there will be dangling references.
>
> diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c
> index fe9ce6b19f15..30a0622210d7 100644
> --- a/drivers/gpio/gpiolib-devres.c
> +++ b/drivers/gpio/gpiolib-devres.c
> @@ -394,6 +394,7 @@ static void devm_gpio_chip_release(void *data)
>  {
>  	struct gpio_chip *gc = data;
>
> +	pr_info("GPIOCHIP %s WAS REMOVED BY DEVRES\n", gc->label);
>  	gpiochip_remove(gc);
>  }
>
> If this isn't working we need to figure out what is holding a reference to
> the gpiochip.
>
> I don't know how the references to the gpiochip fwnode are supposed to
> drop to zero though? I didn't work with mockup much ...
>
> What I could think of is that maybe the mockup driver needs a .shutdown()
> callback to forcibly call gpiochip_remove(), and in that case it should
> be wrapped in a (currently non-existent) devm_gpiochip_remove() since
> devres is used to register it.
>
> Bartosz will know better though! I am pretty sure he has this working
> flawlessly so the tests must be doing something weird which is leaving
> references around.
>
> Yours,
> Linus Walleij

Interestingly, I'm not seeing this with either the gpio-sim selftests or
any of the libgpiod tests, which suggests it's the gpio-mockup module
that's doing something wrong (or very right, in which case it uncovers
some otherwise hidden bug). Anyway, I'll try to spend some time on it
and figure it out, although I'd like to be done with gpio-mockup
altogether already.

Bart