Re: linux 5.15-rc4: refcount underflow when unloading gpio-mockup

From: Kent Gibson
Date: Mon Oct 04 2021 - 10:18:05 EST


On Mon, Oct 04, 2021 at 04:20:55PM +0300, Heikki Krogerus wrote:
> On Mon, Oct 04, 2021 at 08:47:01PM +0800, Kent Gibson wrote:
> > On Mon, Oct 04, 2021 at 03:30:43PM +0300, Heikki Krogerus wrote:
> > > On Mon, Oct 04, 2021 at 08:19:42PM +0800, Kent Gibson wrote:
> > > > On Mon, Oct 04, 2021 at 11:44:17AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Mon, Oct 04, 2021 at 05:34:16PM +0800, Kent Gibson wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm seeing a refcount underflow when I unload the gpio-mockup module on
> > > > > > Linux v5.15-rc4 (and going back to v5.15-rc1):
> > > > > >
> > > > > > # modprobe gpio-mockup gpio_mockup_ranges=-1,4,-1,10
> > > > > > # rmmod gpio-mockup
> > > > > > ------------[ cut here ]------------
> > > > > > refcount_t: underflow; use-after-free.
> > > > > > WARNING: CPU: 0 PID: 103 at lib/refcount.c:28 refcount_warn_saturate+0xd1/0x120
> > > > > > Modules linked in: gpio_mockup(-)
> > > > > > CPU: 0 PID: 103 Comm: rmmod Not tainted 5.15.0-rc4 #1
> > > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > > > > > EIP: refcount_warn_saturate+0xd1/0x120
> > > > > > Code: e8 a2 b0 3b 00 0f 0b eb 83 80 3d db 2a 8c c1 00 0f 85 76 ff ff ff c7 04 24 88 85 78 c1 b1 01 88 0d db 2a 8c c1 e8 7d b0 3b 00 <0f> 0b e9 5b ff ff ff 80 3d d9 2a 8c c1 00 0f 85 4e ff ff ff c7 04
> > > > > > EAX: 00000026 EBX: c250b100 ECX: f5fe8c28 EDX: 00000000
> > > > > > ESI: c244860c EDI: c250b100 EBP: c245be84 ESP: c245be80
> > > > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00000296
> > > > > > CR0: 80050033 CR2: b7e3c3e1 CR3: 024ba000 CR4: 00000690
> > > > > > Call Trace:
> > > > > > kobject_put+0xdc/0xf0
> > > > > > software_node_notify_remove+0xa8/0xc0
> > > > > > device_del+0x15a/0x3e0
> > > > > > ? kfree_const+0xf/0x30
> > > > > > ? kobject_put+0xa6/0xf0
> > > > > > ? module_remove_driver+0x73/0xa0
> > > > > > platform_device_del.part.0+0xf/0x80
> > > > > > platform_device_unregister+0x19/0x40
> > > > > > gpio_mockup_unregister_pdevs+0x13/0x1b [gpio_mockup]
> > > > > > gpio_mockup_exit+0x1c/0x68c [gpio_mockup]
> > > > > > __ia32_sys_delete_module+0x137/0x1e0
> > > > > > ? task_work_run+0x61/0x90
> > > > > > ? exit_to_user_mode_prepare+0x1b5/0x1c0
> > > > > > __do_fast_syscall_32+0x50/0xc0
> > > > > > do_fast_syscall_32+0x32/0x70
> > > > > > do_SYSENTER_32+0x15/0x20
> > > > > > entry_SYSENTER_32+0x98/0xe7
> > > > > > EIP: 0xb7eda549
> > > > > > Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> > > > > > EAX: ffffffda EBX: 0045a19c ECX: 00000800 EDX: 0045a160
> > > > > > ESI: fffffffe EDI: 0045a160 EBP: bff19d08 ESP: bff19cc8
> > > > > > DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000202
> > > > > > ---[ end trace 3d71387f54bc2d06 ]---
> > > > > >
> > > > > > I suspect this is related to the recent changes to swnode.c or
> > > > > > platform.c, as gpio-mockup hasn't changed, but haven't had the
> > > > > > chance to debug further.
> > > > >
> > > > > Any chance you can run 'git bisect' for this?
> > > > >
> > > >
> > > > That results in:
> > > >
> > > > bd1e336aa8535a99f339e2d66a611984262221ce is the first bad commit
> > > > commit bd1e336aa8535a99f339e2d66a611984262221ce
> > > > Author: Heikki Krogerus <heikki.krogerus@xxxxxxxxxxxxxxx>
> > > > Date: Tue Aug 17 13:24:49 2021 +0300
> > > >
> > > > driver core: platform: Remove platform_device_add_properties()
> > >
> > > Can you test whether this patch helps:
> > > https://lore.kernel.org/all/20210930121246.22833-3-heikki.krogerus@xxxxxxxxxxxxxxx/
> > >
> >
> > You sure that is the patch you have in mind? It only removes dead code,
> > so I don't see how that would help. And it isn't quite dead either -
> > drivers/pci/quirks.c is still using device_add_properties(), so it won't
> > build.
>
> Right, so can you test with the whole series that patch is part of?
>

Well, you could've said that to start with ;-).

> > Looking at the offending patch, it effectively replaces a call to
> > device_add_properties() with one to
> > device_create_managed_software_node(), and those two functions appear
> > quite different - at least at first glance.
> > Is that correct?
>
> The only real difference between the two functions is that
> device_create_managed_software_node() marks the software node it
> creates (and it does it exactly the same way as
> device_add_properties()) as "managed" with a specific flag.
>

Yeah, my bad - not sure what function I was looking at but it wasn't
device_create_managed_software_node().

> It means that when the device is removed, so is the software node.
> It happens when device_del() calls device_platform_notify_remove(),
> which then calls software_node_notify_remove().
>
> The problem is that after doing that step, device_del() then calls
> device_remove_properties() unconditionally which also attempts to
> remove the software node. So you end up doing the same thing twice.
>
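
To check my understanding of that sequence, here is a toy userspace
model of it. It is only a sketch: the toy_* names are made up for
illustration, the struct is not the kernel's kobject or swnode, and the
real software_node_notify_remove() does more than drop a reference. It
assumes the node starts with a single reference and that both steps in
device_del() drop it, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a refcounted node; NOT the kernel's struct kobject. */
struct toy_node {
	int refs;        /* reference count, 1 right after creation */
	bool managed;    /* models the "managed" flag described above */
	int underflows;  /* number of puts attempted at a zero count */
};

/* Drop one reference, recording an underflow instead of going negative. */
static void toy_put(struct toy_node *n)
{
	if (n->refs == 0) {
		n->underflows++;
		fprintf(stderr, "toy refcount: underflow\n");
		return;
	}
	n->refs--;
}

/* Models the notify-remove step: only managed nodes are released here. */
static void toy_notify_remove(struct toy_node *n)
{
	if (n->managed)
		toy_put(n);
}

/* Models the unconditional property removal that follows. */
static void toy_remove_properties(struct toy_node *n)
{
	toy_put(n);
}

/*
 * Models the relevant part of device_del(). With 'patched' set, the
 * unconditional second removal is skipped, as in the hunk under test.
 * Returns the number of underflows seen.
 */
static int toy_device_del(bool managed, bool patched)
{
	struct toy_node n = { .refs = 1, .managed = managed, .underflows = 0 };

	toy_notify_remove(&n);
	if (!patched)
		toy_remove_properties(&n);
	return n.underflows;
}
```

In this model toy_device_del(true, false) sees one underflow, while
toy_device_del(true, true) and the unmanaged case
toy_device_del(false, false) see none. The model flags the problem on
the second put, which need not be where the kernel reports it.
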
> So the code in the patch that we're interested in, and that I would
> like you to test, is this:
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 938cfcd1674eb..152a611a7e9ca 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3583,7 +3583,6 @@ void device_del(struct device *dev)
>  	device_pm_remove(dev);
>  	driver_deferred_probe_del(dev);
>  	device_platform_notify_remove(dev);
> -	device_remove_properties(dev);
>  	device_links_purge(dev);
>
>  	if (dev->bus)
>

Makes sense.
Good news is the patch built.
Bad news is I still get the same result:

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 98 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x150
Modules linked in: gpio_mockup(-)
CPU: 0 PID: 98 Comm: rmmod Not tainted 5.15.0-rc4 #1
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xf4/0x150
lr : refcount_warn_saturate+0xf4/0x150
sp : ffffffc010be3c50
x29: ffffffc010be3c50 x28: ffffff8001fa5780 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: ffffffc010a14518
x23: 0000000000000000 x22: ffffffc010a13798 x21: ffffffc010a7a480
x20: ffffff8002126c10 x19: ffffff8002146600 x18: fffffffffffe25f8
x17: 666f733d4d455453 x16: 5953425553003065 x15: ffffffc01098ac80
x14: fffffffffffc25f7 x13: 2e656572662d7265 x12: ffffffc01098acd0
x11: 0000000000000093 x10: 6e75203a745f746e x9 : 00000000ffffefff
x8 : ffffffc0109e2cd0 x7 : 0000000000017fe8 x6 : 0000000000000001
x5 : ffffff8007f8fa68 x4 : 0000000000000000 x3 : 0000000000000027
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8001fa5780
Call trace:
refcount_warn_saturate+0xf4/0x150
kobject_put+0xf4/0x110
software_node_notify_remove+0xc4/0xe0
device_del+0x18c/0x430
platform_device_del.part.0+0x1c/0x90
platform_device_unregister+0x28/0x50
gpio_mockup_unregister_pdevs+0x28/0x50 [gpio_mockup]
gpio_mockup_exit+0x28/0x3c0 [gpio_mockup]
__arm64_sys_delete_module+0x180/0x200
invoke_syscall+0x54/0x130
el0_svc_common.constprop.0+0x44/0xf0
do_el0_svc+0x40/0xa0
el0_svc+0x20/0x60
el0t_64_sync_handler+0x1a4/0x1b0
el0t_64_sync+0x1a0/0x1a4
---[ end trace 44421b7a22d450dd ]---


Cheers,
Kent.