Re: [BUG][NEW DATA] Kmemleak, possibly hiddev_connect(), in 6.3.0+ torvalds tree commit gfc4354c6e5c2

From: Greg Kroah-Hartman
Date: Wed May 17 2023 - 14:57:44 EST


On Wed, May 17, 2023 at 06:10:54PM +0200, Mirsad Goran Todorovac wrote:
> On 5/16/23 16:36, Greg Kroah-Hartman wrote:
> > On Fri, May 12, 2023 at 11:33:31PM +0200, Mirsad Goran Todorovac wrote:
> > > Hi,
> > >
> > > On 5/9/23 04:59, Greg Kroah-Hartman wrote:
> > > > On Tue, May 09, 2023 at 01:51:35AM +0200, Mirsad Goran Todorovac wrote:
> > > > >
> > > > >
> > > > > On 08. 05. 2023. 16:01, Greg Kroah-Hartman wrote:
> > > > > > On Mon, May 08, 2023 at 08:51:55AM +0200, Greg Kroah-Hartman wrote:
> > > > > > > On Mon, May 08, 2023 at 08:30:07AM +0200, Mirsad Goran Todorovac wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > There seems to be a kernel memory leak in the USB keyboard driver.
> > > > > > > >
> > > > > > > > The leaked memory allocs are 96 and 512 bytes.
> > > > > > > >
> > > > > > > > > The platform is Ubuntu 22.04 LTS on an assembled AMD Ryzen 9 with X670E PG
> > > > > > > > Lightning mobo,
> > > > > > > > and Genius SlimStar i220 GK-080012 keyboard.
> > > > > > > >
> > > > > > > > (Logitech M100 HID mouse is not affected by the bug.)
> > > > > > > >
> > > > > > > > BIOS is:
> > > > > > > >
> > > > > > > >      *-firmware
> > > > > > > >           description: BIOS
> > > > > > > >           vendor: American Megatrends International, LLC.
> > > > > > > >           physical id: 0
> > > > > > > >           version: 1.21
> > > > > > > >           date: 04/26/2023
> > > > > > > >           size: 64KiB
> > > > > > > >
> > > > > > > > The kernel is 6.3.0-torvalds-<id>-13466-gfc4354c6e5c2.
> > > > > > > >
> > > > > > > > The keyboard is recognised as Chicony:
> > > > > > > >
> > > > > > > >                  *-usb
> > > > > > > >                       description: Keyboard
> > > > > > > >                       product: CHICONY USB Keyboard
> > > > > > > >                       vendor: CHICONY
> > > > > > > >                       physical id: 2
> > > > > > > >                       bus info: usb@5:2
> > > > > > > >                       logical name: input35
> > > > > > > >                       logical name: /dev/input/event4
> > > > > > > >                       logical name: input35::capslock
> > > > > > > >                       logical name: input35::numlock
> > > > > > > >                       logical name: input35::scrolllock
> > > > > > > >                       logical name: input36
> > > > > > > >                       logical name: /dev/input/event5
> > > > > > > >                       logical name: input37
> > > > > > > >                       logical name: /dev/input/event6
> > > > > > > >                       logical name: input38
> > > > > > > >                       logical name: /dev/input/event8
> > > > > > > >                       version: 2.30
> > > > > > > >                       capabilities: usb-2.00 usb
> > > > > > > >                       configuration: driver=usbhid maxpower=100mA
> > > > > > > > speed=1Mbit/s
> > > > > > > >
> > > > > > > > > The bug is easily reproduced by unplugging the USB keyboard, waiting a
> > > > > > > > > couple of seconds, reconnecting it, and then scanning for memory leaks
> > > > > > > > > twice.
> > > > > > > >
> > > > > > > > The kmemleak log is as follows [edited privacy info]:
> > > > > > > >
> > > > > > > > root@hostname:/home/username# cat /sys/kernel/debug/kmemleak
> > > > > > > > unreferenced object 0xffff8dd020037c00 (size 96):
> > > > > > > >   comm "systemd-udevd", pid 435, jiffies 4294892550 (age 8909.356s)
> > > > > > > >   hex dump (first 32 bytes):
> > > > > > > >     5d 8e 4e b9 ff ff ff ff 00 00 00 00 00 00 00 00 ].N.............
> > > > > > > >     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > > > > > > >   backtrace:
> > > > > > > >     [<ffffffffb81a74be>] __kmem_cache_alloc_node+0x22e/0x2b0
> > > > > > > >     [<ffffffffb8127b6e>] kmalloc_trace+0x2e/0xa0
> > > > > > > >     [<ffffffffb87543d9>] class_create+0x29/0x80
> > > > > > > >     [<ffffffffb8880d24>] usb_register_dev+0x1d4/0x2e0
> > > > > > >
> > > > > > > As the call to class_create() in this path is now gone in 6.4-rc1, can
> > > > > > > you retry that release to see if this is still there or not?
> > > > > >
> > > > > > No, wait, it's still there, I was looking at a development branch of
> > > > > > mine that isn't sent upstream yet. And syzbot just reported the same
> > > > > > thing:
> > > > > > https://lore.kernel.org/r/00000000000058d15f05fb264013@xxxxxxxxxx
> > > > > >
> > > > > > So something's wrong here, let me dig into it tomorrow when I get a
> > > > > > chance...
> > > > >
> > > > > If this could help, here is the bisect of the bug (I could not discern what
> > > > > could possibly be wrong):
> > > > >
> > > > > user@host:~/linux/kernel/linux_torvalds$ git bisect log
> > > > > git bisect start
> > > > > # bad: [ac9a78681b921877518763ba0e89202254349d1b] Linux 6.4-rc1
> > > > > git bisect bad ac9a78681b921877518763ba0e89202254349d1b
> > > > > # good: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
> > > > > git bisect good c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
> > > > > # good: [85496c9b3bf8dbe15e2433d3a0197954d323cadc] Merge branch
> > > > > 'net-remove-some-rcu_bh-cruft'
> > > > > git bisect good 85496c9b3bf8dbe15e2433d3a0197954d323cadc
> > > > > # good: [b68ee1c6131c540a62ecd443be89c406401df091] Merge tag 'scsi-misc' of
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > > > > git bisect good b68ee1c6131c540a62ecd443be89c406401df091
> > > > > # bad: [888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0] Merge tag 'sysctl-6.4-rc1'
> > > > > of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
> > > > > git bisect bad 888d3c9f7f3ae44101a3fd76528d3dd6f96e9fd0
> > > > > # good: [34b62f186db9614e55d021f8c58d22fc44c57911] Merge tag
> > > > > 'pci-v6.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
> > > > > git bisect good 34b62f186db9614e55d021f8c58d22fc44c57911
> > > > > # good: [34da76dca4673ab1819830b4924bb5b436325b26] Merge tag
> > > > > 'for-linus-2023042601' of
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
> > > > > git bisect good 34da76dca4673ab1819830b4924bb5b436325b26
> > > > > # good: [97b2ff294381d05e59294a931c4db55276470cb5] Merge tag
> > > > > 'staging-6.4-rc1' of
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > > > > git bisect good 97b2ff294381d05e59294a931c4db55276470cb5
> > > > > # good: [2025b2ca8004c04861903d076c67a73a0ec6dfca] mcb-lpc: Reallocate
> > > > > memory region to avoid memory overlapping
> > > > > git bisect good 2025b2ca8004c04861903d076c67a73a0ec6dfca
> > > > > # bad: [d06f5a3f7140921ada47d49574ae6fa4de5e2a89] cdx: fix build failure due
> > > > > to sysfs 'bus_type' argument needing to be const
> > > > > git bisect bad d06f5a3f7140921ada47d49574ae6fa4de5e2a89
> > > > > # good: [dcfbb67e48a2becfce7990386e985b9c45098ee5] driver core: class: use
> > > > > lock_class_key already present in struct subsys_private
> > > > > git bisect good dcfbb67e48a2becfce7990386e985b9c45098ee5
> > > > > # bad: [6f14c02220c791d5c46b0f965b9340c58f3d503d] driver core: create
> > > > > class_is_registered()
> > > > > git bisect bad 6f14c02220c791d5c46b0f965b9340c58f3d503d
> > > > > # good: [2f9e87f5a2941b259336c7ea6c5a1499ede4554a] driver core: Add a
> > > > > comment to set_primary_fwnode() on nullifying
> > > > > git bisect good 2f9e87f5a2941b259336c7ea6c5a1499ede4554a
> > > > > # bad: [02fe26f25325b547b7a31a65deb0326c04bb5174] firmware_loader: Add debug
> > > > > message with checksum for FW file
> > > > > git bisect bad 02fe26f25325b547b7a31a65deb0326c04bb5174
> > > > > # good: [884f8ce42ccec9d0bf11d8bf9f111e5961ca1c82] driver core: class:
> > > > > implement class_get/put without the private pointer.
> > > > > git bisect good 884f8ce42ccec9d0bf11d8bf9f111e5961ca1c82
> > > > > # bad: [3f84aa5ec052dba960baca4ab8a352d43d47028e] base: soc: populate
> > > > > machine name in soc_device_register if empty
> > > > > git bisect bad 3f84aa5ec052dba960baca4ab8a352d43d47028e
> > > > > # bad: [7b884b7f24b42fa25e92ed724ad82f137610afaf] driver core: class.c:
> > > > > convert to only use class_to_subsys
> > > > > git bisect bad 7b884b7f24b42fa25e92ed724ad82f137610afaf
> > > > > # first bad commit: [7b884b7f24b42fa25e92ed724ad82f137610afaf] driver core:
> > > > > class.c: convert to only use class_to_subsys
> > > > > user@host:~/linux/kernel/linux_torvalds$
> > > >
> > > > This helps a lot, thanks. I got the reference counting wrong somewhere
> > > > in here, I thought I tested this better, odd it shows up now...
> > > >
> > > > I'll try to work on it this week.
> > >
> > > I have figured out that the leak occurs on keyboard unplugging only, one
> > > or two leaks (maybe a race condition?).
> > >
> > > Please NOTE that the number of leaks is now odd:
> > >
> > > root@defiant:/home/marvin# cat /sys/kernel/debug/kmemleak | grep comm
> > > comm "systemd-udevd", pid 330, jiffies 4294892588 (age 715.772s)
> > > comm "systemd-udevd", pid 330, jiffies 4294892588 (age 715.772s)
> > > comm "kworker/6:0", pid 54, jiffies 4294907989 (age 654.224s)
> > > comm "kworker/6:0", pid 54, jiffies 4294907989 (age 654.272s)
> > > comm "kworker/6:3", pid 3046, jiffies 4294935362 (age 544.780s)
> > > comm "kworker/6:0", pid 54, jiffies 4294964122 (age 429.740s)
> > > comm "kworker/6:0", pid 54, jiffies 4294964122 (age 429.784s)
> > > root@defiant:/home/marvin#
> > >
> > > On one occasion unplugging the keyboard generated only one leak, but that
> > > happened only once. As it requires manually unplugging the keyboard, I have
> > > not found a way to automate it, but it doesn't seem to require root access.
> > >
> > > BTW, I've seen in syzbot output that kmemleak output has debug source file
> > > names and line numbers. I couldn't make that work with the dbg .deb.
> > >
> > > I will do some more homework, but this was a rough week.
> >
> > I made up a patch based on code inspection alone, as I couldn't
> > reproduce this locally at all:
> > https://lore.kernel.org/r/2023051628-thumb-boaster-5680@gregkh
> > and it seemed to pass syzbot's tests.
> >
> > I've included it here below, can you test it as well?
> >
> > Hm, I only tested with a USB mouse unplug/plug cycle, maybe the issue is
> > a keyboard?
> >
> > thanks,
> >
> > greg k-h
> >
> > -------------
> >
> > diff --git a/drivers/base/class.c b/drivers/base/class.c
> > index ac1808d1a2e8..9b44edc8416f 100644
> > --- a/drivers/base/class.c
> > +++ b/drivers/base/class.c
> > @@ -320,6 +322,7 @@ void class_dev_iter_init(struct class_dev_iter *iter, const struct class *class,
> >  		start_knode = &start->p->knode_class;
> >  	klist_iter_init_node(&sp->klist_devices, &iter->ki, start_knode);
> >  	iter->type = type;
> > +	iter->sp = sp;
> >  }
> >  EXPORT_SYMBOL_GPL(class_dev_iter_init);
> > @@ -361,6 +364,7 @@ EXPORT_SYMBOL_GPL(class_dev_iter_next);
> >  void class_dev_iter_exit(struct class_dev_iter *iter)
> >  {
> >  	klist_iter_exit(&iter->ki);
> > +	subsys_put(iter->sp);
> >  }
> >  EXPORT_SYMBOL_GPL(class_dev_iter_exit);
> > diff --git a/include/linux/device/class.h b/include/linux/device/class.h
> > index 9deeaeb457bb..abf3d3bfb6fe 100644
> > --- a/include/linux/device/class.h
> > +++ b/include/linux/device/class.h
> > @@ -74,6 +74,7 @@ struct class {
> >  struct class_dev_iter {
> >  	struct klist_iter ki;
> >  	const struct device_type *type;
> > +	struct subsys_private *sp;
> >  };
> >  int __must_check class_register(const struct class *class);
>
> The build with the latest 6.4-rc2 and without this patch still leaked;
> the build with the same commit and this patch applied did not:
>
> root@defiant:/home/marvin# cat /sys/kernel/debug/kmemleak
> root@defiant:/home/marvin#
>
> Tried three times, and it is OK.
>
> Congratulations! This has fixed the leak.

Wonderful, thanks for testing, can I add your "Tested-by:" to it?

> I wonder why it didn't show in the other contexts, hardware and archs?

It might depend on whether your keyboard has other things on it? I don't
know, sorry. I didn't spend much time digging after I found the "obvious
leak" based on the bisection you provided, which was very, very helpful.
Thanks for that.

And leaks are hard to notice, especially ones that only show up when you
remove a specific type of device.

thanks again for your help here,

greg k-h