Re: [PATCH/RFC] debugobjects/slub: Print slab info and backtrace.

From: Ben Greear
Date: Sun Nov 05 2023 - 12:42:23 EST


On 11/5/23 8:20 AM, Thomas Gleixner wrote:
On Thu, Nov 02 2023 at 18:49, Ben Greear wrote:
And here is resulting splat from wireless-next tree I've been
debugging.

Note the subsequent splats from slub are due to some memory poisoning, for
one reason or another. Maybe slub changes should not be included in this patch, not
sure if it can provide useful info in other cases though.

If I understand this correctly, then it appears the bug is related to
the pps driver.

16140 Nov 02 17:28:25 ct523c-2103 kernel: ODEBUG: debugobjects: debug_obj allocated at:
16141 Nov 02 17:28:25 ct523c-2103 kernel: init_timer_key+0x24/0x160
16142 Nov 02 17:28:25 ct523c-2103 kernel: kobject_put+0x14f/0x190
16143 Nov 02 17:28:25 ct523c-2103 kernel: pps_device_destruct+0x26/0xb0
16144 Nov 02 17:28:25 ct523c-2103 kernel: device_release+0x57/0x100
16145 Nov 02 17:28:25 ct523c-2103 kernel: kobject_delayed_cleanup+0xdf/0x140
16146 Nov 02 17:28:25 ct523c-2103 kernel: process_one_work+0x475/0x920
16147 Nov 02 17:28:25 ct523c-2103 kernel: worker_thread+0x38a/0x680

Can you please provide proper kernel dmesg output next time instead of
this mess?

You are complaining because there are a few extra tokens put in this
by journalctl?


ODEBUG: free active (active state 0) object: ffff888181c029a0 object type: timer_list hint: kobject_delayed_cleanup+0x0/0x140
WARNING: CPU: 1 PID: 104 at lib/debugobjects.c:549 debug_print_object+0xf0/0x170
CPU: 1 PID: 104 Comm: kworker/1:10 Tainted: G W 6.6.0-rc7+ #17
Workqueue: events kobject_delayed_cleanup
RIP: 0010:debug_print_object+0xf0/0x170
debug_check_no_obj_freed+0x261/0x2b0
__kmem_cache_free+0x185/0x200
device_release+0x57/0x100
kobject_delayed_cleanup+0xdf/0x140
process_one_work+0x475/0x920
worker_thread+0x38a/0x680

So what happens is:

pps_unregister_cdev()
device_destroy()
put_device()
device_unregister()
device_del()
put_device() <- Drops final reference to dev->kobj
schedule_delayed_work()

worker thread:
kobject_delayed_cleanup()
device_release()
pps_device_destruct()
cdev_del(&pps->cdev)
kobject_put(&cdev->kobj) <- Drops final reference
schedule_delayed_work()
init_timer(&cdev->kobj.release.timer);
start_timer();
...
kfree(dev);
kfree(pps); <- Debug object detects the active timer to be freed
because cdev and its kobject are embedded in
struct pps_device.

pps_device_destruct() is unfortunately not on the call trace of the
debug objects splat anymore stack because kfree(pps) is a tail call.

So, is this a real bug, or just false positive?


So yes, that collected stacktrace is helpful.

The one I added, or was the original code enough to find this?

I don't really understand the debugobjects code well, never heard of pps driver
before I started looking into this. I hacked in the backtrace code by copying from
existing code in the kernel.

If you are happy with current debug-objects and pps related warning splat is understood,
happy to just drop my patch. Once I hacked around another bug in wifi, my kernel
has been stable, so whatever pps is doing, doesn't seem to be actively harmful
on my system.

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com