Re: [PATCH/RFC] debugobjects/slub: Print slab info and backtrace.

From: Thomas Gleixner
Date: Sun Nov 05 2023 - 11:21:09 EST


On Thu, Nov 02 2023 at 18:49, Ben Greear wrote:
> And here is resulting splat from wireless-next tree I've been
> debugging.
>
> Note the subsequent splats from slub are due to some memory poisoning, for
> one reason or another. Maybe slub changes should not be included in this patch, not
> sure if it can provide useful info in other cases though.
>
> If I understand this correctly, then it appears the bug is related to
> the pps driver.
>
> 16140 Nov 02 17:28:25 ct523c-2103 kernel: ODEBUG: debugobjects: debug_obj allocated at:
> 16141 Nov 02 17:28:25 ct523c-2103 kernel: init_timer_key+0x24/0x160
> 16142 Nov 02 17:28:25 ct523c-2103 kernel: kobject_put+0x14f/0x190
> 16143 Nov 02 17:28:25 ct523c-2103 kernel: pps_device_destruct+0x26/0xb0
> 16144 Nov 02 17:28:25 ct523c-2103 kernel: device_release+0x57/0x100
> 16145 Nov 02 17:28:25 ct523c-2103 kernel: kobject_delayed_cleanup+0xdf/0x140
> 16146 Nov 02 17:28:25 ct523c-2103 kernel: process_one_work+0x475/0x920
> 16147 Nov 02 17:28:25 ct523c-2103 kernel: worker_thread+0x38a/0x680

Can you please provide proper kernel dmesg output next time instead of
this mess?

> ODEBUG: free active (active state 0) object: ffff888181c029a0 object type: timer_list hint: kobject_delayed_cleanup+0x0/0x140
> WARNING: CPU: 1 PID: 104 at lib/debugobjects.c:549 debug_print_object+0xf0/0x170
> CPU: 1 PID: 104 Comm: kworker/1:10 Tainted: G W 6.6.0-rc7+ #17
> Workqueue: events kobject_delayed_cleanup
> RIP: 0010:debug_print_object+0xf0/0x170
> debug_check_no_obj_freed+0x261/0x2b0
> __kmem_cache_free+0x185/0x200
> device_release+0x57/0x100
> kobject_delayed_cleanup+0xdf/0x140
> process_one_work+0x475/0x920
> worker_thread+0x38a/0x680

So what happens is:

pps_unregister_cdev()
  device_destroy()
    put_device()
    device_unregister()
      device_del()
      put_device()  <- Drops final reference to dev->kobj
        schedule_delayed_work()

worker thread:
  kobject_delayed_cleanup()
    device_release()
      pps_device_destruct()
        cdev_del(&pps->cdev)
          kobject_put(&cdev->kobj)  <- Drops final reference
            schedule_delayed_work()
              init_timer(&cdev->kobj.release.timer);
              start_timer();
        ...
        kfree(dev);
        kfree(pps);   <- Debug object detects the active timer to be freed
                         because cdev and its kobject are embedded in
                         struct pps_device.
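To illustrate the mechanism, here is a minimal userspace model. It is not kernel code, and every name in it (model_activate(), model_kfree(), struct fake_pps_device and friends) is hypothetical. It assumes, as a simplification of what debug_check_no_obj_freed() does, that the free path scans the freed address range for tracked objects that are still active. Because the release timer sits inside struct fake_pps_device, freeing the container while that timer is armed gets reported. Freeing it after the timer has been deactivated does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace model, not the kernel implementation. */
#define MODEL_MAX_TRACKED 16

struct model_tracked {
	void *addr;   /* address of the tracked object (e.g. a timer) */
	bool active;  /* true while the object is armed / in use */
};

static struct model_tracked model_registry[MODEL_MAX_TRACKED];

/* Start tracking @addr as an active object (think: timer armed). */
static void model_activate(void *addr)
{
	for (int i = 0; i < MODEL_MAX_TRACKED; i++) {
		if (!model_registry[i].addr) {
			model_registry[i].addr = addr;
			model_registry[i].active = true;
			return;
		}
	}
	assert(!"model registry full");
}

/* Mark @addr inactive (think: timer expired or deleted). */
static void model_deactivate(void *addr)
{
	for (int i = 0; i < MODEL_MAX_TRACKED; i++)
		if (model_registry[i].addr == addr)
			model_registry[i].active = false;
}

/*
 * Rough analogue of debug_check_no_obj_freed(): report every tracked
 * object inside [start, start + size) that is still active, then drop
 * all tracked entries in that range. Returns the number of reports.
 */
static int model_check_no_obj_freed(const void *start, size_t size)
{
	uintptr_t lo = (uintptr_t)start;
	uintptr_t hi = lo + size;
	int hits = 0;

	for (int i = 0; i < MODEL_MAX_TRACKED; i++) {
		uintptr_t a = (uintptr_t)model_registry[i].addr;

		if (!model_registry[i].addr || a < lo || a >= hi)
			continue;
		if (model_registry[i].active) {
			printf("ODEBUG model: free active object: %p\n",
			       model_registry[i].addr);
			hits++;
		}
		model_registry[i].addr = NULL;
		model_registry[i].active = false;
	}
	return hits;
}

/* Hypothetical stand-ins mirroring the embedding in struct pps_device. */
struct fake_timer { unsigned long expires; };
struct fake_kobject { struct fake_timer release_timer; };
struct fake_cdev { struct fake_kobject kobj; };
struct fake_pps_device { int id; struct fake_cdev cdev; };

/* kfree() stand-in: run the check before handing the memory back. */
static int model_kfree(void *p, size_t size)
{
	int hits = model_check_no_obj_freed(p, size);

	free(p);
	return hits;
}
```

The real debugobjects code keeps tracked objects in a hash table and runs a type-specific fixup for an active object that is being freed. The linear array here only keeps the sketch short.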

pps_device_destruct() is unfortunately no longer on the call trace of the
debug objects splat, because kfree(pps) is a tail call.

So yes, that collected stacktrace is helpful.
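The idea in the RFC is to keep the backtrace of where the tracking object was created and print it when a problem is reported. It can be modelled in userspace as below. This is a sketch under assumptions. It uses glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h> as a stand-in for the kernel's stack trace helpers. struct model_debug_obj and its functions are hypothetical names and not the patch's code. The point it shows is that the saved trace still names the frames that set the object up, even when the frame that later frees it (pps_device_destruct() in this case) has vanished from the live stack because of a tail call.

```c
#include <assert.h>
#include <execinfo.h>   /* glibc-specific: backtrace(), backtrace_symbols_fd() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MODEL_TRACE_DEPTH 16

/*
 * Hypothetical stand-in for struct debug_obj, extended with the
 * allocation-site backtrace the RFC proposes to keep.
 */
struct model_debug_obj {
	const void *object;                   /* the tracked object */
	void *alloc_trace[MODEL_TRACE_DEPTH]; /* return addresses at init */
	int alloc_depth;                      /* valid entries in alloc_trace */
};

/* Record where the tracking object was created. */
static void model_debug_obj_init(struct model_debug_obj *dobj,
				 const void *object)
{
	assert(dobj != NULL);
	memset(dobj, 0, sizeof(*dobj));
	dobj->object = object;
	dobj->alloc_depth = backtrace(dobj->alloc_trace, MODEL_TRACE_DEPTH);
}

/* On a detected problem, print the saved trace, not the current one. */
static void model_debug_obj_report(const struct model_debug_obj *dobj,
				   const char *msg)
{
	fprintf(stderr, "ODEBUG model: %s object: %p\n", msg, dobj->object);
	fprintf(stderr, "debug_obj allocated at:\n");
	backtrace_symbols_fd(dobj->alloc_trace, dobj->alloc_depth,
			     STDERR_FILENO);
}
```

Without -rdynamic, glibc usually prints only raw addresses for functions in the main executable, so symbol names in the output depend on how the binary is linked.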

>> To try to improve this, store the backtrace of where the
>> debug_obj was created and print that out when problems
>> are found.
<SNIP>

Please trim your replies.

Thanks,

tglx