Re: [PATCH/RFC] debugobjects/slub: Print slab info and backtrace.

From: Thomas Gleixner
Date: Sun Nov 05 2023 - 19:13:36 EST


On Sun, Nov 05 2023 at 09:40, Ben Greear wrote:
> On 11/5/23 8:20 AM, Thomas Gleixner wrote:
>>> 16147 Nov 02 17:28:25 ct523c-2103 kernel: worker_thread+0x38a/0x680
>>
>> Can you please provide proper kernel dmesg output next time instead of
>> this mess?
>
> You are complaining because there are a few extra tokens put in this
> by journalctl?

Superfluous information is just distracting :)

>>> ODEBUG: free active (active state 0) object: ffff888181c029a0 object type: timer_list hint: kobject_delayed_cleanup+0x0/0x140
>>> WARNING: CPU: 1 PID: 104 at lib/debugobjects.c:549 debug_print_object+0xf0/0x170
>>> CPU: 1 PID: 104 Comm: kworker/1:10 Tainted: G W 6.6.0-rc7+ #17
>>> Workqueue: events kobject_delayed_cleanup
>>> RIP: 0010:debug_print_object+0xf0/0x170
>>> debug_check_no_obj_freed+0x261/0x2b0
>>> __kmem_cache_free+0x185/0x200
>>> device_release+0x57/0x100
>>> kobject_delayed_cleanup+0xdf/0x140
>>> process_one_work+0x475/0x920
>>> worker_thread+0x38a/0x680
>>
>> So what happens is:
>>
>> pps_unregister_cdev()
>> device_destroy()
>> put_device()
>> device_unregister()
>> device_del()
>> put_device() <- Drops final reference to dev->kobj
>> schedule_delayed_work()
>>
>> worker thread:
>> kobject_delayed_cleanup()
>> device_release()
>> pps_device_destruct()
>> cdev_del(&pps->cdev)
>> kobject_put(&cdev->kobj) <- Drops final reference
>> schedule_delayed_work()
>> init_timer(&cdev->kobj.release.timer);
>> start_timer();
>> ...
>> kfree(dev);
>> kfree(pps); <- Debug object detects the active timer to be freed
>> because cdev and its kobject are embedded in
>> struct pps_device.
>>
>> pps_device_destruct() is unfortunately not on the call trace of the
>> debug objects splat anymore stack because kfree(pps) is a tail call.
>
> So, is this a real bug, or just false positive?

Freeing an active timer is obviously a bug, no?

>> So yes, that collected stacktrace is helpful.
>
> The one I added, or was the original code enough to find this?

The one you added. Collecting this information is useful when the object
tracked by debugobjects provides a hint which does not give any
information about the source of trouble:

timer_list hint: kobject_delayed_cleanup+0x0/0x140

It points to the work function, but that function is useless to figure
out where the kobject belongs to. So the extra stack trace during init
provides the missing information.

Though I just looked at the patch again. This part is problematic:

> + trace_handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);

This breaks on RT as you cannot allocate with a raw spinlock held.

Let met think about that.

Thanks,

tglx