Re: [PATCH v5] kthread: Add debugobject support

From: qianli zhao
Date: Fri Oct 09 2020 - 03:26:34 EST


Hi,Thomas

Thanks for your reply


On Thu, 1 Oct 2020 at 22:34, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Mon, Aug 17 2020 at 14:37, Qianli Zhao wrote:
> > From: Qianli Zhao <zhaoqianli@xxxxxxxxxx>
> >
> > Add debugobject support to track the life time of kthread_work
> > which is used to detect reinitialization/free active object problems
> > Add kthread_init_work_onstack()/kthread_init_delayed_work_onstack() for
> > kthread onstack support
> >
> > If we reinitialize a kthread_work that has been activated,
> > this will cause delayed_work_list/work_list corruption.
> > enable this config,there is an chance to fixup these errors
> > or WARNING the wrong use of kthread_work
> >
> > [30858.395766] list_del corruption. next->prev should be ffffffe388ebbf88, but was ffffffe388ebb588
> > [30858.395788] WARNING: CPU: 2 PID: 387 at kernel/msm-4.19/lib/list_debug.c:56 __list_del_entry_valid+0xc8/0xd0
> > ...
> > [30858.395906] Call trace:
> > [30858.395909] __list_del_entry_valid+0xc8/0xd0
> > [30858.395912] __kthread_cancel_work_sync+0x98/0x138
> > [30858.395915] kthread_cancel_delayed_work_sync+0x10/0x20
> > [30858.395917] sde_encoder_resource_control+0xe8/0x12c0
> > [30858.395920] sde_encoder_prepare_for_kickoff+0x5dc/0x2008
> > [30858.395923] sde_crtc_commit_kickoff+0x280/0x890
> > [30858.395925] sde_kms_commit+0x16c/0x278
> > [30858.395928] complete_commit+0x3c4/0x760
> > [30858.395931] _msm_drm_commit_work_cb+0xec/0x1e0
> > [30858.395933] kthread_worker_fn+0xf8/0x190
> > [30858.395935] kthread+0x118/0x128
> > [30858.395938] ret_from_fork+0x10/0x18
> >
> > crash> struct kthread_worker.delayed_work_list 0xffffffe3893925f0
> > [ffffffe389392620] delayed_work_list = {
> > next = 0xffffffe388ebbf88,
> > prev = 0xffffffe388ebb588
> > }
> > crash> list 0xffffffe388ebbf88
> > ffffffe388ebbf88
>
> This changelog is confusing at best. Something like this perhaps?
>
> kthread_work is not covered by debug objects, but the same problems as with
> regular work objects apply.
>
> Some of the issues like reinitialization of an active kthread_work are hard
> to debug because the problem manifests itself later in a completely
> different context.
>
> Add debugobject support along with the necessary fixup functions to make
> debugging of these problems less tedious.
>

Will follow your tips to improve.

> > +static void stub_kthread_work(struct kthread_work *unuse)
>
> unused?
>
> > +{
> > + WARN_ON(1);
> > +}

When the kthread_work is not initialized, kwork_fixup_assert_init()
will call kthread_init_work() to fixup this kthread_work,and
kthread_init_work() needs a function as a parameter,so we have to
quote an empty function(refer to stub_timer()).

> > void kthread_flush_work(struct kthread_work *work)
> > {
> > struct kthread_flush_work fwork = {
> > - KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
> > - COMPLETION_INITIALIZER_ONSTACK(fwork.done),
> > + .done = COMPLETION_INITIALIZER_ONSTACK(fwork.done),
>
> Eew. Why is the completion initialized seperately instead of being
> initialized as part of kthread_init_work_onstack() ?
>

I just try to keep the same as before,how about change as below?
completion initialized as part of kthread_init_work_onstack()
@@ -1319,10 +1318,9 @@ bool kthread_cancel_delayed_work_sync(struct
kthread_delayed_work *dwork)
*/
void kthread_flush_worker(struct kthread_worker *worker)
{
- struct kthread_flush_work fwork = {
- .done = COMPLETION_INITIALIZER_ONSTACK(fwork.done),
- };
+ struct kthread_flush_work fwork;

+ fwork.done = COMPLETION_INITIALIZER_ONSTACK(fwork.done);
kthread_init_work_onstack(&fwork.work, kthread_flush_work_fn);

> > };
> > struct kthread_worker *worker;
> > bool noop = false;
> >
> > + debug_kwork_assert_init(work);
> > worker = work->worker;
> > if (!worker)
> > return;
> >
> > + kthread_init_work_onstack(&fwork.work, kthread_flush_work_fn);
> >
> > @@ -1194,12 +1319,13 @@ EXPORT_SYMBOL_GPL(kthread_cancel_delayed_work_sync);
> > void kthread_flush_worker(struct kthread_worker *worker)
> > {
> > struct kthread_flush_work fwork = {
> > - KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
> > - COMPLETION_INITIALIZER_ONSTACK(fwork.done),
> > + .done = COMPLETION_INITIALIZER_ONSTACK(fwork.done),
> > };
>
> Ditto.
>
> > + kthread_init_work_onstack(&fwork.work, kthread_flush_work_fn);
> > kthread_queue_work(worker, &fwork.work);
> > wait_for_completion(&fwork.done);
> > + destroy_kwork_on_stack(&fwork.work);
>
> Thanks,
>
> tglx