Re: [GIT pull] timers/urgent for v5.16-rc1

From: Thomas Gleixner
Date: Sun Nov 14 2021 - 14:24:42 EST


On Sun, Nov 14 2021 at 11:02, Linus Torvalds wrote:
> On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>
>> +	/*
>> +	 * A copied work entry from the old task is not meaningful, clear it.
>> +	 * N.B. init_task_work will not do this.
>> +	 */
>> +	memset(&p->posix_cputimers_work.work, 0,
>> +	       sizeof(p->posix_cputimers_work.work));
>> +	init_task_work(&p->posix_cputimers_work.work,
>> +		       posix_cpu_timers_work);
>
> Ugh.
>
> Instead of the added four lines of comment, and two lines of
> "memset()", maybe this should just have made init_task_work() DTRT?
>
> Yes, I see this:
>
>     /* Protect against double add, see task_tick_numa and task_numa_work */
>     p->numa_work.next = &p->numa_work;
>     ...
>     init_task_work(&p->numa_work, task_numa_work);
>
> but I think that one is so subtle and such a special case that it
> should have been updated - just make that magic special flag happen
> after the init_task_work.
>
> A lot of the other cases seem to zero-initialize things elsewhere
> (generally with kzalloc()), but I note that at least
> io_ring_exit_work() seems to have this:
>
>     struct io_tctx_exit exit;
>     ...
>     init_task_work(&exit.task_work, io_tctx_exit_cb);
>
> and the ->next pointer is never set to NULL.
>
> Now, in 99% of all cases the ->next pointer simply doesn't matter,
> because task_work_add() will only set it, not caring about the old
> value.
>
> But apparently it matters for posix_cputimers_work and for numa_work,
> and so I think it's very illogical that init_task_work() will not
> actually initialize it properly.
>
> Hmm?
>
> I've pulled this, but it really looks like the wrong solution to the
> whole "uninitialized data".
>
> And that task_tick_numa() special case is truly horrendous, and really
> should go after the init_task_work() regardless, exactly because you'd
> expect that init_task_work() to initialize the work even if it doesn't
> happen to right now.
>
> Or is somebody doing init_task_work() to only change the work-function
> on an already initialized work entry? Because that sounds both racy
> and broken to me, and none of the things I looked at from a quick grep
> looked like that at all.
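
For reference, a minimal userspace sketch of the two behaviours under
discussion. It is a model and not kernel code: init_task_work() is
reduced to setting ->func as described above, while
init_task_work_cleared() and the two helper functions are hypothetical
names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

typedef void (*task_work_func_t)(struct callback_head *);

/* Models init_task_work() as discussed above: only ->func is set. */
static inline void
init_task_work(struct callback_head *twork, task_work_func_t func)
{
	twork->func = func;
}

/* Hypothetical variant (assumption): also reset ->next. */
static inline void
init_task_work_cleared(struct callback_head *twork, task_work_func_t func)
{
	twork->next = NULL;
	twork->func = func;
}

static void dummy_work(struct callback_head *head)
{
	(void)head;
}

/* Returns 1 if a copied ->next survives init_task_work(). */
static int stale_next_survives(void)
{
	struct callback_head parent = { .next = &parent, .func = dummy_work };
	struct callback_head child = parent;	/* models the copy from the old task */

	init_task_work(&child, dummy_work);
	return child.next == &parent;
}

/* Returns 1 if the hypothetical variant leaves ->next NULL. */
static int cleared_next_is_null(void)
{
	struct callback_head parent = { .next = &parent, .func = dummy_work };
	struct callback_head child = parent;

	init_task_work_cleared(&child, dummy_work);
	return child.next == NULL;
}
```

With a variant like this, the numa_work double-add marker
(->next pointing at itself) would have to be set after the init call,
which is the ordering suggested above.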

I'll have a deeper look at that tomorrow.

Thanks,

tglx