Re: [patch 01/20] posix-timers: Prevent RT livelock in itimer_delete()

From: Thomas Gleixner
Date: Fri May 05 2023 - 03:57:26 EST


On Thu, May 04 2023 at 20:20, Thomas Gleixner wrote:
> On Thu, May 04 2023 at 19:06, Frederic Weisbecker wrote:
>> On Tue, Apr 25, 2023 at 08:48:56PM +0200, Thomas Gleixner wrote:
>>> itimer_delete() has a retry loop when the timer is concurrently expired. On
>>> non-RT kernels this just spin-waits until the timer callback has
>>> completed. On RT kernels this is a potential livelock when the exiting task
>>> preempted the hrtimer soft interrupt.
>>>
>>> This only affects hrtimer based timers as Posix CPU timers cannot be
>>> concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is
>>> obviously impossible as the task cannot run task work and exit at the same
>>> time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented
>>> because interrupts are disabled.
>>
>> But the owner of the timer is not the same as the target of the timer, right?
>>
>> Though I seem to remember that we forbid setting a timer to a target outside
>> the current process, in which case the owner and the target are the same at
>> this exit stage. But I can't remember what enforces that permission in
>> pid_for_clock()..
>
> The owner of the timer is always the one which needs to find the entity
> to synchronize on, whether that's the right hrtimer base or the task
> which runs the expiry code.
>
> wait_for_running_timer() is taking that into account:
>
>  - The hrtimer based posix timers lock the hrtimer base expiry
>    lock on the base to which the timer is currently associated
>
>  - Posix CPU timers can be armed on a different process (only per
>    thread timers are restricted to current's threadgroup) but the
>    wait_for_running() callback "knows" how to find that process:
>
>    When the timer is moved to the expiry list it gets:
>
>         ctmr->firing = 1;
>         rcu_assign_pointer(ctmr->handling, current);
>
>    and the wait for running side does:
>
>         rcu_read_lock();
>         tsk = rcu_dereference(timr->it.cpu.handling);
>         ....
>         mutex_lock(&tsk->posix_cputimers_work.mutex);
>
>    See collect_timerqueue(), handle_posix_cpu_timers() and
>    posix_cpu_timer_wait_running() for details.
>
> commit f7abf14f0001 ("posix-cpu-timers: Implement the missing
> timer_wait_running callback") has quite some prose in the changelog.

But you have a point. The comment I added in itimer_delete() vs. CPU
timers is wrong for timers which are armed on a different process.
Needs to be removed.
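
For reference, the CPU timer handshake quoted above can be modelled in
userspace roughly as below. This is only a sketch and not the kernel
code: a pthread mutex stands in for posix_cputimers_work.mutex, a C11
atomic pointer stands in for the RCU protected 'handling' pointer, and
all sketch_* names are made up for illustration. It also assumes that
the expirer object outlives every waiter, which the kernel has to
guarantee by other means.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the task which runs the expiry code */
struct sketch_expirer {
	pthread_mutex_t mutex;	/* models posix_cputimers_work.mutex */
};

/* Hypothetical stand-in for a posix CPU timer */
struct sketch_timer {
	int firing;
	_Atomic(struct sketch_expirer *) handling;	/* models it.cpu.handling */
};

/* Expiry side: the timer is moved to the expiry list, exp->mutex is held */
static void sketch_collect(struct sketch_timer *t, struct sketch_expirer *exp)
{
	assert(exp != NULL);
	t->firing = 1;
	/* Publish who is handling the timer so a waiter can find the mutex */
	atomic_store_explicit(&t->handling, exp, memory_order_release);
}

/* Expiry side: the callback has completed, exp->mutex is still held */
static void sketch_done(struct sketch_timer *t)
{
	t->firing = 0;
	atomic_store_explicit(&t->handling, NULL, memory_order_release);
}

/*
 * Waiter side: returns 0 when nobody is handling the timer. Otherwise
 * it blocks on the expirer's mutex until the expiry code has dropped
 * it and returns 1. The caller retries the delete in either case.
 */
static int sketch_wait_running(struct sketch_timer *t)
{
	struct sketch_expirer *exp;

	exp = atomic_load_explicit(&t->handling, memory_order_acquire);
	if (!exp)
		return 0;

	pthread_mutex_lock(&exp->mutex);
	pthread_mutex_unlock(&exp->mutex);
	return 1;
}
```

The point of the sketch is that the waiter never needs to know up
front which process the timer is armed on. It only follows the
published pointer and blocks on that mutex instead of spinning.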

Thanks,

tglx