Re: [PATCH v6 19/21] timer: Implement the hierarchical pull model

From: Frederic Weisbecker
Date: Wed May 10 2023 - 06:33:07 EST


Le Wed, May 10, 2023 at 09:28:15AM +0200, Anna-Maria Behnsen a écrit :
> +static u64 tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
> + unsigned long jif)
> +{
> + struct timer_events tevt;
> + struct tmigr_walk data;
> + struct tmigr_cpu *tmc;
> + u64 next = KTIME_MAX;
> +
> + tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> +
> + raw_spin_lock_irq(&tmc->lock);
> + /*
> + * Remote CPU is offline or no longer idle or other cpu handles cpu
> + * timers already or next event was already expired - return!
> + */
> + if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
> + now < tmc->cpuevt.nextevt.expires) {
> + raw_spin_unlock_irq(&tmc->lock);
> + return next;
> + }
> +
> + tmc->remote = 1;
> +
> + /* Drop the lock to allow the remote CPU to exit idle */
> + raw_spin_unlock_irq(&tmc->lock);
> +
> + if (cpu != smp_processor_id())
> + timer_expire_remote(cpu);
> +
> + /*
> + * Pretend that there is no timer pending if the cpu is offline.
> + * Possible pending timers will be migrated later to an active cpu.
> + */
> + if (cpu_is_offline(smp_processor_id())) {
> + raw_spin_lock_irq(&tmc->lock);
> + tevt.local = tevt.global = KTIME_MAX;
> + } else {
> + /*
> + * Lock ordering needs to be preserved - timer_base locks
> + * before tmigr related locks. During fetching the next
> + * timer interrupt, also tmc->lock needs to be
> + * held. Otherwise there is a possible race window against
> + * the CPU itself when it comes out of idle, updates the
> + * first timer and goes back to idle.
> + */
> + timer_lock_remote_bases(cpu);

So the return value is ignored here.

In the case of !PREEMPT_RT, I suppose it's impossible for the target
CPU to be offline. You checked above tmc->online and in-between the
call to timer_lock_remote_bases(), the path is BH-disabled, this prevents
stop_machine from running and from setting the CPU as offline.

However in PREEMPT_RT, ksoftirqd (or timersd) is preemptible, so it seems
that it could happen in theory. And that could create a locking imbalance.

My suggestion would be to unconditionally lock the bases, you already checked if
!tmc->online before. The remote CPU may have gone down since then because the
tmc lock has been relaxed but it should be rare enough that you don't care
about optimizing with a lockless check. So you can just lock the bases,
lock the tmc and check again if tmc->online. If not then you can just ignore
the tmigr_new_timer_up call and propagation.

Thanks.

> + raw_spin_lock(&tmc->lock);
> +
> + /* next event of cpu */
> + fetch_next_timer_interrupt_remote(jif, now, &tevt, cpu);
> + }
> +
> + /*
> + * Nothing more to do when CPU came out of idle in the meantime - needs
> + * to be checked when holding the tmc lock to prevent race.
> + */
> + if (!tmc->idle)
> + goto unlock;
> +
> + data.evt = &tmc->cpuevt;
> + data.nextexp = tevt.global;
> + data.groupstate.state = atomic_read(&tmc->tmgroup->migr_state);
> + data.remote = true;
> + tmc->cpuevt.ignore = 0;
> +
> + walk_groups(&tmigr_new_timer_up, &data, tmc);
> +
> + next = data.nextexp;
> +
> +unlock:
> + tmc->remote = 0;
> + raw_spin_unlock_irq(&tmc->lock);
> +
> + return next;
> +}