Re: [PATCH v6 19/21] timer: Implement the hierarchical pull model

From: Anna-Maria Behnsen
Date: Mon May 15 2023 - 06:50:36 EST


On Mon, 15 May 2023, Sebastian Siewior wrote:

> On 2023-05-10 12:32:53 [+0200], Frederic Weisbecker wrote:
> > Le Wed, May 10, 2023 at 09:28:15AM +0200, Anna-Maria Behnsen a écrit :
> > > +static u64 tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
> > > + unsigned long jif)
> > > +{
> > > + struct timer_events tevt;
> > > + struct tmigr_walk data;
> > > + struct tmigr_cpu *tmc;
> > > + u64 next = KTIME_MAX;
> > > +
> > > + tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> > > +
> > > + raw_spin_lock_irq(&tmc->lock);
> > > + /*
> > > + * Remote CPU is offline or no longer idle or other cpu handles cpu
> > > + * timers already or next event was already expired - return!
> > > + */
> > > + if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
> > > + now < tmc->cpuevt.nextevt.expires) {
> > > + raw_spin_unlock_irq(&tmc->lock);
> > > + return next;
> > > + }
> > > +
> > > + tmc->remote = 1;
> > > +
> > > + /* Drop the lock to allow the remote CPU to exit idle */
> > > + raw_spin_unlock_irq(&tmc->lock);
> > > +
> > > + if (cpu != smp_processor_id())
> > > + timer_expire_remote(cpu);
> > > +
> > > + /*
> > > + * Pretend that there is no timer pending if the cpu is offline.
> > > + * Possible pending timers will be migrated later to an active cpu.
> > > + */
> > > + if (cpu_is_offline(smp_processor_id())) {
> > > + raw_spin_lock_irq(&tmc->lock);
> > > + tevt.local = tevt.global = KTIME_MAX;
> > > + } else {
> > > + /*
> > > + * Lock ordering needs to be preserved - timer_base locks
> > > + * before tmigr related locks. During fetching the next
> > > + * timer interrupt, also tmc->lock needs to be
> > > + * held. Otherwise there is a possible race window against
> > > + * the CPU itself when it comes out of idle, updates the
> > > + * first timer and goes back to idle.
> > > + */
> > > + timer_lock_remote_bases(cpu);
> >
> > So the return value is ignored here.
> >
> > In the case of !PREEMPT_RT, I suppose it's impossible for the target
> > CPU to be offline. You checked above tmc->online and in-between the
> > call to timer_lock_remote_bases(), the path is BH-disabled, this prevents
> > stop_machine from running and from setting the CPU as offline.
>
> I think you refer to the last one invoked from takedown_cpu(). This does
> not matter, see below.
>
> What bothers me is that _current_ CPU is check for cpu_is_offline() and
> not the variable 'cpu'. Before the check timer_expire_remote() is
> invoked on 'cpu' and not on current.

I already saw this, when I revistied the place where frederic pointed
to. So in v7 this will be fixed. See snippet below...

> > However in PREEMPT_RT, ksoftirqd (or timersd) is preemptible, so it seems
> > that it could happen in theory. And that could create a locking imbalance.
>
> The ksoftirqd thread is part of smpboot_park_threads(). They have to
> stop running and clean up before the machinery continues bringing down
> the CPU (that is before takedown_cpu()). On the way down we have:
> - tmigr_cpu_offline() followed by
> - smpboot_park_threads().
>
> So ksoftirqd (preempted or not) finishes before. This is for the
> _target_ CPU.
>
> After the "tmc->online" check the lock is dropped and this is invoked
> from run_timer_softirq(). That means that _this_ CPU could get preempted
> (by an IRQ for instance) at this point, and once the CPU gets back here,
> the remote CPU (as specified in `cpu') can already be offline by the
> time timer_lock_remote_bases() is invoked.
>
> So RT or not, this is racy.
>
> > My suggestion would be to unconditionally lock the bases, you already checked if
> > !tmc->online before. The remote CPU may have gone down since then because the
> > tmc lock has been relaxed but it should be rare enough that you don't care
> > about optimizing with a lockless check. So you can just lock the bases,
> > lock the tmc and check again if tmc->online. If not then you can just ignore
> > the tmigr_new_timer_up call and propagation.
>
> Regardless the previous point, this still looks odd as you pointed out.
> The return code is ignored and the two functions perform lock + unlock
> depending on it.

The part will be replaced by:

---8<----

local_irq_disable();
timer_lock_remote_bases(cpu);
raw_spin_lock(&tmc->lock);

if (!tmc->online || !tmc->idle) {
timer_unlock_remote_bases(cpu);
goto unlock;
} else {
fetch_next_timer_interrupt_remote(jif, now, &tevt, cpu);
}

timer_unlock_remote_bases(cpu);

-> do the preparation and the walk

unlock:
raw_spin_unlock_irq(&tmc->lock);

---8<----

As menitoned in the reply last week to Frederics objections regarding the
locking asymmetry, I would like to keep it to make the locking region of
timer base locks as small as possible and to prevent holding up to five
locks during the walk.

Thanks,

Anna-Maria