Re: [PATCH 1/3] sched: better handling for busy polling loops

From: Josh Don
Date: Tue Oct 27 2020 - 21:40:11 EST


On Fri, Oct 23, 2020 at 12:19 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Oct 22, 2020 at 08:29:42PM -0700, Josh Don wrote:
> > Busy polling loops in the kernel such as network socket poll and kvm
> > halt polling have performance problems related to process scheduler load
> > accounting.
>
> AFAICT you're not actually fixing the load accounting issue at all.

Halt polling is an ephemeral state, and the goal is not to adjust the
accounting, but rather to allow wake up balance to identify polling
cpus.

> > This change also disables preemption for the duration of the busy
> > polling loop. This is important, as it ensures that if a polling thread
> > decides to end its poll to relinquish cpu to another thread, the polling
> > thread will actually exit the busy loop and potentially block. When it
> > later becomes runnable, it will have the opportunity to find an idle cpu
> > via wakeup cpu selection.
>
> At the cost of inducing a sleep+wake cycle; which is mucho expensive. So
> this could go either way. No data presented.

I can take a look at getting some data on that. What I do have is data
that reflects an overall improvement in machine efficiency. On heavily
loaded hosts, we were able to recoup ~2% of cycles.

> > +void prepare_to_busy_poll(void)
> > +{
> > + struct rq __maybe_unused *rq = this_rq();
> > + unsigned long __maybe_unused flags;
> > +
> > + /* Preemption will be reenabled by end_busy_poll() */
> > + preempt_disable();
> > +
> > +#ifdef CONFIG_SMP
> > + raw_spin_lock_irqsave(&rq->lock, flags);
> > + /* preemption disabled; only one thread can poll at a time */
> > + WARN_ON_ONCE(rq->busy_polling);
> > + rq->busy_polling++;
> > + raw_spin_unlock_irqrestore(&rq->lock, flags);
> > +#endif
>
> Explain to me the purpose of that rq->lock usage.

This was required in a previous iteration, but failed to drop after a refactor.

> > +int continue_busy_poll(void)
> > +{
> > + if (!single_task_running())
> > + return 0;
>
> Why? If there's more, we'll end up in the below condition anyway.
>

Migrating a task over won't necessarily set need_resched. Though it
does make sense to allow check_preempt_curr to set it directly in this
case.

> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 28709f6b0975..45de468d0ffb 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1003,6 +1003,8 @@ struct rq {
> >
> > /* This is used to determine avg_idle's max value */
> > u64 max_idle_balance_cost;
> > +
> > + unsigned int busy_polling;
>
> This is a good location, cache-wise, because?
>
> > #endif /* CONFIG_SMP */
> >
> > #ifdef CONFIG_IRQ_TIME_ACCOUNTING

Good call out, I didn't consider that.