Re: [RFC PATCH 54/86] sched: add cond_resched_stall()

From: Ankur Arora
Date: Thu Nov 09 2023 - 17:29:23 EST



Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Tue, Nov 07 2023 at 13:57, Ankur Arora wrote:
>> The kernel has a lot of instances of cond_resched() where it is used
>> as an alternative to spinning in a tight loop while waiting to
>> retry an operation, or while waiting for a device state to change.
>>
>> Unfortunately, because the scheduler is unlikely to have an
>> interminable supply of runnable tasks on the runqueue, this just
>> amounts to spinning in a tight loop with a cond_resched().
>> (When running in a fully preemptible kernel, cond_resched()
>> calls are stubbed out so it amounts to even less.)
>>
>> In sum, cond_resched() in error-handling/retry contexts might
>> help avoid softlockup splats, but it is not very good at
>> error handling. Ideally, these should be replaced with some kind
>> of timed or event wait.
>>
>> For now, add cond_resched_stall(), which tries to schedule if
>> possible and, failing that, executes a cpu_relax().
>
> What's the point of this new variant of cond_resched()? We really do not
> want it at all.
>
>> +int __cond_resched_stall(void)
>> +{
>> +	if (tif_need_resched(RESCHED_eager)) {
>> +		__preempt_schedule();
>
> Under the new model TIF_NEED_RESCHED is going to reschedule if the
> preemption counter goes to zero.

Yes, agreed. cond_resched_stall() was just meant to be window dressing.
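
To make the point concrete, here is a toy single-CPU userspace model (every name is hypothetical and nothing here is kernel API). A pending reschedule is only acted on once the preemption count is back to zero, either in preempt_enable() or on return from an interrupt. The second path is what lets a bare busy-wait loop be preempted without any cond_resched().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: one CPU, no real scheduler. */
static int preempt_count_model;  /* > 0 means preemption is disabled */
static bool need_resched_model;  /* plays the role of TIF_NEED_RESCHED */
static int context_switches;     /* counts simulated calls into the scheduler */

static void schedule_model(void)
{
	need_resched_model = false;  /* the scheduler consumes the request */
	context_switches++;
}

/* Reschedule only if a request is pending and nothing holds preemption off. */
static void preempt_check_model(void)
{
	if (preempt_count_model == 0 && need_resched_model)
		schedule_model();
}

static void preempt_disable_model(void)
{
	preempt_count_model++;
}

static void preempt_enable_model(void)
{
	assert(preempt_count_model > 0);  /* catch an unbalanced enable */
	preempt_count_model--;
	preempt_check_model();
}

/*
 * The same check on return from a (simulated) timer interrupt: this is
 * how a plain busy-wait loop running with preempt_count == 0 gets preempted.
 */
static void irq_return_model(void)
{
	preempt_check_model();
}
```

With preemption disabled twice, neither an interrupt return nor the first preempt_enable_model() switches; the second one does.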

> So the typical
>
> 	while (readl(mmio) & BUSY)
> 		cpu_relax();
>
> will just be preempted like any other loop, no?

Yeah. But drivers could be using that pattern right now too. I suspect
people don't like the idea of spinning in a bare loop, which is why they
add cond_resched(). In loops like this, though, that amounts to little more than:

	while (readl(mmio) & BUSY)
		;

I added cond_resched_stall() as an analogue to cond_resched_lock() and
friends: a call site that explicitly gives up the CPU.
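
A userspace sketch of what the quoted __cond_resched_stall() fragment does, under stated assumptions: an atomic flag stands in for the eager TIF_NEED_RESCHED bit, sched_yield() stands in for __preempt_schedule(), and a compiler barrier stands in for cpu_relax() (on x86 the real cpu_relax() issues a PAUSE hint). All names are hypothetical, and the 1/0 return convention is borrowed from cond_resched(), not taken from the patch.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the eager TIF_NEED_RESCHED bit. */
static atomic_bool need_resched_flag;

/* Approximates cpu_relax() with a compiler barrier; no CPU pause hint. */
static inline void cpu_relax_model(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/*
 * Model of cond_resched_stall(): give up the CPU if a reschedule is
 * pending, otherwise just relax. Returns 1 if it yielded, 0 otherwise.
 */
static int cond_resched_stall_model(void)
{
	/* Clearing the flag mimics the scheduler consuming the request. */
	if (atomic_exchange(&need_resched_flag, false)) {
		sched_yield();
		return 1;
	}
	cpu_relax_model();
	return 0;
}
```

A retry loop would call it as `while (!ready()) cond_resched_stall_model();` (ready() being hypothetical), which, whenever no reschedule is pending, degenerates into the same spin as the bare loop above.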

That said, someone pointed out a much better interface for this sort
of thing: readb_poll_timeout(). Not all of these sites, but a fair
number, could be converted to it.
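
For reference, readb_poll_timeout(addr, val, cond, delay_us, timeout_us) lives in <linux/iopoll.h> and evaluates to 0 or -ETIMEDOUT. The sketch below is a userspace model of the same loop shape, not the kernel macro: a hypothetical fake_status_read() simulates the device register, and clock_gettime()/nanosleep() replace ktime_get()/usleep_range(). As in the kernel helper, the register is read once more after the deadline so that a late completion is not misreported as a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define BUSY 0x01u /* hypothetical status bit, as in the loop above */

/* Monotonic clock in microseconds (stands in for ktime_get()). */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Sleep for roughly us microseconds (stands in for usleep_range()). */
static void sleep_for_us(unsigned long us)
{
	struct timespec ts = {
		.tv_sec = (time_t)(us / 1000000u),
		.tv_nsec = (long)(us % 1000000u) * 1000L,
	};

	nanosleep(&ts, NULL);
}

/* Simulated device: reports BUSY for the next busy_reads_left reads. */
static long busy_reads_left;

static uint8_t fake_status_read(void)
{
	if (busy_reads_left > 0) {
		busy_reads_left--;
		return BUSY;
	}
	return 0;
}

/*
 * Poll read_reg() until BUSY clears, sleeping delay_us between reads and
 * giving up after timeout_us (0 means wait forever). The last value read
 * is stored in *val. Returns 0 on success or -ETIMEDOUT.
 */
static int poll_not_busy(uint8_t (*read_reg)(void), uint8_t *val,
			 unsigned long delay_us, uint64_t timeout_us)
{
	uint64_t deadline;

	assert(read_reg && val);
	deadline = now_us() + timeout_us;
	for (;;) {
		*val = read_reg();
		if (!(*val & BUSY))
			break;
		if (timeout_us && now_us() > deadline) {
			*val = read_reg(); /* one last read after the deadline */
			break;
		}
		if (delay_us)
			sleep_for_us(delay_us);
	}
	return (*val & BUSY) ? -ETIMEDOUT : 0;
}
```

In a driver, the quoted loop might then become something like `ret = readl_poll_timeout(mmio, status, !(status & BUSY), 10, 1000);` (readl_poll_timeout() being the 32-bit sibling that matches readl()), where the 10 us delay and 1 ms timeout are illustrative values only.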

Ankur