Re: [RFC PATCH 66/86] treewide: kernel: remove cond_resched()

From: Luis Chamberlain
Date: Fri Nov 17 2023 - 13:14:49 EST


On Tue, Nov 07, 2023 at 03:08:02PM -0800, Ankur Arora wrote:
> There are broadly three sets of uses of cond_resched():
>
> 1. Calls to cond_resched() out of the goodness of our heart,
> otherwise known as avoiding lockup splats.
>
> 2. Open coded variants of cond_resched_lock() which call
> cond_resched().
>
> 3. Retry or error handling loops, where cond_resched() is used as a
> quick alternative to spinning in a tight-loop.
>
> When running under a full preemption model, the cond_resched() reduces
> to a NOP (not even a barrier) so removing it obviously cannot matter.
>
> But considering only voluntary preemption models (for say code that
> has been mostly tested under those), for set-1 and set-2 the
> scheduler can now preempt kernel tasks running beyond their time
> quanta anywhere they are preemptible() [1]. Which removes any need
> for these explicitly placed scheduling points.
>
> The cond_resched() calls in set-3 are a little more difficult.
> To start with, given it's NOP character under full preemption, it
> never actually saved us from a tight loop.
> With voluntary preemption, it's not a NOP, but it might as well be --
> for most workloads the scheduler does not have an interminable supply
> of runnable tasks on the runqueue.
>
> So, cond_resched() is useful to not get softlockup splats, but not
> terribly good for error handling. Ideally, these should be replaced
> with some kind of timed or event wait.
> For now we use cond_resched_stall(), which tries to schedule if
> possible, and executes a cpu_relax() if not.
>
> All of these are from set-1 except for the retry loops in
> task_function_call() or the mutex testing logic.
>
> Replace these with cond_resched_stall(). The others can be removed.
>
> [1] https://lore.kernel.org/lkml/20231107215742.363031-1-ankur.a.arora@xxxxxxxxxx/
>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Zefan Li <lizefan.x@xxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Peter Oberparleiter <oberpar@xxxxxxxxxxxxx>
> Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>

Sounds like the sort of test which should be put into linux-next to get
test coverage right away. To see what really blows up.

Luis