Re: [PATCH v4 07/13] sched: Split scheduler execution context

From: John Stultz
Date: Tue Jun 13 2023 - 16:38:12 EST


On Tue, Jun 13, 2023 at 10:12 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
> On 01/06/2023 07:58, John Stultz wrote:
> > NOTE: Peter previously mentioned he didn't like the name
> > "rq_selected()", but I've not come up with a better alternative.
> > I'm very open to other name proposals.
> >
> > Question for Peter: Dietmar suggested you'd prefer I drop the
> > conditionalization of the scheduler context pointer on the rq
> > (so rq_selected() would be open coded as rq->curr_sched or
> > whatever we agree on for a name), but I'd think in the
> > !CONFIG_PROXY_EXEC case we'd want to avoid the wasted pointer
> > and its use (since curr_sched would always be == curr)?
> > If I'm wrong I'm fine switching this, but would appreciate
> > clarification.
>
> IMHO, keeping both, rq->curr and rq->proxy (latter for rq->curr_sched)
> would make it easier to navigate through the different versions of this
> patch-set while reviewing.
>
> I do understand that you have issues with the function name proxy() not
> returning the proxy (task blocked on a mutex) but the mutex owner instead.
>
> The header of v3 'sched: Add proxy execution'
> https://lkml.kernel.org/r/20230411042511.1606592-12-jstultz@xxxxxxxxxx
> mentions:
>
> " ... Potential proxies (i.e., tasks blocked on a mutex) are not
> dequeued, so, if one of them is actually selected by schedule() as the
> next task to be put to run on a CPU, proxy() is used to walk the
> blocked_on relation and find which task (mutex owner) might be able to
> use the proxy's scheduling context. ..."
>
> But as I can see now, you changed this patch header in v4 to explain the
> PE model slightly differently.

Yeah. (As you know from our offline discussions. :) I feel fairly
strongly that using the term "proxy" for the scheduler context is
unnecessarily confusing, and requires some mental contortions to
make it fit the metaphor being used.

In my mind, the task chosen by pick_next_task() is what we want to
run, but if it is blocked on a mutex, we let the mutex owner run on
its behalf, with the "authority" (ie: scheduling context) of the
originally chosen task.
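
Concretely, the idea can be sketched like so (mock types here, purely
illustrative -- not the actual patch code):

```c
#include <stddef.h>

/* Illustrative mocks -- not the real kernel structures. */
struct task_struct;

struct mutex_mock {
	struct task_struct *owner;
};

struct task_struct {
	struct mutex_mock *blocked_on;	/* mutex this task is blocked on, if any */
};

/*
 * The task picked by pick_next_task() "donates" its scheduling
 * context; walking the blocked_on relation finds the mutex owner
 * that can actually run on the donor's behalf.
 */
static struct task_struct *find_exec_ctx(struct task_struct *donor)
{
	struct task_struct *t = donor;

	while (t->blocked_on && t->blocked_on->owner)
		t = t->blocked_on->owner;
	return t;
}
```

So the picked task supplies the "authority", and whatever task falls
out of the blocked_on walk supplies the execution.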

This is a direct parallel to proxy voting where a person who needs to
vote cannot attend, so someone else is sent to vote on their behalf,
and does so with the authority of the person who cannot attend.

So, much like the person who votes on behalf of another is the proxy,
with proxy-execution it makes most sense that the task that runs on
the selected task's behalf is the proxy.

Calling the selected scheduler context the proxy makes it very
difficult to use the metaphor to help in understanding what is being
done. I'll grant you can try to twist it around and view it so that
the blocked tasks are sort of proxy-voters left on the runqueue and
sent to the pick_next_task() function to vote on behalf of a mutex
owner, but that would be more like "proxy-scheduling". And it breaks
down further as the blocked tasks actually don't know who they are
voting for until after they are selected and we run proxy() to walk
the blocked_on chain. It just doesn't fit the metaphor very well
(maybe "puppet candidates" would be better in this model?) and I
think it only adds confusion.

This logic is already subtle and complex enough - we don't need to add
Stroop effects[1] to keep it interesting. :)

But I agree the historical usage of the term proxy in the patch
series makes it hard to simply switch it around, which is why I
instead tried to reduce the use of the term proxy where it didn't
seem appropriate, replacing it with "selected" or "donor".

Again, I'm happy to switch to other terms that make sense.
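
For reference, here's roughly the conditionalization I was describing
in the question above (mock struct layout; the field names are
obviously still up for debate):

```c
/* Mock layout, purely illustrative -- not the actual patch code. */
struct task_struct;

struct rq {
	struct task_struct *curr;		/* execution context */
#ifdef CONFIG_PROXY_EXEC
	struct task_struct *curr_sched;		/* scheduling context */
#endif
};

#ifdef CONFIG_PROXY_EXEC
static inline struct task_struct *rq_selected(struct rq *rq)
{
	return rq->curr_sched;
}
#else
/* !CONFIG_PROXY_EXEC: no extra pointer; curr_sched would always == curr */
static inline struct task_struct *rq_selected(struct rq *rq)
{
	return rq->curr;
}
#endif
```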

thanks
-john

[1] https://en.wikipedia.org/wiki/Stroop_effect