Re: [PATCH v7 09/23] sched: Fix runtime accounting w/ split exec & sched contexts

From: Valentin Schneider
Date: Wed Jan 03 2024 - 08:48:14 EST


(I did a reply instead of a reply-all, sorry John you're getting this one twice!)

On 19/12/23 16:18, John Stultz wrote:
> The idea here is we want to charge the scheduler-context task's
> vruntime but charge the execution-context task's sum_exec_runtime.
>
> This way cputime accounting goes against the task actually running
> but vruntime accounting goes against the selected task so we get
> proper fairness.

This looks like the right approach, especially when it comes to exposing
data to userspace as with e.g. top.
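
To spell out the split as I read it, here is a toy sketch. The names
(toy_task, toy_update_curr) are made up for illustration and are not
kernel symbols, and the weight scaling of vruntime is left out:

```c
#include <stdint.h>

/* Toy stand-in for the relevant task fields; not the kernel's definition. */
struct toy_task {
	uint64_t sum_exec_runtime;	/* cputime, what userspace ends up seeing */
	uint64_t vruntime;		/* fairness clock used when picking tasks */
};

/*
 * Charge delta_exec nanoseconds of CPU time:
 *  - cputime goes to the task that actually ran (execution context),
 *  - vruntime goes to the task that was selected (scheduler context, donor).
 * Assumption: unit weight, so vruntime advances 1:1 with wall time.
 */
static void toy_update_curr(struct toy_task *donor, struct toy_task *running,
			    uint64_t delta_exec)
{
	running->sum_exec_runtime += delta_exec;
	donor->vruntime += delta_exec;
}
```

With a blocked donor proxying for a lock owner, the owner shows up as
consuming CPU while the donor is the one whose fairness clock advances.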

I did however get curious as to what the impact of not updating the
donor's sum_exec_runtime would be. A quick look through fair.c shows these
functions using it:
- numa_get_avg_runtime()
- task_numa_work()
- task_tick_numa()
- set_next_entity()
- hrtick_start_fair()

The NUMA ones shouldn't matter too much, as they care about the actually
running task, which is the one that gets its sum_exec_runtime increased.
task_tick_numa() needs to be changed though, as it should be passed the
currently running task, not the selected (donor) one, but shouldn't need
any other change (famous last words).
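
Roughly what I have in mind for the tick path, as a toy model only. The
struct layout and the toy_* names are hypothetical and do not match the
series or fair.c:

```c
#include <stdint.h>

/* Hypothetical per-task NUMA state; the real code tracks much more. */
struct toy_numa_task {
	uint64_t numa_ticks;	/* how many ticks fed NUMA scanning for this task */
};

/* Hypothetical runqueue view with the two contexts split out. */
struct toy_numa_rq {
	struct toy_numa_task *running;	/* execution context */
	struct toy_numa_task *donor;	/* scheduler context (selected task) */
};

/* Stand-in for the NUMA tick hook: only counts invocations. */
static void toy_task_tick_numa(struct toy_numa_task *p)
{
	p->numa_ticks++;
}

/* Stand-in for the fair class tick handler. */
static void toy_task_tick_fair(struct toy_numa_rq *rq)
{
	/* Fairness bookkeeping against rq->donor would go here. */

	/* NUMA scanning follows the task whose pages are being touched. */
	toy_task_tick_numa(rq->running);
}
```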

Generally I think all of the NUMA balancing stats stuff shouldn't care
about the donor task, as the pages being accessed are part of the execution
context.

The hrtick one is tricky. AFAICT since we don't update the donor's
sum_exec_runtime, in proxy scenarios we'll end up always programming the
hrtimer to the entire extent of the donor's slice, which might not be
correct. Considering the HRTICK SCHED_FEAT defaults to disabled, that could
be left as a TODO.
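
For reference, a simplified model of the computation I am worried about.
This assumes the timer is armed for the slice minus the runtime
accumulated since the entity was picked; toy_se and toy_hrtick_delta are
made-up names, not the fair.c code:

```c
#include <stdint.h>

/* Toy stand-in for the scheduling entity fields involved. */
struct toy_se {
	uint64_t sum_exec_runtime;	/* total runtime charged so far (ns) */
	uint64_t prev_sum_exec_runtime;	/* snapshot taken when the entity was picked */
	uint64_t slice;			/* slice length (ns) */
};

/*
 * Return how many ns from now the hrtick should fire, or 0 if the slice
 * is already used up (the real code would reschedule instead).
 */
static uint64_t toy_hrtick_delta(const struct toy_se *se)
{
	uint64_t ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;

	return ran < se->slice ? se->slice - ran : 0;
}
```

If the donor's sum_exec_runtime never moves while it is proxying, ran
stays at 0 and the returned delta is always the full slice, no matter how
long the donor has effectively been scheduled.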