Re: [PATCH] sched/psi: Optimise psi_group_change a bit

From: Johannes Weiner
Date: Fri Mar 29 2024 - 14:52:02 EST


On Fri, Mar 29, 2024 at 04:06:48PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tursulin@xxxxxxxxxxx>
>
> The current code loops over the psi_states only to call a helper which
> then resolves back to the action needed for each state using a switch
> statement. That is effectively creating a double indirection of a kind
> which, given how all the states need to be explicitly listed and handled
> anyway, we can simply remove. Both the for loop and the switch statement
> that is.
>
> The benefit is both in the code size and CPU time spent in this function.
> YMMV but on my Steam Deck, while in a game, the patch makes the CPU usage
> go from ~2.4% down to ~1.2%. Text size at the same time went from 0x323 to
> 0x2c1.
>
> Signed-off-by: Tvrtko Ursulin <tursulin@xxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Peter Ziljstra <peterz@xxxxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: kernel-dev@xxxxxxxxxx

This is great.

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Ingo, would you mind please taking this through the scheduler tree? I
think Peter is still out.

Remaining quote below.

Thanks

> ---
> kernel/sched/psi.c | 54 +++++++++++++++++++++++-----------------------
> 1 file changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 7b4aa5809c0f..55720ecf420e 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -218,28 +218,32 @@ void __init psi_init(void)
> group_init(&psi_system);
> }
>
> -static bool test_state(unsigned int *tasks, enum psi_states state, bool oncpu)
> +static u32 test_states(unsigned int *tasks, u32 state_mask)
> {
> - switch (state) {
> - case PSI_IO_SOME:
> - return unlikely(tasks[NR_IOWAIT]);
> - case PSI_IO_FULL:
> - return unlikely(tasks[NR_IOWAIT] && !tasks[NR_RUNNING]);
> - case PSI_MEM_SOME:
> - return unlikely(tasks[NR_MEMSTALL]);
> - case PSI_MEM_FULL:
> - return unlikely(tasks[NR_MEMSTALL] &&
> - tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
> - case PSI_CPU_SOME:
> - return unlikely(tasks[NR_RUNNING] > oncpu);
> - case PSI_CPU_FULL:
> - return unlikely(tasks[NR_RUNNING] && !oncpu);
> - case PSI_NONIDLE:
> - return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
> - tasks[NR_RUNNING];
> - default:
> - return false;
> + const bool oncpu = state_mask & PSI_ONCPU;
> +
> + if (tasks[NR_IOWAIT]) {
> + state_mask |= BIT(PSI_IO_SOME);
> + if (!tasks[NR_RUNNING])
> + state_mask |= BIT(PSI_IO_FULL);
> }
> +
> + if (tasks[NR_MEMSTALL]) {
> + state_mask |= BIT(PSI_MEM_SOME);
> + if (tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING])
> + state_mask |= BIT(PSI_MEM_FULL);
> + }
> +
> + if (tasks[NR_RUNNING] > oncpu)
> + state_mask |= BIT(PSI_CPU_SOME);
> +
> + if (tasks[NR_RUNNING] && !oncpu)
> + state_mask |= BIT(PSI_CPU_FULL);
> +
> + if (tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] || tasks[NR_RUNNING])
> + state_mask |= BIT(PSI_NONIDLE);
> +
> + return state_mask;
> }
>
> static void get_recent_times(struct psi_group *group, int cpu,
> @@ -770,7 +774,6 @@ static void psi_group_change(struct psi_group *group, int cpu,
> {
> struct psi_group_cpu *groupc;
> unsigned int t, m;
> - enum psi_states s;
> u32 state_mask;
>
> groupc = per_cpu_ptr(group->pcpu, cpu);
> @@ -841,10 +844,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
> return;
> }
>
> - for (s = 0; s < NR_PSI_STATES; s++) {
> - if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
> - state_mask |= (1 << s);
> - }
> + state_mask = test_states(groupc->tasks, state_mask);
>
> /*
> * Since we care about lost potential, a memstall is FULL
> @@ -1194,7 +1194,7 @@ void psi_cgroup_restart(struct psi_group *group)
> /*
> * After we disable psi_group->enabled, we don't actually
> * stop percpu tasks accounting in each psi_group_cpu,
> - * instead only stop test_state() loop, record_times()
> + * instead only stop test_states() loop, record_times()
> * and averaging worker, see psi_group_change() for details.
> *
> * When disable cgroup PSI, this function has nothing to sync
> @@ -1202,7 +1202,7 @@ void psi_cgroup_restart(struct psi_group *group)
> * would see !psi_group->enabled and only do task accounting.
> *
> * When re-enable cgroup PSI, this function use psi_group_change()
> - * to get correct state mask from test_state() loop on tasks[],
> + * to get correct state mask from test_states() loop on tasks[],
> * and restart groupc->state_start from now, use .clear = .set = 0
> * here since no task status really changed.
> */
> --
> 2.44.0
>