Re: [PATCH v8 0/4] sched: Don't trigger misfit if affinity is restricted

From: Qais Yousef
Date: Thu Mar 28 2024 - 22:10:01 EST


+Lukasz

On 03/24/24 00:45, Qais Yousef wrote:
> There was a discussion on handling a hotplug operation that removes a capacity
> level and causes unnecessary misfit lb to trigger again. I opted not to handle
> it now, but a working patch is available in [1]. I don't feel strongly about it
> and will leave it up to the maintainers to decide which direction they prefer.
> Patch 4 will make sure that balance_interval and nr_balance_failed won't grow
> unnecessarily due to spurious misfit lb. It will lead to some
> sub-optimality, but no incorrect behavior.
>
> After the 6.9 merge window, the dynamic Energy Model series will be merged, and
> it can lead to the capacities of the CPUs changing at runtime. This means I need
> to post a follow-up patch to ensure max_allowed_capacity is correct after an EM
> update. That might make handling hotplug operations attractive too, as there
> would be some common ground.

I was trying to work on this follow-up patch now that tip has moved to 6.9-rc1,
but I can't see how the new dynamic EM logic will trigger an update to
asym_cap_list. Did I miss something? Will/should init_cpu_capacity_callback()
be triggered after the update?

How will the scheduler know the new max capacities are different? Or did
I misunderstand the new EM runtime logic, and it won't lead to new
arch_scale_cpu_capacity() values?
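
To make the concern concrete, here is a minimal userspace sketch of the
caching scheme described in the cover letter. It is a model under stated
assumptions, not the kernel code: struct cap_level, the uint64_t CPU bitmask
and task_is_misfit() are hypothetical stand-ins for asym_cap_list, cpumasks
and the real misfit check (which also applies a fitting margin and uclamp).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one entry in a capacity list. The series keeps
 * asym_cap_list sorted in descending capacity order; this array is
 * assumed to be sorted the same way.
 */
struct cap_level {
	unsigned long capacity;	/* capacity shared by all CPUs at this level */
	uint64_t cpus;		/* bit N set => CPU N belongs to this level */
};

/*
 * Return the biggest capacity among the CPUs the task is allowed to run
 * on. Because the levels are sorted descending, the first level that
 * intersects the affinity mask is the answer. In the series this value
 * is computed when affinity changes and cached in the task.
 */
static unsigned long max_allowed_capacity(const struct cap_level *levels,
					  size_t nr_levels, uint64_t allowed)
{
	for (size_t i = 0; i < nr_levels; i++) {
		if (levels[i].cpus & allowed)
			return levels[i].capacity;
	}
	return 0;	/* empty affinity mask: no CPU allowed in this model */
}

/*
 * Simplified misfit test (assumption: plain comparison, no margin).
 * A task is only treated as misfit if its utilization exceeds the
 * current CPU's capacity AND a bigger CPU it may run on exists.
 */
static int task_is_misfit(unsigned long util, unsigned long cpu_capacity,
			  unsigned long max_allowed)
{
	if (cpu_capacity >= max_allowed)
		return 0;	/* nowhere better to go, don't trigger misfit lb */
	return util > cpu_capacity;
}
```

In this model, a value cached from max_allowed_capacity() stays at the old
number if a level's capacity field is later changed, unless something
recomputes it. That is the staleness the questions above are about: whatever
changes the capacities at runtime would also need to cause the list and the
cached per-task value to be refreshed.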


Thanks!

--
Qais Yousef

>
> [1] https://lore.kernel.org/lkml/20240321122039.7gk2mc3syvkrvhjz@airbuntu/
>
> Changes since v7:
>
> * Remove sd arg from check_misfit_status()
> * Fix typo in commit message in patch 2.
> * Add Reviewed-by from Vincent
>
> Changes since v6:
>
> * Simplify update_misfit_status
>
> Changes since v5:
>
> * Remove redundant check to rq->rd->max_cpu_capacity
> * Simplify check_misfit_status() further by removing unnecessary checks.
> * Add new patch to remove no longer used rd->max_cpu_capacity
> * Add new patch to prevent misfit lb from polluting balance_interval
>   and nr_balance_failed
>
> Changes since v4:
>
> * Store max_allowed_capacity in task_struct and populate it when
>   affinity changes to avoid iterating through the capacities list in the
>   fast path (Vincent)
> * Use rq->rd->max_cpu_capacity, which is updated after hotplug
>   operations, to check the biggest allowed capacity in the system.
> * Undo the change to check_misfit_status() and improve the function to
>   avoid similar confusion in the future.
> * Split the patches differently. Exporting the capacity list and sorting
>   it is now patch 1; handling of affinity for misfit detection is patch 2.
>
> Changes since v3:
>
> * Update commit message of patch 2 to be less verbose
>
> Changes since v2:
>
> * Convert access of asym_cap_list to be rcu protected
> * Add new patch to sort the list in descending order
> * Move some declarations inside affinity check block
> * Remove now redundant check against max_cpu_capacity in check_misfit_status()
>
> Changes since v1:
>
> * Use asym_cap_list (thanks Dietmar) to iterate instead of iterating
>   through every cpu, which Vincent was concerned about.
> * Use uclamped util to compare with capacity instead of util_fits_cpu()
>   when iterating through capacities (Dietmar).
> * Update commit log with test results to better demonstrate the problem
>
> v1 discussion: https://lore.kernel.org/lkml/20230820203429.568884-1-qyousef@xxxxxxxxxxx/
> v2 discussion: https://lore.kernel.org/lkml/20231212154056.626978-1-qyousef@xxxxxxxxxxx/
> v3 discussion: https://lore.kernel.org/lkml/20231231175218.510721-1-qyousef@xxxxxxxxxxx/
> v4 discussion: https://lore.kernel.org/lkml/20240105222014.1025040-1-qyousef@xxxxxxxxxxx/
> v5 discussion: https://lore.kernel.org/lkml/20240205021123.2225933-1-qyousef@xxxxxxxxxxx/
> v6, v7 discussion: https://lore.kernel.org/lkml/20240220225622.2626569-1-qyousef@xxxxxxxxxxx/
>
> Thanks!
>
> --
> Qais Yousef
>
> Qais Yousef (4):
>   sched/topology: Export asym_capacity_list
>   sched/fair: Check a task has a fitting cpu when updating misfit
>   sched/topology: Remove max_cpu_capacity from root_domain
>   sched/fair: Don't double balance_interval for migrate_misfit
>
>  include/linux/sched.h   |  1 +
>  init/init_task.c        |  1 +
>  kernel/sched/fair.c     | 79 +++++++++++++++++++++++++++++++----------
>  kernel/sched/sched.h    | 16 +++++++--
>  kernel/sched/topology.c | 56 ++++++++++++++---------------
>  5 files changed, 104 insertions(+), 49 deletions(-)
>
> --
> 2.34.1
>