Re: [PATCH v2 1/3] sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0

From: Lukasz Luba
Date: Mon Jun 05 2023 - 09:07:37 EST




On 5/31/23 19:22, Qais Yousef wrote:
Hi Lukasz!

Sorry for the late response.

On 05/22/23 09:30, Lukasz Luba wrote:
Hi Qais,

I have a question regarding the 'soft cpu affinity'.

[...]

IIUC I'm not seeing this being a problem. The goal of capping with uclamp_max
is twofold:

1. Prevent tasks from consuming energy.
2. Keep them away from expensive CPUs.

2 is actually very important for 2 reasons:

a. Because of max aggregation - any uncapped task that wakes up will
cause a frequency spike on this 'expensive' cpu. We don't have
a mechanism to downmigrate it - which is another thing I'm working
on.
b. It is desired to keep these bigger CPUs idle and ready for more important
work.

For 2, generally we don't want these tasks to steal bandwidth from the CPUs
that we'd like to preserve for other types of work.
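The max-aggregation concern in (a) can be sketched with a toy model. This is a hypothetical simplification of how the frequency request follows the maximum clamped utilization among runnable tasks; the struct and function names are illustrative, not the actual kernel code:

```c
#include <assert.h>

/* Toy model (illustrative only): a CPU's requested performance follows
 * the *maximum* clamped utilization among its runnable tasks, so one
 * uncapped task undoes the capping of everything else on that CPU. */

struct task {
	unsigned int util;       /* task utilization */
	unsigned int uclamp_max; /* per-task utilization cap */
};

/* Utilization after applying the task's uclamp_max cap. */
static unsigned int clamped_util(const struct task *t)
{
	return t->util < t->uclamp_max ? t->util : t->uclamp_max;
}

/* Max aggregation: the highest clamped utilization wins. */
static unsigned int cpu_requested_perf(const struct task *tasks, int n)
{
	unsigned int max = 0;

	for (int i = 0; i < n; i++) {
		unsigned int u = clamped_util(&tasks[i]);

		if (u > max)
			max = u;
	}
	return max;
}
```

Here a single uncapped task waking on the big CPU drives the requested performance up to its own utilization, regardless of how hard the other tasks on that CPU are capped.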

I'm a bit afraid about such 'strong force'. That means the task would
not go via EAS if we set uclamp_max e.g. 90, while the little capacity
is 125. Or am I missing something?

We should go via EAS, actually that's the whole point.

Why do you think we won't go via EAS? The logic is that we give a hint to
prefer the little core, but we can still pick something else if it's more
energy efficient.

What uclamp_max enables us to do is to still consider the little core even if
the task's utilization says it doesn't fit there. We need to merge these
patches first though, as it's broken at the moment. If the little capacity is
125 and the utilization of the task is 125, then even if uclamp_max is 0, EAS
will skip the whole little cluster as a potential candidate because there's no
spare_capacity there. Even if the whole little cluster is idle.
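The breakage described above can be illustrated with a toy model of the candidate filter, loosely inspired by find_energy_efficient_cpu(). The function names, the numbers, and the clamp-before-compare fix below are illustrative assumptions for this sketch, not the actual patch:

```c
#include <assert.h>

/* Illustrative sketch of the spare-capacity candidate filter
 * discussed above; not the real kernel code. */

/* Broken behaviour: a CPU only remains a candidate if it has spare
 * capacity left, so util == capacity rules out the whole cluster even
 * when uclamp_max says the task should be capped to fit there. */
static int candidate_broken(unsigned int task_util, unsigned int cpu_cap,
			    unsigned int uclamp_max)
{
	(void)uclamp_max;		/* the cap is never consulted */
	return cpu_cap > task_util;	/* spare capacity must be > 0 */
}

/* Fixed behaviour: clamp the utilization first, so zero spare capacity
 * no longer disqualifies an otherwise suitable CPU. */
static int candidate_fixed(unsigned int task_util, unsigned int cpu_cap,
			   unsigned int uclamp_max)
{
	unsigned int util = task_util < uclamp_max ? task_util : uclamp_max;

	return cpu_cap >= util;
}
```

With task_util == 125, cpu_cap == 125 and uclamp_max == 0, the broken check sees zero spare capacity and rejects the little CPU, while the fixed check clamps the utilization first and keeps it as a candidate.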

OK, I see now - it's a bug then.



This might effectively use more energy for those tasks which can run on
any CPU, where EAS would otherwise figure out a good energy placement. I'm
worried about this, since we have L3+littles in one DVFS domain and the L3
would only get bigger in future.

It's a bias that will enable the search algorithm in EAS to still consider the
little core for big tasks. This bias will depend on the uclamp_max value chosen
by userspace (so they have some control on how hard to cap the task), and what
else is happening in the system at the time it wakes up.

OK, so we would go via EAS and check the energy impact in 3 PDs - which
is desired.



IMO, to keep the big CPUs idle more, we should give them a big wake-up
energy cost. That's my 3rd feature for the EM, presented at OSPM2023.

Considering the wake-up cost in EAS would be a great addition to have :)



Of course userspace has control by selecting the right uclamp_max value. They
can increase it to allow a spill to the next PD - or keep it low to steer the
tasks more strongly onto a specific PD.

This would be interesting to see in practice. I think we need such an
experiment for such changes.

I'm not sure what you mean by 'such changes'. I hope you don't mean these
patches, as they are not the key. They fix an obvious bug where the task
placement hint won't work at all. They don't modify any behavior that
shouldn't have already been there, nor introduce a new limitation. I have to
say I am disappointed that these patches aren't considered an important fix
for an obvious breakage.

I mean, in practice - in our Pixel 6 3-gear test :)

Thanks for the explanation.