Re: [PATCH 1/1] sched/fair: allow disabling newidle_balance with sched_relax_domain_level

From: Vitalii Bursov
Date: Thu Mar 28 2024 - 13:11:41 EST




On 28.03.24 18:48, Vincent Guittot wrote:
> On Thu, 28 Mar 2024 at 17:27, Vitalii Bursov <vitaly@xxxxxxxxxx> wrote:
>>
>>
>> On 28.03.24 16:43, Vincent Guittot wrote:
>>> On Thu, 28 Mar 2024 at 01:31, Vitalii Bursov <vitaly@xxxxxxxxxx> wrote:
>>>>
>>>> Change relax_domain_level checks so that it would be possible
>>>> to exclude all domains from newidle balancing.
>>>>
>>>> This matches the behavior described in the documentation:
>>>> -1 no request. use system default or follow request of others.
>>>> 0 no search.
>>>> 1 search siblings (hyperthreads in a core).
>>>>
>>>> "2" enables levels 0 and 1, level_max excludes the last (level_max)
>>>> level, and level_max+1 includes all levels.
>>>
>>> I was about to say that max+1 is useless because it's the same as -1
>>> but it's not exactly the same because it can supersede the system wide
>>> default_relax_domain_level. I wonder if one should be able to enable
>>> more levels than what the system has set by default.
>>
>> I don't know if such systems exist, but cpusets.rst suggests that
>> increasing it beyond the default value is possible:
>>> If your situation is:
>>>
>>> - The migration costs between each cpu can be assumed considerably
>>> small(for you) due to your special application's behavior or
>>> special hardware support for CPU cache etc.
>>> - The searching cost doesn't have impact(for you) or you can make
>>> the searching cost enough small by managing cpuset to compact etc.
>>> - The latency is required even it sacrifices cache hit rate etc.
>>> then increasing 'sched_relax_domain_level' would benefit you.
>
> Fair enough. The doc should be updated as we can now clear the flags
> but not set them
>

SD_BALANCE_NEWIDLE is always set by default in sd_init(), and
set_domain_attribute() then clears it depending on
default_relax_domain_level (the "relax_domain_level" kernel parameter)
and the cgroup configuration, if one is present.

So it should work both ways: the flag is cleared on more domains when
the relax level is decreased, and left set on more domains when it is
increased, shouldn't it?

Also, after a closer look at set_domain_attribute(), it looks like
default_relax_domain_level is -1 on all systems, so if the cgroup does
not set a relax level, no flags are cleared at all, which probably
means that level_max+1 is redundant today.
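
To make the above concrete, here is a rough user-space model of the logic
as I read it. It is a sketch, not the kernel source: the model_* names and
the flag value are made up for illustration, only the one flag discussed
here is modelled, and the "patched" switch assumes the change boils down to
comparing the domain level against the request with ">=" instead of ">".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the value is illustrative. */
#define MODEL_SD_BALANCE_NEWIDLE 0x1u

struct model_domain {
	int level;		/* 0 is the lowest level (e.g. SMT siblings) */
	unsigned int flags;
};

/* Models sd_init(): the flag starts out set on every domain. */
static void model_sd_init(struct model_domain *sd, int level)
{
	sd->level = level;
	sd->flags = MODEL_SD_BALANCE_NEWIDLE;
}

/*
 * Models set_domain_attribute().
 * cpuset_level < 0:  the cgroup made no request.
 * default_level < 0: no "relax_domain_level" kernel parameter was given.
 * patched: use the ">=" comparison (assumed patch behaviour) instead of ">".
 */
static void model_set_domain_attribute(struct model_domain *sd,
				       int cpuset_level, int default_level,
				       bool patched)
{
	int request;

	if (cpuset_level < 0) {
		if (default_level < 0)
			return;		/* nothing requested: keep the flag */
		request = default_level;
	} else {
		request = cpuset_level;
	}

	if (patched ? sd->level >= request : sd->level > request)
		sd->flags &= ~MODEL_SD_BALANCE_NEWIDLE;
}

/* Does a domain at 'level' keep newidle balancing for the given settings? */
static bool model_newidle_enabled(int level, int cpuset_level,
				  int default_level, bool patched)
{
	struct model_domain sd;

	model_sd_init(&sd, level);
	model_set_domain_attribute(&sd, cpuset_level, default_level, patched);
	return (sd.flags & MODEL_SD_BALANCE_NEWIDLE) != 0;
}
```

In this model, model_newidle_enabled(0, 0, -1, true) is false, so a request
of 0 disables newidle balancing even on the lowest level, while the same
call with patched == false returns true. With both cpuset_level and
default_level at -1 the flag is never cleared on any level, which is why
level_max+1 looks redundant unless the kernel parameter is set.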