Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval

From: Valentin Schneider
Date: Fri Dec 21 2018 - 12:15:36 EST


On 21/12/2018 14:49, Vincent Guittot wrote:
[...]
> After looking at shed.c at this sha1, (sd->nr_balance_failed >
> sd->cache_nice_tries+2) was the only condition for doing active
> migration and as a result it was the only reason for doubling
> sd->balance_interval.
> My patch keeps exactly the same behavior for this condition
> 'sd->nr_balance_failed > sd->cache_nice_tries+2). And, I'm even more
> convinced to exclude (sd->nr_balance_failed > sd->cache_nice_tries+2)
> condition because it's the condition that has introduced the doubling
> of the interval.
>
> As said previously, you can argue that this behavior is not optimal
> and discuss its validity, but the sha1 that you mentioned above,
> introduced the current policy for (sd->nr_balance_failed >
> sd->cache_nice_tries+2) condition.
> Reverting such behavior would need more studies, tests and cares

I agree with you on that, those are valid concerns.

What I'm arguing for is instead of doing this in two steps (reset interval
only for some active balance types, then experiment only increasing it for
"active balance as a last resort"), I'd prefer doing it in one step.

Mostly because I think the intermediate step adds an active balance
categorization that can easily become confusing. Furthermore, that 2005
commit explicitly states it wants to cater to pinned tasks, but we didn't
have those LBF_* flags back then, so if we are to do something about it
we should be improving upon the original intent.

In the end it's not for me to decide, I just happen to find doing it that
way more elegant (from a functionality & code readability PoV).

> which
> are out of the scope of this patchset and more related to a whole
> refactoring of load_balance and calculte_imbalance; FYI, I have
> submitted a topic on the subject for the next OSPM