Re: [RESEND PATCH v3 0/7] Improve scheduler scalability for fast path

From: Parth Shah
Date: Thu Jul 04 2019 - 07:36:13 EST


Hi,

On 7/3/19 9:22 AM, Subhra Mazumdar wrote:
>
> On 7/2/19 1:54 AM, Patrick Bellasi wrote:
>> I wonder whether the searching and preempting needs will ever conflict?
>> I guess the winning point is that we don't commit behaviors to
>> userspace, but just abstract concepts which are turned into biases.
>>
>> I don't see conflicts right now: if you are latency tolerant, that
>> means you can spend more time trying to find a better CPU (e.g. we can
>> use the energy model to compare multiple CPUs) _and/or_ give the
>> current task a better chance to complete by delaying its preemption.
> OK
>>
>>> Otherwise sounds like a good direction to me. For the searching aspect, can
>>> we map latency-nice values to the % of cores we search in select_idle_cpu?
>>> That way the search cost can be controlled by the latency-nice value.
>> I guess that's worth a try; the only caveat I see is that it turns the
>> bias into something very platform specific. Meaning, the same
>> latency-nice value on different machines can have very different
>> results.
>>
>> Wouldn't it be better to try finding a more platform-independent mapping?
>>
>> Maybe something time-bounded, e.g. the higher the latency-nice value, the
>> more time we can spend looking for CPUs?
> The issue I see is: suppose we have a range of latency-nice values; it then
> has to cover the entire range of search (from one core to all cores). As
> Peter said, some workloads will want to search the LLC fully. If we use
> absolute time, the mapping from the latency-nice range to it will be
> arbitrary. If you have something in mind let me know; maybe I am thinking
> about it differently.
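
FWIW, a rough sketch of the fraction-based mapping, against the v5.2-era
select_idle_cpu() (the p->latency_nice field and the exact scaling are
assumptions, not code from this series; allowed-CPU masking and the SIS_PROP
cost heuristics are omitted for brevity):

static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd,
			   int target)
{
	int span = cpumask_weight(sched_domain_span(sd));
	int nr, cpu;

	/*
	 * Map latency_nice in [-20, 19] to a scan budget in
	 * [span/40, span], but always look at a couple of CPUs.
	 */
	nr = max(2, span * (p->latency_nice + 21) / 40);

	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
		if (!--nr)
			return -1;
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}

The time-bounded variant Patrick suggests would instead compare sched_clock()
against a latency_nice-derived budget inside the same loop, which is more
platform independent but, as noted above, maps onto the latency-nice range
somewhat arbitrarily.
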
>>
>>> But the issue is, if more latency tolerant workloads are set to search
>>> less, we still need some mechanism to achieve a good spread of
>>> threads.
>> I don't get this example: why should more latency tolerant workloads
>> require less search?
> I guess I got the definition of "latency tolerant" backwards.
>>
>>> Can we keep the sliding window mechanism in that case?
>> Which one? Sorry, I didn't go through the patches; can you briefly
>> summarize the idea?
> If a workload is set to be less latency tolerant, the search will be
> narrower. That can lead to localization of threads on a few CPUs, as we are
> not searching the entire LLC even if there are idle CPUs available. For this
> I introduced a per-CPU variable (for the target CPU) to track the boundary
> of the search, so that every search starts from that boundary, thus sliding
> the window. So even if we are searching very little, the search window keeps
> shifting and gives us a good spread. This is orthogonal to the latency-nice
> thing.
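
If I understand correctly, a minimal sketch of that would look something like
this (the per-CPU cursor name is illustrative, not necessarily the one from
your series; untested):

static DEFINE_PER_CPU(int, sis_cursor) = -1;

/*
 * Sliding-window scan: remember where the last partial scan for this
 * target stopped and start the next one from there, so even a small
 * search window walks the whole LLC over successive wakeups.
 */
static int select_idle_cpu_window(struct task_struct *p,
				  struct sched_domain *sd,
				  int target, int nr)
{
	int start = per_cpu(sis_cursor, target);
	int cpu;

	if (start < 0)
		start = target;

	for_each_cpu_wrap(cpu, sched_domain_span(sd), start) {
		/* Advance the cursor even when nothing idle is found. */
		per_cpu(sis_cursor, target) = cpu;
		if (!--nr)
			return -1;
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return -1;
}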

Could we do something like turning off the search for an idle core if the wakee
task is latency tolerant (higher latency-nice)? We search for an idle core to
get faster resource allocation, so such tasks don't need an idle core and can
jump directly to finding idle CPUs.
This could include the sliding window mechanism as well, but as I commented
previously, it introduces a task ping-pong problem as the sliding window moves
away from target_cpu. So maybe we can first search the core containing
target_cpu and, if no idle CPUs are found there, fall back to the sliding
window mechanism, as sketched below.
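
A minimal sketch of that ordering (LATENCY_TOLERANT_THRESH and p->latency_nice
are hypothetical, select_idle_cpu_window() is the sliding-window scan I
sketched above, and the existing early exits of select_idle_sibling() are
dropped for brevity):

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	struct sched_domain *sd;
	int i;

	sd = rcu_dereference(per_cpu(sd_llc, target));
	if (!sd)
		return target;

	/* Latency-tolerant tasks skip the costly idle-core search. */
	if (p->latency_nice < LATENCY_TOLERANT_THRESH) {
		i = select_idle_core(p, sd, target);
		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}

	/* Check the SMT siblings of target's own core first ... */
	i = select_idle_smt(p, sd, target);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	/* ... and only then bail to the sliding-window LLC scan. */
	i = select_idle_cpu_window(p, sd, target, 4 /* arbitrary budget */);
	if ((unsigned int)i < nr_cpumask_bits)
		return i;

	return target;
}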
Just a thought.

Best,
Parth


>>
>>> Also, will latency-nice do anything for select_idle_core and
>>> select_idle_smt?
>> I guess in principle the same bias can be used at different levels, maybe
>> with different mappings.
> Doing it for select_idle_core will have the issue that the dynamic flag
> (whether an idle core is present or not) can only be updated by threads
> which are doing the full search.
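
Right; for reference, this is how the hint behaves in the current code:
select_idle_core() bails out early on the cached flag and only clears it once
a scan of the whole LLC has failed, so a latency-nice-limited partial scan
would have no safe point at which to update it (simplified from
kernel/sched/fair.c):

static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
			    int target)
{
	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
	int core, cpu;

	/* Cached hint: no idle core in this LLC, don't bother scanning. */
	if (!test_idle_cores(target, false))
		return -1;

	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);

	for_each_cpu_wrap(core, cpus, target) {
		bool idle = true;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			__cpumask_clear_cpu(cpu, cpus);
			if (!available_idle_cpu(cpu))
				idle = false;
		}

		if (idle)
			return core;
	}

	/*
	 * Only a scan that covered the whole LLC proves there is no idle
	 * core, so only a full search may clear the flag.
	 */
	set_idle_cores(target, 0);
	return -1;
}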
>
> Thanks,
> Subhra
>
>> In the mobile world use-case we will likely use it only to switch from
>> select_idle_sibling to the energy aware slow path. And perhaps to see
>> if we can bias the wakeup preemption granularity.
>>
>> Best,
>> Patrick
>>
>