Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: Vincent Guittot
Date: Tue Sep 08 2015 - 10:07:04 EST


On 8 September 2015 at 14:52, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> > No, but
>> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> > will fix the unit issue.
>>
>> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>>
>> And where load_sum already gets a factor 1024 from the weight
>> multiplication, util_sum does not get such a factor, and all the scaling
>> we do on it loses bits.
>>
>> So at the moment we go compute the util_avg value, we need to inflate
>> util_sum with an extra factor 1024 in order to make it work.
>>
>> And seeing that we do the shift up on sa->util_sum without consideration
>> of overflow, would it not make sense to add that factor before the
>> scaling and into the addition?
>>
>> Now, given all that, units are a complete mess here, and I'd not mind
>> something like:
>>
>> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> #error "something useful"
>> #endif
>>
>> somewhere near here.
>
> Something like the below..
>
> Another thing to ponder; the downside of scaled_delta_w is that it's
> fairly likely delta is small and you lose all bits, whereas the weight
> is likely to be large and could lose a few bits without issue.
>
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you lose too much.
>
> The flip side is of course that now you can share a multiplication.
>
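
To make the ordering point above concrete, here is a small userspace sketch
(not kernel code). The cap_scale() macro has the same shape as the one in the
patch; the two helper functions and the sample values are hypothetical and
only illustrate how many bits survive depending on where the shift happens.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10

/* Same shape as cap_scale() in the patch: multiply, then shift down. */
#define cap_scale(v, s) ((v) * (s) >> SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper: scale the (small) delta first, then apply the
 * weight. The shift happens while the value is still small, so low
 * bits are dropped before the weight can preserve them.
 */
static uint64_t scale_delta_first(uint64_t delta, uint64_t weight,
				  uint64_t scale)
{
	return weight * cap_scale(delta, scale);
}

/*
 * Hypothetical helper: multiply by the (large) weight first, then
 * scale. The shift is applied to the biggest intermediate value, so
 * far fewer significant bits are lost.
 */
static uint64_t scale_product_last(uint64_t delta, uint64_t weight,
				   uint64_t scale)
{
	return cap_scale(weight * delta, scale);
}
```

With delta = 3, weight = 1024 and scale = 512 (half capacity), scaling the
delta first gives 1024 while scaling the product gives 1536, which is the
exact value. With delta = 1 the first variant drops to 0.
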
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
> sa->load_avg = scale_load_down(se->load.weight);
> sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> - sa->util_sum = LOAD_AVG_MAX;
> + sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
> /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> }
>
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
> return contrib + runnable_avg_yN_sum[n];
> }
>
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

So why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT?

> +
> #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
> /*
> @@ -2599,7 +2603,7 @@ __update_load_avg(u64 now, int cpu, stru
> }
> }
> if (running)
> - sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
> + sa->util_sum += scaled_delta_w * scale_cpu;
>
> delta -= delta_w;
>
> @@ -2623,7 +2627,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_sum += weight * contrib;
> }
> if (running)
> - sa->util_sum += cap_scale(contrib, scale_cpu);
> + sa->util_sum += contrib * scale_cpu;
> }
>
> /* Remainder of delta accrued against u_0` */
> @@ -2634,7 +2638,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_sum += weight * scaled_delta;
> }
> if (running)
> - sa->util_sum += cap_scale(scaled_delta, scale_cpu);
> + sa->util_sum += scaled_delta * scale_cpu;
>
> sa->period_contrib += delta;
>
> @@ -2644,7 +2648,7 @@ __update_load_avg(u64 now, int cpu, stru
> cfs_rq->runnable_load_avg =
> div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> }
> - sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> + sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> }
>
> return decayed;
> @@ -2686,8 +2690,7 @@ static inline int update_cfs_rq_load_avg
> if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> sa->util_avg = max_t(long, sa->util_avg - r, 0);
> - sa->util_sum = max_t(s32, sa->util_sum -
> - ((r * LOAD_AVG_MAX) >> SCHED_LOAD_SHIFT), 0);
> + sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);

Looks good to me.
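
As a rough sanity check of the units once util_sum carries the 2^10 capacity
factor, here is a standalone userspace sketch (not kernel code). It assumes
LOAD_AVG_MAX is 47742 and that util_sum is a 32-bit field; the helper names
are hypothetical. Unlike the max_t(s32, ...) in the hunk above, the
subtraction here is done in 64 bits purely to keep the sketch free of
wraparound.

```c
#include <stdint.h>

#define LOAD_AVG_MAX		47742	/* assumed value of the kernel constant */
#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1u << SCHED_CAPACITY_SHIFT)

/* Hypothetical helper: util_sum as set up by the patched init path. */
static uint32_t util_sum_from_avg(uint32_t util_avg)
{
	return util_avg * LOAD_AVG_MAX;
}

/* Hypothetical helper: util_avg without any extra shift. */
static uint32_t util_avg_from_sum(uint32_t util_sum)
{
	return util_sum / LOAD_AVG_MAX;
}

/*
 * Hypothetical helper: remove r units of utilization from util_sum,
 * clamping at zero. Computed in 64 bits for the sketch only.
 */
static uint32_t util_sum_remove(uint32_t util_sum, long r)
{
	int64_t left = (int64_t)util_sum - (int64_t)r * LOAD_AVG_MAX;

	return left > 0 ? (uint32_t)left : 0;
}
```

Under these assumptions a fully utilized entity has util_sum =
1024 * 47742 = 48887808, which fits comfortably in 32 bits, and removing
r = 512 leaves a util_sum that converts back to util_avg = 512.
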

> }
>
> decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,