Re: [PATCH v5 7/7] sched: consider runnable load average in effective_load

From: Michael Wang
Date: Mon May 06 2013 - 06:28:06 EST


Hi, Preeti

On 05/06/2013 03:10 PM, Preeti U Murthy wrote:
> Hi Alex, Michael,
>
> Can you try out the below patch and check?

Sure, I will give it a try as well.

> I have the reason mentioned in the changelog.
> If this also causes a performance regression, you probably need to remove the changes
> made in effective_load(), as Michael points out. I believe the below patch should not
> cause a performance regression.

Actually, according to the current results of Alex's suggestion, I think
the issue is already addressed. Anyway, I will test this patch and reply
with the results for all of them; let's choose the best way later ;-)

Regards,
Michael Wang

>
> The below patch is a substitute for patch 7.
>
>
> -------------------------------------------------------------------------------
>
> sched: Modify effective_load() to use runnable load average
>
> From: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>
>
> The runqueue weight distribution should update the runnable load average of
> the cfs_rq on which the task will be woken up.
>
> However, since the computation of se->load.weight already takes the
> runnable load average into consideration in update_cfs_shares(), there is
> no need to modify it in effective_load().
> ---
> kernel/sched/fair.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 790e23d..5489022 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3045,7 +3045,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
> /*
> * w = rw_i + @wl
> */
> - w = se->my_q->load.weight + wl;
> + w = se->my_q->runnable_load_avg + wl;
>
> /*
> * wl = S * s'_i; see (2)
> @@ -3066,6 +3066,9 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
> /*
> * wl = dw_i = S * (s'_i - s_i); see (3)
> */
> + /* Do not modify the below as it already contains runnable
> + * load average in its computation
> + */
> wl -= se->load.weight;
>
> /*
> @@ -3112,14 +3115,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
> */
> if (sync) {
> tg = task_group(current);
> - weight = current->se.load.weight;
> + weight = current->se.avg.load_avg_contrib;
>
> this_load += effective_load(tg, this_cpu, -weight, -weight);
> load += effective_load(tg, prev_cpu, 0, -weight);
> }
>
> tg = task_group(p);
> - weight = p->se.load.weight;
> + weight = p->se.avg.load_avg_contrib;
>
> /*
> * In low-load situations, where prev_cpu is idle and this_cpu is idle
>
>
> Regards
> Preeti U Murthy
>
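
For readers following the arithmetic, here is a minimal userspace sketch of
one level of the calculation that the hunk comments above describe
(w = rw_i + @wl, wl = S * s'_i, wl = dw_i = S * (s'_i - s_i)). It is not the
kernel code: the toy_* names and struct fields are hypothetical, the W term
(group total plus @wg) is an assumption based on the surrounding function
rather than on the quoted hunks, and the minimum-shares clamp and the walk
up the parent groups are left out.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_group {
	long shares;      /* S: the group's configured shares */
	long tg_weight;   /* assumed: group load summed over all CPUs */
};

struct toy_entity {
	long load_weight; /* S * s_i: the group entity's current weight on this CPU */
	long rq_load;     /* rw_i: load of the group's runqueue on this CPU */
};

/*
 * One level only: the change in the group entity's weight when @wl is
 * added to this CPU's group runqueue and @wg to the group total.
 */
static long toy_effective_load(const struct toy_group *tg,
			       const struct toy_entity *se,
			       long wl, long wg)
{
	long W, w;

	assert(tg->shares > 0);

	W = wg + tg->tg_weight;  /* assumed: W = @wg + sum of rw_j */
	w = se->rq_load + wl;    /* w = rw_i + @wl */

	/* wl = S * s'_i */
	if (W > 0 && w < W)
		wl = (w * tg->shares) / W;
	else
		wl = tg->shares;

	/* wl = dw_i = S * (s'_i - s_i) */
	return wl - se->load_weight;
}
```

With illustrative numbers (shares 1024, group total 2048, this CPU's
runqueue load 1024, entity weight 512), waking a task of weight 1024 here
gives toy_effective_load(&tg, &se, 1024, 1024) == 170, and the sync case
that removes the waker, as in effective_load(tg, this_cpu, -weight, -weight),
gives -512. The inputs and the subtracted se->load.weight have to be on the
same scale for the result to mean anything, which is what the two patches in
this thread disagree about.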
> On 05/06/2013 09:04 AM, Michael Wang wrote:
>> Hi, Alex
>>
>> On 05/06/2013 09:45 AM, Alex Shi wrote:
>>> effective_load calculates the load change as seen from the
>>> root_task_group. It needs to engage the runnable average
>>> of the changed task.
>> [snip]
>>> */
>>> @@ -3045,7 +3045,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>>> /*
>>> * w = rw_i + @wl
>>> */
>>> - w = se->my_q->load.weight + wl;
>>> + w = se->my_q->tg_load_contrib + wl;
>>
>> I've tested the patch set; it seems the last patch caused a big
>> regression on pgbench:
>>
>>                         base      patch 1~6   patch 1~7
>> | db_size | clients |  tps  |   |  tps  |   |  tps  |
>> +---------+---------+-------+   +-------+   +-------+
>> | 22 MB   |      32 | 43420 |   | 53387 |   | 41625 |
>>
>> I guess something unexpected happens in effective_load() when the group
>> decay is combined with the load decay. What's your opinion?
>>
>> Regards,
>> Michael Wang
>>
>>>
>>> /*
>>> * wl = S * s'_i; see (2)
>>> @@ -3066,7 +3066,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>>> /*
>>> * wl = dw_i = S * (s'_i - s_i); see (3)
>>> */
>>> - wl -= se->load.weight;
>>> + wl -= se->avg.load_avg_contrib;
>>>
>>> /*
>>> * Recursively apply this logic to all parent groups to compute
>>> @@ -3112,14 +3112,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>>> */
>>> if (sync) {
>>> tg = task_group(current);
>>> - weight = current->se.load.weight;
>>> + weight = current->se.avg.load_avg_contrib;
>>>
>>> this_load += effective_load(tg, this_cpu, -weight, -weight);
>>> load += effective_load(tg, prev_cpu, 0, -weight);
>>> }
>>>
>>> tg = task_group(p);
>>> - weight = p->se.load.weight;
>>> + weight = p->se.avg.load_avg_contrib;
>>>
>>> /*
>>> * In low-load situations, where prev_cpu is idle and this_cpu is idle
>>>
>>
>
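
To put the pgbench figures quoted above in relative terms, here is a small
sketch that computes the change in tps in hundredths of a percent. The
helper name tps_change_bp is made up for illustration; only the three tps
values come from the thread.

```c
#include <assert.h>

/*
 * Relative change of @tps against @base, in hundredths of a percent
 * (integer division, truncated toward zero).
 */
static long tps_change_bp(long base, long tps)
{
	assert(base > 0);
	return (tps - base) * 10000 / base;
}
```

For the 22 MB / 32 clients row, tps_change_bp(43420, 53387) is 2295
(about +22.95% for patches 1~6 over base), tps_change_bp(43420, 41625) is
-413 (about -4.13% for patches 1~7 against base), and
tps_change_bp(53387, 41625) is -2203 (about -22.03% lost by adding patch 7
on top of patches 1~6).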
