Re: [PATCH] kernel/sched: Update schedstats when migrating threads

From: Carlos Bilbao
Date: Wed Feb 23 2022 - 11:00:03 EST


On 2/23/2022 9:47 AM, Phil Auld wrote:
> On Wed, Feb 23, 2022 at 09:33:59AM -0600 Carlos Bilbao wrote:
>> On 2/23/2022 9:28 AM, Phil Auld wrote:
>>> On Wed, Feb 23, 2022 at 09:14:45AM -0600 Carlos Bilbao wrote:
>>>> On 2/23/2022 4:19 AM, Peter Zijlstra wrote:
>>>>> On Wed, Jan 26, 2022 at 09:22:23AM -0600, Carlos Bilbao wrote:
>>>>>> The kernel manages per-task scheduler statistics or schedstats. Such
>>>>>> counters should be reinitialized when the thread is migrated to a
>>>>>> different core rq, except for the values recording number of migrations.
>>>>>
>>>>> I'm confused, why should we reset schedstats on migrate? I'm thinking
>>>>> this breaks per-task, since tasks tend to bounce around quite a lot.
>>>>>
>>>>
>>>> Thanks for your comments, Peter.
>>>>
>>>> Looking at the documentation of schedstats I see that most values are
>>>> actually linked to the particular CPU: time spent on the cpu, timeslices
>>>> run on this cpu, number of times _something_ was called when the cpu was
>>>> idle, and so forth. Those values lose their meaning after migration and we
>>>> should reinitialize their counters. However, reviewing sched_statistics I
>>>> identify two fields that we should definitely keep increasing even after
>>>> migration (nr_migrations_cold, nr_forced_migrations).
>>>>
>>>
>>> The documentation is a little off. I think it should say "any cpu" instead
>>> of "this cpu". If you reset these per task counters (time on cpu, number
>>> of timeslices etc) on every migration then they become meaningless (and
>>> useless).
>>>
>>>
>>> Cheers,
>>> Phil
>>>
>>
>> Well that clarifies it! Then, let me ask the opposite question... What
>> fields of schedstats should we clear when migrating? If there isn't any,
>> I will just increase the number of migrations.
>>
>
> I don't think any should be cleared on migration. They're per task and
> should be monotically increasing. If they ever reset it becomes hard to
> know what they mean when you read them.
>
>
> Cheers,
> Phil
>
>

Agreed, I'll send a PATCH v2 only increasing the migration counter.

>>>> So this patch will have to be upgraded if there's some other value(s) in
>>>> schedstats that we do not want to reinitialize either.
>>>>
>>>>>> Signed-off-by: Carlos Bilbao <carlos.bilbao@xxxxxxx>
>>>>>> ---
>>>>>> kernel/sched/core.c | 10 +++++++++-
>>>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>>>> index fe53e510e711..d64c2a290176 100644
>>>>>> --- a/kernel/sched/core.c
>>>>>> +++ b/kernel/sched/core.c
>>>>>> @@ -8757,6 +8757,7 @@ bool sched_smp_initialized __read_mostly;
>>>>>> int migrate_task_to(struct task_struct *p, int target_cpu)
>>>>>> {
>>>>>> struct migration_arg arg = { p, target_cpu };
>>>>>> + uint64_t forced_migrations, migrations_cold;
>>>>>> int curr_cpu = task_cpu(p);
>>>>>>
>>>>>> if (curr_cpu == target_cpu)
>>>>>> @@ -8765,7 +8766,14 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
>>>>>> if (!cpumask_test_cpu(target_cpu, p->cpus_ptr))
>>>>>> return -EINVAL;
>>>>>>
>>>>>> - /* TODO: This is not properly updating schedstats */
>>>>>> + if (schedstat_enabled()) {
>>>>>> + forced_migrations = schedstat_val(p->stats.nr_forced_migrations);
>>>>>> + migrations_cold = schedstat_val(p->stats.nr_migrations_cold);
>>>>>> + memset(&p->stats, 0, sizeof(p->stats));
>>>>>> + schedstat_set(p->stats.nr_forced_migrations, forced_migrations);
>>>>>> + schedstat_set(p->stats.nr_migrations_cold, migrations_cold);
>>>>>> + schedstat_inc(p->stats.nr_migrations_cold);
>>>>>> + }
>>>>>>
>>>>>> trace_sched_move_numa(p, curr_cpu, target_cpu);
>>>>>> return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
>>>>>> --
>>>>>> 2.27.0
>>>>>>
>>>>
>>>> Thanks,
>>>> Carlos
>>>>
>>>
>>
>> Thanks,
>> Carlos
>>
>