Re: [PATCH 1/1] sched/cputime: do not decrease steal time after live migration on xen

From: Dongli Zhang
Date: Wed Oct 11 2017 - 03:48:38 EST


Hi Rik,

On 10/10/2017 10:01 PM, Rik van Riel wrote:
> On Tue, 2017-10-10 at 14:48 +0200, Peter Zijlstra wrote:
>> On Tue, Oct 10, 2017 at 02:42:01PM +0200, Stanislaw Gruszka wrote:
>>>>> + u64 steal, steal_time;
>>>>> + s64 steal_delta;
>>>>> +
>>>>> + steal_time = paravirt_steal_clock(smp_processor_id());
>>>>> + steal = steal_delta = steal_time - this_rq()->prev_steal_time;
>>>>> +
>>>>> + if (unlikely(steal_delta < 0)) {
>>>>> + this_rq()->prev_steal_time = steal_time;
>>>
>>> I don't think setting prev_steal_time to smaller value is right
>>> thing to do.
>>>
>>> Beside, I don't think we need to check for overflow condition for
>>> cputime variables (it will happen after 279 years :-). So instead
>>> of introducing signed steal_delta variable I would just add
>>> below check, which should be sufficient to fix the problem:
>>>
>>> if (unlikely(steal <= this_rq()->prev_steal_time))
>>> return 0;
>>
>> How about you just fix up paravirt_steal_time() on migration and not
>> muck with the users ?
>
> Not just migration, either. CPU hotplug is another time to fix up
> the steal time.

I think this issue might be hit when a vcpu is added and onlined a very long
time after boot (or a very long time after that vcpu was last offlined). Please
correct me if I am wrong.
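
To make the failure mode concrete, here is a minimal userspace sketch (not
kernel code) of what happens when the steal clock reads lower than the saved
prev_steal_time, for example after the clock is reset. The struct fake_rq type
and both function names are hypothetical, made up for illustration only. The
guarded variant follows the shape of the check Stanislaw suggested above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue state; NOT the kernel's struct rq. */
struct fake_rq {
	uint64_t prev_steal_time;
};

/*
 * Unguarded delta: plain unsigned subtraction. If steal_clock is smaller
 * than prev_steal_time, the result wraps around to a huge value.
 */
uint64_t steal_delta_unguarded(const struct fake_rq *rq, uint64_t steal_clock)
{
	return steal_clock - rq->prev_steal_time;
}

/*
 * Guarded delta (assumed shape of the suggested check): if the clock has
 * not advanced past prev_steal_time, account nothing and leave
 * prev_steal_time untouched. Otherwise account the delta and advance.
 */
uint64_t steal_delta_guarded(struct fake_rq *rq, uint64_t steal_clock)
{
	uint64_t delta;

	if (steal_clock <= rq->prev_steal_time)
		return 0;

	delta = steal_clock - rq->prev_steal_time;
	rq->prev_steal_time = steal_clock;
	return delta;
}
```

Note that with the guard alone, no steal time is accounted until the clock
catches up with the old prev_steal_time again, which may be one reason to fix
up the clock source itself rather than its users.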

Thank you very much!

Dongli Zhang
