Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

From: Balbir Singh
Date: Tue Oct 16 2007 - 09:00:19 EST


Christian Borntraeger wrote:
> On Tuesday, 16 October 2007, Balbir Singh wrote:
>> I am trying to think out loud as to what the root cause of the problem
>> might be. In one of the discussion threads, I saw utime going backwards,
>> which seemed very odd; I suspect that those are rounding errors.
>>
>> I don't understand your explanation below
>>
>> Initially utime = 9, stime = 0, sum_exec_runtime = S1
>>
>> Later
>>
>> utime = 9, stime = 1, sum_exec_runtime = S2
>>
>> We can be sure that S >= (utime + stime)
>
> I think this is the problem. How can we be sure? We can't. utime and stime
> are sampled, so they can be far off in either direction if the program
> sleeps often and manages to synchronize itself to the timer tick. Let's say
> a program only does a simple system call and then sleeps. So sum_exec_runtime
> is increased by, let's say, 1000 cycles on a 1 GHz box, which means 1000ns. If
> the timer tick now happens exactly at this moment, stime is increased by 1 tick
> = 1000000ns.
>
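
To put rough numbers on that case, here is a small user-space sketch
(not kernel code; simulate_synced_sleeper() and struct accounting are
made-up names). It assumes every wakeup does 1000ns of work and that
the tick always lands on the task, so a whole 1000000ns tick is
charged to stime each time.

```c
#include <stdint.h>

#define TICK_NS 1000000ULL  /* 1 tick = 1000000ns, as in the example above */

/* Hypothetical model of the two accounting views of one task. */
struct accounting {
    uint64_t sampled_stime_ns;   /* what tick sampling charges (models p->stime) */
    uint64_t precise_runtime_ns; /* what precise accounting sees (models sum_exec_runtime) */
};

/*
 * Model a task that wakes up, runs for work_ns, and sleeps again, with
 * every wakeup coinciding with a timer tick (the worst case).
 */
struct accounting simulate_synced_sleeper(unsigned int wakeups, uint64_t work_ns)
{
    struct accounting a = { 0, 0 };

    for (unsigned int i = 0; i < wakeups; i++) {
        a.precise_runtime_ns += work_ns; /* only the time actually run */
        a.sampled_stime_ns += TICK_NS;   /* the full tick is charged */
    }
    return a;
}
```

With 10 wakeups of 1000ns each, the sampled stime comes to 10000000ns
against 10000ns of precise runtime, so the sampled value is off by a
factor of 1000 in this model.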

Yes, I thought of that just after I sent out my email. In the case that
you mention, the utime and stime accounting is incorrect anyway :-)
I think we need to find a better solution. I was going to propose that
we round correctly in (the divisions in)

1. task_utime()
2. clock_t_to_cputime()

I suspect we'll need to round task_utime() to p->utime if the value of
task_utime() < p->utime and the same thing for task_stime(). I've tried
reproducing the problem on my UML setup without any success. Let me
try and grab an x86 box.
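
A rough user-space sketch of what I mean, using the numbers from the
example above (utime = 9, stime going from 0 to 1). This is only a
model, not the kernel code: struct task_times, scaled_utime() and
clamped_utime() are names invented for illustration, and I am assuming
1 tick = 1000000ns and that utime is scaled as
runtime * utime / (utime + stime).

```c
#include <stdint.h>

#define NSEC_PER_TICK 1000000ULL /* assumed tick length */

/* Hypothetical stand-in for the fields discussed in this thread. */
struct task_times {
    uint64_t utime_ticks; /* sampled user ticks (models p->utime) */
    uint64_t stime_ticks; /* sampled system ticks (models p->stime) */
    uint64_t runtime_ns;  /* precise runtime (models sum_exec_runtime) */
};

/* Integer division rounded to nearest instead of truncated. */
uint64_t div_round_nearest(uint64_t n, uint64_t d)
{
    return (n + d / 2) / d;
}

/* Split the precise runtime according to the sampled utime share. */
uint64_t scaled_utime(const struct task_times *t)
{
    uint64_t total = t->utime_ticks + t->stime_ticks;
    uint64_t run_ticks = div_round_nearest(t->runtime_ns, NSEC_PER_TICK);

    if (total == 0)
        return run_ticks;
    return div_round_nearest(run_ticks * t->utime_ticks, total);
}

/* Never report less than the sampled value, so utime cannot go backwards. */
uint64_t clamped_utime(const struct task_times *t)
{
    uint64_t u = scaled_utime(t);

    return u < t->utime_ticks ? t->utime_ticks : u;
}
```

In this model, with utime = 9, stime = 1 and a runtime of 9001000ns,
scaled_utime() gives 8 even with round-to-nearest, while
clamped_utime() stays at 9. So rounding alone does not seem to be
enough here; the clamp is what keeps the value from dropping.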

> Maybe there is some magic in the code which I did not see, but obviously
> the problem exists, and looking at Frans' data, (stime+utime) is not
> decreasing, but stime and utime individually are. If you look at Frans'
> data you see:
> Oct 16 11:54:48 8 10
> Oct 16 11:54:49 6 12 <-- utime
> Oct 16 11:54:50 6 12
> Oct 16 11:54:51 6 12
> Oct 16 11:54:52 8 10 <-- stime
> Oct 16 11:54:53 8 10
> Oct 16 11:54:54 8 10
> Oct 16 11:54:55 8 12
> Oct 16 11:54:56 8 12
>
> (stime+utime) is constant. That means that S2-S1 is obviously smaller than
> one tick (see the calculation in task_stime). I am quite sure it is caused
> by changes in the sampled values p->utime and p->stime.
>

Yes, very interesting observation.
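
A quick way to check that observation against the samples quoted
above is sketched below. I am assuming the first column is utime and
the second is stime, which is how I read the arrows; the names
(struct sample, frans_samples, count_backward_steps()) are made up
for this sketch.

```c
#include <stddef.h>

struct sample {
    unsigned int utime; /* first column (assumed to be utime) */
    unsigned int stime; /* second column (assumed to be stime) */
};

/* The nine samples from 11:54:48 to 11:54:56 quoted above. */
const struct sample frans_samples[] = {
    { 8, 10 }, { 6, 12 }, { 6, 12 }, { 6, 12 }, { 8, 10 },
    { 8, 10 }, { 8, 10 }, { 8, 12 }, { 8, 12 },
};
const size_t frans_count = sizeof(frans_samples) / sizeof(frans_samples[0]);

/* How many times each value dropped from one sample to the next. */
struct backward_steps {
    unsigned int utime;
    unsigned int stime;
    unsigned int sum;
};

struct backward_steps count_backward_steps(const struct sample *s, size_t n)
{
    struct backward_steps b = { 0, 0, 0 };

    for (size_t i = 1; i < n; i++) {
        if (s[i].utime < s[i - 1].utime)
            b.utime++;
        if (s[i].stime < s[i - 1].stime)
            b.stime++;
        if (s[i].utime + s[i].stime < s[i - 1].utime + s[i - 1].stime)
            b.sum++;
    }
    return b;
}
```

On these samples each of the two columns goes backwards once, while
the sum never does, which fits the explanation that the total comes
from the precise runtime and only the split between utime and stime
moves around.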

[snip]


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL