Re: [PATCH] proc: Do not overflow get_{idle,iowait}_time for nohz(was: Re: Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage)

From: Michal Hocko
Date: Fri Dec 02 2011 - 12:59:36 EST


On Fri 02-12-11 17:49:17, Michal Hocko wrote:
> On Fri 02-12-11 14:35:15, Michal Hocko wrote:
> > On Tue 29-11-11 11:38:47, Artem S. Tashkinov wrote:
> > > On Nov 29, 2011, Michal Hocko <mhocko@xxxxxxx> wrote:
> > >
> > > > As I wrote in the other email, could you post your config and collect
> > > > the following data?
> > > > for i in `seq 30`;
> > > > do
> > > >     cat /proc/stat > `date +'%s'`
> > > >     sleep 1
> > > > done
> > > > export old_user=0 old_nice=0 old_sys=0 old_idle=0 old_iowait=0;
> > > >
> > > > # for all your available CPUs
> > > > grep cpu0 * | while read cpu user nice sys idle iowait rest;
> > > > do
> > > >     echo $cpu $(($user-$old_user)) $(($nice-$old_nice)) $(($sys-$old_sys)) $(($idle-$old_idle)) $(($iowait-$old_iowait))
> > > >     old_user=$user old_nice=$nice old_sys=$sys old_idle=$idle old_iowait=$iowait
> > > > done
> > >
> > > 1322566208:cpu0 5199 0 2931 357890604 2541
> > > 1322566209:cpu0 0 0 1 0 0
> > > 1322566210:cpu0 0 0 0 0 0
> > > 1322566211:cpu0 0 0 0 0 0
> [...]
> >
> > Could you post raw data as well? Ideally starting right after boot and
> > collected for more than 30s (longer better...)
>
> Ahh, missed that you attached data. And also noticed that you are using
> CONFIG_HZ_300, which explains the problem and why I cannot reproduce
> it.
>
> get_{idle,iowait}_time translates microseconds to cputime64_t using
> usecs_to_cputime, which is just an alias for usecs_to_jiffies, and that does
>     if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
>         return MAX_JIFFY_OFFSET;
> which in your case (HZ=300) means that we overflow much more often than
> for HZ==100. The patch below should fix this:
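
To make the saturation described above concrete, here is a minimal userspace
sketch. It is not the kernel code: model_usecs_to_jiffies, its parameters, and
the numbers in the usage note are hypothetical and chosen only for
illustration. In the kernel the limit comes from
jiffies_to_usecs(MAX_JIFFY_OFFSET) and depends on HZ; the sketch simply takes
it as an argument. The conversion below also truncates, which is a
simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the clamp quoted above (hypothetical helper,
 * not the kernel implementation).
 *
 * usecs       - monotonically growing idle/iowait counter in microseconds
 * hz          - tick rate being modelled (e.g. 300)
 * max_jiffies - stand-in for MAX_JIFFY_OFFSET
 * limit_usecs - stand-in for jiffies_to_usecs(MAX_JIFFY_OFFSET)
 */
static uint64_t model_usecs_to_jiffies(uint64_t usecs, uint32_t hz,
                                       uint64_t max_jiffies,
                                       uint64_t limit_usecs)
{
    assert(hz > 0);

    /* Same shape as the quoted check: saturate once the input is too big. */
    if (usecs > limit_usecs)
        return max_jiffies;

    /* Plain truncating conversion; assumes usecs * hz fits in 64 bits. */
    return usecs * hz / 1000000u;
}
```

With made-up values hz = 300, max_jiffies = 1000 and limit_usecs = 2000000,
inputs of 1000000 and 2000000 convert normally, while 3000000 and 4000000 both
return 1000. Once the counter has passed the limit, two consecutive samples
convert to the same value, so a per-second delta of zero is what you would
expect to see for that column.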

And here is the one with a cleaned-up changelog. No functional changes.
---