Re: Re: [PATCH] proc: Do not overflow get_{idle,iowait}_time for nohz (was: Re: Re: [REGRESSION] [Linux 3.2] top/htop and all other CPU usage)

From: Artem S. Tashkinov
Date: Fri Dec 02 2011 - 15:12:17 EST


On Dec 2, 2011, Michal Hocko wrote:

> And the one with a more cleaned up changelog. No functional changes
> ---
> From 107887016b91de59194a93c751d040b05d5e37fe Mon Sep 17 00:00:00 2001
> From: Michal Hocko <>
> Date: Fri, 2 Dec 2011 16:17:03 +0100
> Subject: [PATCH] proc: Do not overflow get_{idle,iowait}_time for nohz
>
> Since a25cac51 [proc: Consider NO_HZ when printing idle and iowait times]
> we are reporting idle/io_wait time also while a CPU is tickless. We rely
> on get_{idle,iowait}_time functions to retrieve proper data.
>
> These functions, however, use usecs_to_cputime to translate a time in
> microseconds to cputime64_t. This is just an alias to usecs_to_jiffies
> which reduces the data type from u64 to unsigned int and also checks
> whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
> and returns MAX_JIFFY_OFFSET in that case.
>
> When we overflow depends on CONFIG_HZ, but especially for
> CONFIG_HZ_300 it is quite low (1431649781), so we are getting
> MAX_JIFFY_OFFSET for >3000s! until we overflow unsigned int.
> Just for reference, CONFIG_HZ_100 has an overflow window around 20s,
> CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s.
>
> This results in a bug where people saw [h]top going mad, reporting 100%
> CPU usage even though there was basically no CPU load. The reason was
> simply that /proc/stat stopped reporting idle/io_wait changes (and
> reported MAX_JIFFY_OFFSET), and so the only change happening was for
> user and system time.
>
> Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
> to a 32b type and is much more appropriate for cumulative time values
> (unlike usecs_to_jiffies, which is intended for timeout calculations).
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> ---
> fs/proc/stat.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 42b274d..2a30d67 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -32,7 +32,7 @@ static cputime64_t get_idle_time(int cpu)
> idle = kstat_cpu(cpu).cpustat.idle;
> idle = cputime64_add(idle, arch_idle_time(cpu));
> } else
> - idle = usecs_to_cputime(idle_time);
> + idle = nsecs_to_jiffies64(1000 * idle_time);
>
> return idle;
> }
> @@ -46,7 +46,7 @@ static cputime64_t get_iowait_time(int cpu)
> /* !NO_HZ so we can rely on cpustat.iowait */
> iowait = kstat_cpu(cpu).cpustat.iowait;
> else
> - iowait = usecs_to_cputime(iowait_time);
> + iowait = nsecs_to_jiffies64(1000 * iowait_time);
>
> return iowait;
> }
> --
> 1.7.7.3

Thank you, this patch has fixed the issue for me.
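
For anyone following the thread, here is a rough userspace sketch of the truncation the changelog describes. It is not the kernel code: the model_* helpers and MODEL_HZ are hypothetical names made up for illustration, HZ=100 is an assumed tick rate, and the MAX_JIFFY_OFFSET clamp of the real usecs_to_jiffies is deliberately left out, so only the 64-bit to 32-bit narrowing is shown.

```c
#include <stdint.h>

/* Assumed tick rate for this model; the kernel takes it from CONFIG_HZ. */
#define MODEL_HZ           100u
#define MODEL_USEC_PER_SEC 1000000u
#define MODEL_NSEC_PER_SEC 1000000000ull

/*
 * Hypothetical stand-in for the old path. The parameter is 32 bits wide,
 * so a 64-bit microsecond count is reduced modulo 2^32 before conversion.
 * The MAX_JIFFY_OFFSET clamp of the real function is not modelled.
 */
static uint64_t model_usecs_to_jiffies(uint32_t usecs)
{
	uint32_t usecs_per_jiffy = MODEL_USEC_PER_SEC / MODEL_HZ;

	/* Round up to whole jiffies; widen first so the sum cannot wrap. */
	return ((uint64_t)usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/*
 * Hypothetical stand-in for the new path. The argument stays 64 bits wide,
 * so a cumulative idle time keeps growing instead of wrapping.
 */
static uint64_t model_nsecs_to_jiffies64(uint64_t nsecs)
{
	return nsecs / (MODEL_NSEC_PER_SEC / MODEL_HZ);
}
```

With a cumulative idle time of 5000 s (5000000000 us, which no longer fits in 32 bits), the first helper sees only the low 32 bits of the value, while the second one converts the full 64-bit count in the same way the patch does with 1000 * idle_time.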

Tested-by: Artem S. Tashkinov <t.artem@xxxxxxxxxxxx>