Re: [PATCH -next] sched/cputime: Fix the bug of reading time backward from /proc/stat

From: zhengzucheng
Date: Sun Sep 04 2022 - 23:47:53 EST



Assume that a CPU time "A" is read from /proc/stat and, after a while, a CPU time "B" is read. If T = B - A < 0, T is interpreted as a very large number when it is treated as an unsigned integer. As a result, the CPU usage calculated this way will be abnormally high. This looks like a problem that should be fixed.
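
To illustrate the wraparound described above, here is a minimal C sketch of what a monitoring tool might do with two successive samples. The function names naive_delta and clamped_delta are hypothetical and only serve the illustration. Clamping is one possible userspace workaround, not the fix proposed in the patch.

```c
#include <stdint.h>

/*
 * Naive delta between two samples of a counter that is assumed
 * to be monotonic. If curr < prev, the unsigned subtraction wraps
 * around modulo 2^64 and yields a huge value.
 */
uint64_t naive_delta(uint64_t prev, uint64_t curr)
{
	return curr - prev;
}

/*
 * Defensive delta (hypothetical workaround): a backward step is
 * treated as "no progress" instead of wrapping around.
 */
uint64_t clamped_delta(uint64_t prev, uint64_t curr)
{
	return curr >= prev ? curr - prev : 0;
}
```

With the values from the report (319 read first, 318 read second), naive_delta(319, 318) wraps to UINT64_MAX, while clamped_delta(319, 318) returns 0.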

original link:
https://lore.kernel.org/lkml/20220813000102.42051-1-hucool.lihua@xxxxxxxxxx/

On 2022/8/15 16:15, Peter Zijlstra wrote:
On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
The problem is that the reported time goes backward: the value read first is 319, and the value read next is 318, as follows:
first:
cat /proc/stat | grep cpu1
cpu1 319 0 496 41665 0 0 0 0 0 0
then:
cat /proc/stat | grep cpu1
cpu1 318 0 497 41674 0 0 0 0 0 0

Time goes back, which is counterintuitive.

After debugging, the problem turns out to be caused by the implementation of kcpustat_cpu_fetch_vtime, as follows:

CPU0 CPU1
First:
show_stat():
->kcpustat_cpu_fetch()
->kcpustat_cpu_fetch_vtime()
->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta; rq->curr is in user mode
---> When rq->curr on CPU1 is running in user space, utime and delta need to be added
---> rq->curr->vtime->utime is less than 1 tick
Then:
show_stat():
->kcpustat_cpu_fetch()
->kcpustat_cpu_fetch_vtime()
->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu); rq->curr is in kernel mode
---> When rq->curr on CPU1 is running in kernel space, only kcpustat is returned
This is unreadable, what?!?
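
The sequence in the quoted diagram can be restated as a small C model. This is only a sketch based on the description in the quoted patch, not the actual kernel code: the types cpu_model and task_model, the TICK_NS value of 10 ms, and the sample numbers are all assumptions chosen so that the output matches the 319 and 318 shown above.

```c
#include <stdint.h>

#define TICK_NS 10000000ULL	/* assumed tick length: 10 ms */

enum model_state { MODEL_USER, MODEL_KERNEL };	/* simplified task state */

struct task_model {
	enum model_state state;
	uint64_t pending_utime;	/* user time not yet flushed, below 1 tick */
	uint64_t delta;		/* time elapsed since the last accounting update */
};

struct cpu_model {
	uint64_t kcpustat_user;	/* user time already flushed to the CPU counter */
	struct task_model curr;	/* stands in for rq->curr */
};

/* Mirrors the described logic: pending time is added only in user mode. */
uint64_t fetch_user(const struct cpu_model *cpu)
{
	uint64_t val = cpu->kcpustat_user;

	if (cpu->curr.state == MODEL_USER)
		val += cpu->curr.pending_utime + cpu->curr.delta;
	return val;
}

uint64_t to_ticks(uint64_t ns)
{
	return ns / TICK_NS;
}

/* Replays both reads; returns 1 if the second value is smaller. */
int replay_goes_backward(void)
{
	struct cpu_model cpu1 = {
		.kcpustat_user = 318 * TICK_NS,
		.curr = {
			.state = MODEL_USER,
			.pending_utime = 6000000,	/* 6 ms, below 1 tick */
			.delta = 5000000,		/* 5 ms */
		},
	};
	uint64_t first = to_ticks(fetch_user(&cpu1));	/* task in user mode */

	/* The task enters the kernel; the pending user time stays unflushed. */
	cpu1.curr.state = MODEL_KERNEL;
	uint64_t second = to_ticks(fetch_user(&cpu1));

	return second < first;
}
```

In this model the first read includes the 11 ms of pending user time and reports 319 ticks, while the second read sees only the flushed counter and reports 318, so replay_goes_backward() returns 1.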