Re: nohz_full left a periodic tick cpu issue

From: Alex Shi
Date: Wed Dec 11 2013 - 06:00:39 EST


On 12/11/2013 02:28 PM, Alex Shi wrote:
> Hi Frederic,
>
> Sorry for my ignorance of nohz_full. When we use this feature on my mobile
> devices, we find that it keeps cpu0 in periodic tick mode, so the timer
> interrupt rate on cpu0 is much higher than in normal nohz mode.
> That causes high power consumption.
>
> I found that you mentioned this in commit a382bf934449:
> nohz: Assign timekeeping duty to a CPU outside the full dynticks range
>
> In fact, if all full dynticks CPUs are idle, it should be safe for cpu0 to
> go idle too. Do you have any plan or idea for implementing this?
> Otherwise, the power cost is too high to enable nohz_full on mobile platforms.

CCing more experts: Paul and Zhong.

In fact, I have been trying to work out a simple solution for this:
We can use a global variable to store the number of full dynticks CPUs. When that number drops to 0, cpu0 can enter nohz mode just as in normal nohz.
Then, before any CPU enters full dynticks mode, it can send a need-resched to wake up cpu0.
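
The counter scheme described above can be modelled roughly as follows. This is only a userspace sketch using C11 atomics, not kernel code: every name in it (nr_full_dynticks_cpus, kick_timekeeper(), full_dynticks_enter() and so on) is hypothetical, and kick_timekeeper() merely counts calls where a real patch would send a reschedule to cpu0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* All names below are hypothetical; none of them exist in the kernel. */
static atomic_int nr_full_dynticks_cpus; /* CPUs currently in full dynticks mode */
static int timekeeper_kicks;             /* stand-in for "wake up cpu0" */

/* Stand-in for sending a need-resched to the timekeeping CPU (cpu0). */
static void kick_timekeeper(void)
{
	timekeeper_kicks++;
}

/* Called before a CPU enters full dynticks mode. */
static void full_dynticks_enter(void)
{
	atomic_fetch_add(&nr_full_dynticks_cpus, 1);
	/* cpu0 may have stopped its tick, so wake it to resume timekeeping. */
	kick_timekeeper();
}

/* Called when a CPU leaves full dynticks mode. */
static void full_dynticks_exit(void)
{
	int old = atomic_fetch_sub(&nr_full_dynticks_cpus, 1);

	/* Every exit must pair with an earlier enter. */
	assert(old > 0);
}

/* cpu0 may stop its tick only while no CPU is in full dynticks mode. */
static bool timekeeper_can_stop_tick(void)
{
	return atomic_load(&nr_full_dynticks_cpus) == 0;
}
```

The sketch depends on every enter being paired with exactly one exit. If either side is called from a path that can run more than once per transition, the counter drifts.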

But I have been stuck for a week trying to keep nr_nohz_busy_cpus balanced on a 2-CPU system, a pandaboard es. The variable may be incremented 2 or 3 times in a row without a decrement, so nr_nohz_busy_cpus grew to 28 in 7200 seconds.
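
One possible way to narrow down such an imbalance is to keep a per-CPU flag, so that each CPU contributes at most one to the counter, and to count the calls that would otherwise have unbalanced it. The following is a minimal single-threaded userspace sketch under that assumption, not the patch itself: only the name nr_nohz_busy_cpus comes from this mail, while cpu_counted[], nr_unbalanced_calls and the two helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2 /* assumed: two CPUs, as on the pandaboard es */

static bool cpu_counted[NR_CPUS]; /* hypothetical per-CPU "already counted" flag */
static int nr_nohz_busy_cpus;     /* counter name taken from the mail */
static int nr_unbalanced_calls;   /* hypothetical debug counter */

/* Count this CPU as busy, but only once until the matching decrement. */
static void nohz_busy_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpu_counted[cpu]) {
		/* Second increment without a decrement: record it, do not count it. */
		nr_unbalanced_calls++;
		return;
	}
	cpu_counted[cpu] = true;
	nr_nohz_busy_cpus++;
}

/* Drop this CPU from the busy count, but only if it was counted. */
static void nohz_busy_dec(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpu_counted[cpu]) {
		/* Decrement without a prior increment: record it, do not count it. */
		nr_unbalanced_calls++;
		return;
	}
	cpu_counted[cpu] = false;
	nr_nohz_busy_cpus--;
}
```

With this guard the counter can never exceed the number of CPUs, and a non-zero nr_unbalanced_calls indicates which direction of the transition is being hit more than once.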

Could anyone point out what I missed in this simple patch?

On systems with more CPUs, you can assign just one CPU as a full dynticks CPU to debug this patch.