Re: [BISECTED] "conservative" cpufreq governor broken

From: Steven Noonan
Date: Tue Oct 06 2009 - 06:23:57 EST


On Tue, Oct 6, 2009 at 12:31 AM, Eero Nurkkala
<ext-eero.nurkkala@xxxxxxxxx> wrote:
> On Mon, 2009-10-05 at 18:32 +0200, ext Steven Noonan wrote:
>> I noticed on my machine that the "conservative" cpufreq governor wasn't
>> working properly in v2.6.31.1 or Linus' latest tree, but it worked fine on
>> v2.6.30.8, so I decided I should figure out where this issue was coming
>> from. The issue is pretty clear...
>>
>
> I had some troubles with cpufreq-info as all values in "cpufreq stats"
> were being as 0,00% (I fixed it by replacing unsigned long longs with
> unsigned longs, and recompiled)

That doesn't explain my case. I used the same build of cpufreq-info
for the whole bisection, and the problem reproduced consistently at
every step. And the bisected commit is indeed related to cpufreq.

Regardless, cpufreq-info wasn't the only reason I thought something
was wrong. My machine was unusually warm, the fans were running almost
constantly, and the GNOME CPU frequency applet always showed 2.33GHz
and never scaled down. Meanwhile the system load stayed at zero, and
several sources (top, perf top, etc.) showed nothing that would
justify the "conservative" governor's behaviour.

> If this shows still insane values:
> cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
> I guess your system is indeed broken.

Here's the output:

2333000 6880
2167000 0
2000000 0
1833000 0
1667000 0
1500000 0
1333000 0
1000000 0
2333000 6880
2167000 0
2000000 0
1833000 0
1667000 0
1500000 0
1333000 0
1000000 0

Pretty broken, alright.
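
For anyone who wants to cross-check these numbers without relying on
cpufreq-info, here is a rough Python sketch that turns a time_in_state
dump into per-frequency percentages. The function name is my own
invention, and it assumes each line is "<frequency in kHz> <time
count>" as in the output above. If the concatenated output of several
CPUs is passed in, counts for the same frequency are summed.

```python
def time_in_state_percentages(text):
    """Parse a time_in_state dump into {freq_khz: percent_of_total_time}.

    Assumes one "<freq_khz> <count>" pair per line. Repeated frequencies
    (e.g. several CPUs concatenated) are summed before computing shares.
    """
    counts = {}
    for line in text.strip().splitlines():
        freq_khz, count = line.split()
        counts[int(freq_khz)] = counts.get(int(freq_khz), 0) + int(count)

    total = sum(counts.values())
    if total == 0:
        # No time recorded at all; avoid dividing by zero.
        return {freq: 0.0 for freq in counts}
    return {freq: 100.0 * count / total for freq, count in counts.items()}


# The values reported above for one CPU.
sample = """\
2333000 6880
2167000 0
2000000 0
1833000 0
1667000 0
1500000 0
1333000 0
1000000 0
"""

shares = time_in_state_percentages(sample)
for freq, pct in sorted(shares.items(), reverse=True):
    print(f"{freq / 1e6:.2f} GHz: {pct:.2f}%")
```

Since this reads sysfs output directly, it should be unaffected by
any formatting problem in cpufreq-info itself.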

> However. I get:
> (cat /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state)
>
> (OP1 == highest Frequency)
> OP1 7148
> OP2 242
> OP3 2307
> OP4 43145

I suspect you need CONFIG_NO_HZ enabled to reproduce the issue (going
by the title of the bisected commit and my own config). Do you have it
enabled?
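
If it helps, a small sketch for checking whether an option is set in a
kernel config. It works on the config text itself; where that text
comes from is an assumption about your setup (commonly the .config in
the build tree, or /proc/config.gz if the kernel was built with
CONFIG_IKCONFIG_PROC). The helper name and the example config below
are made up for illustration.

```python
import re


def config_option_enabled(config_text, option):
    """Return True if `option` is set to 'y' in kernel .config text.

    Matches the whole line, so CONFIG_NO_HZ does not match a longer
    option name that merely starts with the same prefix.
    """
    pattern = re.compile(rf"^{re.escape(option)}=y$", re.MULTILINE)
    return bool(pattern.search(config_text))


# Made-up config fragment, for illustration only.
example = """\
CONFIG_HZ=1000
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
"""

print(config_option_enabled(example, "CONFIG_NO_HZ"))
```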

> And another round:
>
> cpufreq stats: OP1:16,78%, OP2:0,24%, OP3:5,14%, OP4:77,83%  (72)
>
> Just once more after doing nothing:
> OP1:7,41%, OP2:0,11%, OP3:2,38%, OP4:90,10%  (82)
>
> So I can't agree it's broken. The patch you bisected, actually filtered
> out such phenomenon, in which an IRQ made the cpufreq framework
> occasionally think we were idling, although we were not. So you got
> "bonus" idle time that shouldn't been there in the first place. Now that
> the "bonus" idle time is not there, your system load may indeed be so
> high that the system never spends 80% or more time in idle? Could that
> be the case? Of course, even though I can't agree it's broken, doesn't
> mean it isn't somehow broken ;) It'd be nice to get info on other
> systems as well...

Interestingly, "ondemand" (the governor fixed by the bisected commit)
works fine. "conservative" is the only broken one.
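
To make the idle-time argument quoted above concrete, here is a
deliberately simplified model of a threshold-based governor decision.
This is not the kernel code for either governor: the function names,
the thresholds, and the sample numbers are all assumptions for
illustration. It only shows how the amount of idle time credited in a
sampling window moves the computed load across a threshold.

```python
def load_percent(wall_delta, idle_delta):
    """Busy share of a sampling window, in percent.

    wall_delta and idle_delta are elapsed and idle time in the same unit.
    """
    if wall_delta <= 0:
        raise ValueError("wall_delta must be positive")
    # Clamp so bad accounting cannot produce a load outside 0..100.
    idle_delta = min(max(idle_delta, 0), wall_delta)
    return 100.0 * (wall_delta - idle_delta) / wall_delta


def decide(load, up_threshold=80.0, down_threshold=20.0):
    """Return 'up', 'down' or 'hold' for a load value.

    The default thresholds are hypothetical, not taken from the kernel.
    """
    if load > up_threshold:
        return "up"
    if load < down_threshold:
        return "down"
    return "hold"


# Same window, two different amounts of credited idle time.
window = 100000
mostly_idle = load_percent(window, 95000)
barely_idle = load_percent(window, 10000)
print(mostly_idle, decide(mostly_idle))
print(barely_idle, decide(barely_idle))
```

In a model like this, a governor that is handed too little idle time
for an idle machine would sit above the upper threshold and never step
down, which is consistent with what I'm seeing, though I haven't
confirmed that this is the actual mechanism.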

>>
>> Here's the expected:
>>
>> cpufrequtils 005: cpufreq-info (C) Dominik Brodowski 2004-2006
>> Report errors and bugs to cpufreq@xxxxxxxxxxxxxxx, please.
>> analyzing CPU 0:
>>   driver: acpi-cpufreq
>>   CPUs which need to switch frequency at the same time: 0
>>   hardware limits: 1000 MHz - 2.33 GHz
>>   available frequency steps: 2.33 GHz, 2.17 GHz, 2.00 GHz, 1.83 GHz, 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz
>>   available cpufreq governors: ondemand, userspace, powersave, conservative, performance
>>   current policy: frequency should be within 1000 MHz and 2.33 GHz.
>>                   The governor "conservative" may decide which speed to use
>>                   within this range.
>>   current CPU frequency is 1000 MHz (asserted by call to hardware).
>>   cpufreq stats: 2.33 GHz:0.59%, 2.17 GHz:1.41%, 2.00 GHz:0.88%, 1.83 GHz:1.22%, 1.67 GHz:0.88%, 1.50 GHz:1.41%, 1.33 GHz:10.98%, 1000 MHz:82.63%  (33)
>> analyzing CPU 1:
>>   driver: acpi-cpufreq
>>   CPUs which need to switch frequency at the same time: 1
>>   hardware limits: 1000 MHz - 2.33 GHz
>>   available frequency steps: 2.33 GHz, 2.17 GHz, 2.00 GHz, 1.83 GHz, 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz
>>   available cpufreq governors: ondemand, userspace, powersave, conservative, performance
>>   current policy: frequency should be within 1000 MHz and 2.33 GHz.
>>                   The governor "conservative" may decide which speed to use
>>                   within this range.
>>   current CPU frequency is 1000 MHz (asserted by call to hardware).
>>   cpufreq stats: 2.33 GHz:0.40%, 2.17 GHz:0.16%, 2.00 GHz:0.16%, 1.83 GHz:0.35%, 1.67 GHz:0.16%, 1.50 GHz:0.35%, 1.33 GHz:0.16%, 1000 MHz:98.27%  (7)
>>
>>
>>
>> And here is the broken version (note the 'cpufreq stats' line):
>>
>> cpufrequtils 005: cpufreq-info (C) Dominik Brodowski 2004-2006
>> Report errors and bugs to cpufreq@xxxxxxxxxxxxxxx, please.
>> analyzing CPU 0:
>>   driver: acpi-cpufreq
>>   CPUs which need to switch frequency at the same time: 0
>>   hardware limits: 1000 MHz - 2.33 GHz
>>   available frequency steps: 2.33 GHz, 2.17 GHz, 2.00 GHz, 1.83 GHz, 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz
>>   available cpufreq governors: ondemand, userspace, powersave, conservative, performance
>>   current policy: frequency should be within 1000 MHz and 2.33 GHz.
>>                   The governor "conservative" may decide which speed to use
>>                   within this range.
>>   current CPU frequency is 2.33 GHz (asserted by call to hardware).
>>   cpufreq stats: 2.33 GHz:100.00%, 2.17 GHz:0.00%, 2.00 GHz:0.00%, 1.83 GHz:0.00%, 1.67 GHz:0.00%, 1.50 GHz:0.00%, 1.33 GHz:0.00%, 1000 MHz:0.00%
>> analyzing CPU 1:
>>   driver: acpi-cpufreq
>>   CPUs which need to switch frequency at the same time: 1
>>   hardware limits: 1000 MHz - 2.33 GHz
>>   available frequency steps: 2.33 GHz, 2.17 GHz, 2.00 GHz, 1.83 GHz, 1.67 GHz, 1.50 GHz, 1.33 GHz, 1000 MHz
>>   available cpufreq governors: ondemand, userspace, powersave, conservative, performance
>>   current policy: frequency should be within 1000 MHz and 2.33 GHz.
>>                   The governor "conservative" may decide which speed to use
>>                   within this range.
>>   current CPU frequency is 2.33 GHz (asserted by call to hardware).
>>   cpufreq stats: 2.33 GHz:100.00%, 2.17 GHz:0.00%, 2.00 GHz:0.00%, 1.83 GHz:0.00%, 1.67 GHz:0.00%, 1.50 GHz:0.00%, 1.33 GHz:0.00%, 1000 MHz:0.00%
>>
>>
>> So basically, it just never clocks down from the maximum frequency.
>>
>>
>> Here's the bisection log:
>>
>>  # bad:  [2147b209] Linux 2.6.31.1
>>  # good: [a1c4c06a] Linux 2.6.30.8
>>  # good: [07a2039b] Linux 2.6.30
>>  # good: [452dac45] V4L/DVB (11761): dvb-ttpci: Fixed VIDEO_SLOWMOTION
>>  # bad:  [906e8d97] e1000e: delay second read of PHY_STATUS register o
>>  # good: [36e84467] Staging: heci: fix userspace pointer mess
>>  # bad:  [df36b439] Merge branch 'for-2.6.31' of git://git.linux-nfs.o
>>  # skip: [12e24f34] Merge branch 'perfcounters-fixes-for-linus' of git
>>  # good: [48c93112] powerpc: Fix invalid construct in our CPU selectio
>>  # bad:  [eca41044] n_r3964: fix lock imbalance
>>  # good: [93db6294] Merge branch 'for-linus' of git://git.kernel.org/p
>>  # bad:  [1eb51c33] Merge branch 'sched-fixes-for-linus' of git://git.
>>  # good: [1d991001] Merge branch 'x86/mce3' into x86/urgent
>>  # good: [71e308a2] function-graph: add stack frame test
>>  # bad:  [38df92b8] Merge branch 'timers-fixes-for-linus' of git://git
>>  # good: [ad5cf46b] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
>>  # good: [7fd5b632] Merge branch 'for-linus' of git://git.monstr.eu/li
>>  # good: [c4c5ab30] Merge branch 'x86-fixes-for-linus' of git://git.ke
>>  # bad:  [f2e21c96] NOHZ: Properly feed cpufreq ondemand governor
>>
>>
>> And finally, the commit that broke "conservative":
>>
>> commit f2e21c9610991e95621a81407cdbab881226419b
>> Author: Eero Nurkkala <ext-eero.nurkkala@xxxxxxxxx>
>> Date:   Mon May 25 09:57:37 2009 +0300
>>
>>     NOHZ: Properly feed cpufreq ondemand governor
>>
>>     A call from irq_exit() may occasionally pause the timing
>>     info for cpufreq ondemand governor. This results in the
>>     cpufreq ondemand governor to fail to calculate the
>>     system load properly. Thus, relocate the checks for this
>>     particular case to keep the governor always functional.
>>
>>     Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@xxxxxxxxx>
>>     Reported-by: Tero Kristo <tero.kristo@xxxxxxxxx>
>>     Acked-by: Rik van Riel <riel@xxxxxxxxxx>
>>     Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@xxxxxxxxx>
>>     Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>
>>
>> I'd work on fixing it myself and whip up a patch, but I'm going to be gone
>> all day and I'm not too familiar with cpufreq anyway.
>>
>> - Steven
>
>