Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS back to DEFAULT

From: Daniel Lezcano
Date: Thu Aug 14 2014 - 09:30:01 EST


On 08/14/2014 02:41 PM, Peter Zijlstra wrote:
On Thu, Aug 14, 2014 at 01:14:49PM +0200, Daniel Lezcano wrote:
On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
On Thu, Aug 14, 2014 at 12:29:32PM +0200, Daniel Lezcano wrote:
Hi Chuansheng,

On 14 August 2014 04:11, Chuansheng Liu <chuansheng.liu@xxxxxxxxx> wrote:

We found that sometimes, even after PM_QOS is set back to DEFAULT,
the CPU stays stuck at C0 for 2-3s and does not immediately select a new,
suitable C-state after receiving the IPI.

The code model is roughly as follows:
{
	pm_qos_update_request(&pm_qos, C1 - 1);
	/* <== Here all cores are kept at C0 */
	...;
	pm_qos_update_request(&pm_qos, PM_QOS_DEFAULT_VALUE);
	/* <== Here some cores stay stuck at C0 for 2-3s */
}

The reason is that when pm_qos goes back to DEFAULT, an IPI is sent to
wake up the core, but when the core is in the poll idle state, the IPI
cannot break the polling loop.

So seeing how you're from @intel.com I'm assuming you're using x86 here.

I'm not seeing how this can be possible: MWAIT is interrupted by IPIs
just fine, which means we'll fall out of cpuidle_enter(), which
means we'll call cpuidle_reflect(), and then leave cpuidle_idle_call().

It will indeed not leave the cpu_idle_loop() function and go right back
into cpuidle_idle_call(), but that will then call cpuidle_select() which
should pick a new C state.

So the interrupt _should_ work. If it doesn't you need to explain why.

I think the issue is related to the poll_idle state in
drivers/cpuidle/driver.c. This state is x86-specific and is inserted into the
cpuidle table as state 0 (POLL). There is no MWAIT for this state. It is
a bit confusing because this state is not listed in the acpi_idle / intel_idle
drivers; instead, it is inserted implicitly at the beginning of the idle table
by the cpuidle framework when the driver is registered.

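static int poll_idle(struct cpuidle_device *dev,
		struct cpuidle_driver *drv, int index)
{
	local_irq_enable();
	/* Advertise polling; skip the loop if a reschedule is already pending. */
	if (!current_set_polling_and_test()) {
		/* Only need_resched() ends the spin; an interrupt that does
		 * not set it returns straight back into this loop. */
		while (!need_resched())
			cpu_relax();
	}
	current_clr_polling();

	return index;
}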

Ah, well, in that case there's a ton more broken than just this.
kick_all_cpus_sync() won't work either, and cpuidle_reflect() pretty
much expects to be called after each interrupt.

Agree.

Then again, not reflecting properly isn't really a problem; it's not like
not accounting interrupts is going to save much power.

I think the main issue here is exiting the poll_idle loop when an IPI is received. IIUC, a pm_qos user (perhaps a driver; Chuansheng can give more details) sets a very short latency, so the cpuidle framework chooses a shallow state such as poll_idle. The driver then relaxes the constraint to a larger latency, which triggers an IPI to wake all the CPUs. Because the CPUs are in poll_idle, they don't exit until some event makes them leave the need_resched() loop (a reschedule or similar). This can leave the CPUs spinning in that loop for several seconds while we expect them to exit poll_idle and enter a deeper idle state, which wastes energy.


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
