Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer

From: Andy Lutomirski
Date: Wed May 20 2015 - 21:34:42 EST


On 05/19/2015 01:01 AM, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
>
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^64 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.

I think this is the wrong way to do this...

> + x86_idle = mwaitx_idle;

...this is a legacy thing. The modern idle path is cpuidle_idle_call, I believe, and that goes through the cpuidle subsystem, which has little to do with any of this.

Where is the MWAITX documentation? It seems that AMD has failed to update the obvious reference:

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

From my vague understanding, MWAITX accepts a 32-bit maximum number of TSC ticks to wait. If that's correct, and it's not too late to change, then: AMD, you blew it. The correct way to do this would be to accept a 64-bit absolute TSC deadline.

The 32-bit relative timeout model utterly sucks. If we tried to use it, we'd have two major issues:

1. We can't sleep more than about 1.5 seconds because we'll overflow the deadline.

2. The relative timeout is annoying. Imagine:

rdtsc
shove the computed timeout into ebx
<-- IRQ here
mwaitx

now we sleep too long.

We can do:

cli
rdtsc
shove the computed timeout into ebx
mov $1,%ecx
mwaitx
sti

but that's annoying and isn't really correct wrt NMIs.

So this sucks.

In any event, I think this is barely useful.

That being said, it might be worth teaching the timer code about a magical ideal type of clock that is simultaneously a perfect invariant high-res clocksource *and* a very fast (in fact free) wakeup source that uses the same time base. In fact, Sandy Bridge and newer Intel CPUs have such a thing: it's called the TSC deadline timer. I think it's much faster to reprogram than other timers, and it ought to avoid a whole bunch of complicated messy code that handles the fact that crappier timers have their own crappy time bases.

If we did that *and* we had a non-crappy mwaitx, then we could apply an optimization: when going idle, we could turn off the TSC deadline timer and use mwaitx instead. This would avoid an interrupt if the event that wakes us is our timer.

In the meantime, I don't really see the point.

John, Peter, Thomas: would it actually make sense to teach the core timer/clockevent code about perfect time sources like invariant TSC + TSC deadline? AFAICT right now we're not doing anything particularly interesting with the TSC deadline timer.

--Andy