Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer

From: Borislav Petkov
Date: Tue May 19 2015 - 07:31:42 EST


On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
>
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^64 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.

Ok, I see what you're trying here and I think this is not the optimal
approach.

So let me explain how I see it, you correct me if I'm wrong:

So we want to do MWAITX so that we can save us idle entry/exit overhead
with HLT. Because MWAITX is faster, reportedly.

Now, if we want to do that, we want to do it dynamically and adjust the
MWAITX sleep interval depending on the system, usage pattern, system
load and so on.

And for that we would need an adaptive scheme which approximates each
idle interval. Simply taking TSC before we enter idle and after we come
out would give us each idle residency duration and we can do some simple
math to approximate it.

Now, what would that bring us: faster wakeup times.

And here comes the 10^6 $ question: why are we doing all the fun?

I'm thinking we want to find a cutoff duration where for smaller
durations it is worth to do MWAITX and have faster entry/exit times and
for bigger durations we want to do HLT because it'll get into C1E and
give us higher power savings.

We don't want to do MWAITX too long because that'll burn more power
relatively to HLT but we don't want to do HLT for shorter periods
because then entry/exit costs.

Am I on the right track at least?

> Signed-off-by: Huang Rui <ray.huang@xxxxxxx>
> ---
> arch/x86/include/asm/mwait.h | 2 +
> arch/x86/include/asm/processor.h | 2 +-
> arch/x86/kernel/process.c | 79 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index b91136f..c4e51e7 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -14,6 +14,8 @@
> #define CPUID5_ECX_INTERRUPT_BREAK 0x2
>
> #define MWAIT_ECX_INTERRUPT_BREAK 0x1
> +#define MWAITX_ECX_TIMER_ENABLE 0x2

Use BIT(1) here.

> +#define MWAITX_EBX_WAIT_TIMEOUT 0xffffffff
>
> static inline void __monitor(const void *eax, unsigned long ecx,
> unsigned long edx)
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 23ba676..0f60e94 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -733,7 +733,7 @@ extern unsigned long boot_option_idle_override;
> extern bool amd_e400_c1e_detected;
>
> enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
> - IDLE_POLL};
> + IDLE_POLL, IDLE_MWAITX};
>
> extern void enable_sep_cpu(void);
> extern int sysenter_setup(void);
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 6e338e3..9d68193 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -30,6 +30,7 @@
> #include <asm/debugreg.h>
> #include <asm/nmi.h>
> #include <asm/tlbflush.h>
> +#include <asm/x86_init.h>
>
> /*
> * per-CPU TSS segments. Thre ads are completely 'soft' on Linux,
> @@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> EXPORT_SYMBOL(boot_option_idle_override);
>
> static void (*x86_idle)(void);
> +static unsigned long idle_param;
>
> #ifndef CONFIG_SMP
> static inline void play_dead(void)
> @@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> return 1;
> }
>
> +static int not_support_mwaitx(const struct cpuinfo_x86 *c)
> +{
> + if (c->x86_vendor != X86_VENDOR_AMD)
> + return 1;
> +
> + if (!cpu_has(c, X86_FEATURE_MWAITT))
> + return 1;
> +
> + return 0;
> +}
> +
> /*
> * MONITOR/MWAIT with no hints, used for default default C1 state.
> * This invokes MWAIT with interrutps enabled and no flags,
> @@ -470,12 +483,45 @@ static void mwait_idle(void)
> __current_clr_polling();
> }
>
> +/*
> + * AMD Excavator processors support the new MONITORX/MWAITX instructions.

No need for that especially when newer than XV processors start
supporting those too.

> + * The function is similar to mwait but with a timer. On AMD platforms
> + * mwaitx does not let the core go into C1 state. This provides for a
> + * faster waiting exit speed. The user can configure the idle method and
> + * timer value via the idle kernel parameter.
> + */

...

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/