Re: sched: ARM: arch_scale_freq_power

From: Peter Zijlstra
Date: Tue Oct 11 2011 - 03:51:26 EST

Next message: Keshava Munegowda: "[PATCH 3/5 v14] arm: omap: usb: register hwmods of usbhs"
Previous message: Keshava Munegowda: "[PATCH 2/5 v14] arm: omap: usb: ehci and ohci hwmod structures for omap3"
In reply to: Amit Kucheria: "Re: sched: ARM: arch_scale_freq_power"
Next in thread: Vincent Guittot: "Re: sched: ARM: arch_scale_freq_power"

On Tue, 2011-10-11 at 12:46 +0530, Amit Kucheria wrote:
> Adding Peter to the discussion..

Right, CCing the folks who actually wrote the code you're asking
questions about always helps ;-)

> On Thu, Oct 6, 2011 at 5:06 PM, Vincent Guittot
> <vincent.guittot@xxxxxxxxxx> wrote:
> > I work to link the cpu_power of ARM cores to their frequency by using
> > arch_scale_freq_power.

Why and how? In particular note that if you're using something like the
on-demand cpufreq governor this isn't going to work.

> > It's explained in the kernel that cpu_power is
> > used to distribute load on cpus and a cpu with more cpu_power will
> > pick up more load. The default value is SCHED_POWER_SCALE and I
> > increase the value if I want a cpu to have more load than another one.
> > Is there an advised range for cpu_power value as well as some time
> > scale constraints for updating the cpu_power value ?

Basically 1024 is the unit and denotes the capacity of a full core at
'normal' speed.

Typically cpufreq would down-clock a core and thus you'd end up with a
smaller number (linearly proportional to the freq ratio etc. although if
you want to go really fancy you could determine the actual
throughput/freq curves).

Things like x86 turbo mode would result in a >1024 value.
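
As a rough illustration of the linear scaling described above, here is a
userspace sketch. The helper name and the kHz parameters are made up for
the example and are not a kernel interface; it also assumes throughput is
exactly proportional to frequency, which real hardware only approximates.

```c
#include <assert.h>

#define SCHED_POWER_SCALE 1024UL	/* one full core at 'normal' speed */

/*
 * Hypothetical helper, not kernel API: scale cpu_power linearly with the
 * ratio of the current frequency to the nominal ('normal') frequency.
 * A down-clocked core yields < 1024, a turbo-boosted one > 1024.
 */
static unsigned long scale_freq_power(unsigned long cur_khz,
				      unsigned long nominal_khz)
{
	assert(nominal_khz > 0);	/* avoid dividing by zero */

	/* Widen before multiplying so 32-bit builds do not overflow. */
	return (unsigned long)((unsigned long long)SCHED_POWER_SCALE *
			       cur_khz / nominal_khz);
}
```

With a nominal frequency of 1 GHz, a core running at 500 MHz would come out
at 512 and one boosted to 1.2 GHz at 1228 (integer division rounds down).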

Things like SMT would typically result in <1024 and the SMT sum over the
core >1024 (if you're lucky).
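
The SMT arithmetic above can be sketched the same way. The 1178 figure is
an assumption picked for the example (roughly a 15% gain for the whole core
over a single thread), and the helper name is hypothetical; the point is
only that each sibling lands below 1024 while the sum exceeds it.

```c
#include <assert.h>

#define SCHED_POWER_SCALE 1024UL	/* one full core at 'normal' speed */
#define SMT_GAIN 1178UL			/* assumed total for the core, ~15% gain */

/*
 * Hypothetical helper, not kernel API: split the assumed whole-core SMT
 * capacity evenly across the hardware threads sharing that core.
 */
static unsigned long smt_thread_power(unsigned long nr_siblings)
{
	assert(nr_siblings > 0);	/* a core has at least one thread */
	return SMT_GAIN / nr_siblings;
}
```

For two siblings this gives 589 per thread, which is below 1024, while the
sum over the core is 1178, which is above it.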

> > I'm also wondering why this scheduler feature is currently disabled by default?

Because the only implementation in existence (x86) is broken and I
haven't gotten around to fixing it. Arguably we should disable that for
the time being, see below.

> In discussions with Vincent regarding this, I've wondered whether
> cpu_power wouldn't be better renamed to cpu_capacity since that is
> what it really seems to describe.

Possibly, but it's been cpu_power for ages and we use capacity to
describe something else.

---
arch/x86/kernel/cpu/sched.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/sched.c b/arch/x86/kernel/cpu/sched.c
index a640ae5..90ae68c 100644
--- a/arch/x86/kernel/cpu/sched.c
+++ b/arch/x86/kernel/cpu/sched.c
@@ -6,7 +6,14 @@
 #include <asm/cpufeature.h>
 #include <asm/processor.h>

-#ifdef CONFIG_SMP
+#if 0 /* def CONFIG_SMP */
+
+/*
+ * Currently broken, we need to filter out idle time because the aperf/mperf
+ * ratio measures actual throughput, not capacity. This means that if a logical
+ * cpu idles it will report less capacity and receive less work, which isn't
+ * what we want.
+ */

 static DEFINE_PER_CPU(struct aperfmperf, old_perf_sched);

