Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: Morten Rasmussen
Date: Mon Sep 14 2015 - 08:52:56 EST


On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@xxxxxxxxxx wrote:
> Morten Rasmussen <morten.rasmussen@xxxxxxx> writes:
>
> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 119823d..55a7b93 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
> >> /*
> >> * Increase resolution of cpu_capacity calculations
> >> */
> >> -#define SCHED_CAPACITY_SHIFT 10
> >> +#define SCHED_CAPACITY_SHIFT SCHED_RESOLUTION_SHIFT
> >> #define SCHED_CAPACITY_SCALE (1L << SCHED_CAPACITY_SHIFT)
> >>
> >> /*
> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> index 68cda11..d27cdd8 100644
> >> --- a/kernel/sched/sched.h
> >> +++ b/kernel/sched/sched.h
> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> */
> >> #define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
> >>
> >> +# define SCHED_RESOLUTION_SHIFT 10
> >> +# define SCHED_RESOLUTION_SCALE (1L << SCHED_RESOLUTION_SHIFT)
> >> +
> >> /*
> >> * Increase resolution of nice-level calculations for 64-bit architectures.
> >> * The extra resolution improves shares distribution and load balancing of
> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> * increased costs.
> >> */
> >> #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load */
> >> -# define SCHED_LOAD_RESOLUTION 10
> >> -# define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
> >> -# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
> >> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load(w) ((w) << SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load_down(w) ((w) >> SCHED_RESOLUTION_SHIFT)
> >> #else
> >> -# define SCHED_LOAD_RESOLUTION 0
> >> +# define SCHED_LOAD_SHIFT (SCHED_RESOLUTION_SHIFT)
> >> # define scale_load(w) (w)
> >> # define scale_load_down(w) (w)
> >> #endif
> >>
> >> -#define SCHED_LOAD_SHIFT (10 + SCHED_LOAD_RESOLUTION)
> >> #define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
> >>
> >> #define NICE_0_LOAD SCHED_LOAD_SCALE
> >
> > I think this is pretty much the required relationship between all the
> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> > earlier, so no disagreements from my side :-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
>
> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> required to be the same value and should not be conflated.
>
> In particular, since cgroups are on the same timeline as tasks and their
> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> an internal value to the kernel.
>
> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.

I think I follow, but doesn't that mean that the current code is broken
too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:

#define SCHED_LOAD_SHIFT (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)

#define NICE_0_LOAD SCHED_LOAD_SCALE
#define NICE_0_SHIFT SCHED_LOAD_SHIFT

To me it sounds like we need to define it the other way around:

#define NICE_0_SHIFT 10
#define NICE_0_LOAD (1L << NICE_0_SHIFT)

and then add any additional resolution bits from there to ensure that
NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/