Re: [RFC] [PATCH 0/3] sched: Support for real CPU runtime and SMT scaling

From: Peter Zijlstra
Date: Sat Jan 31 2015 - 06:43:14 EST


On Fri, Jan 30, 2015 at 03:02:39PM +0100, Philipp Hachtmann wrote:
> Hello,
>
> when using "real" processors the scheduler can make its decisions based
> on wall time. But CPUs under hypervisor control are sometimes
> unavailable without further notice to the guest operating system.
> Using wall time for scheduling decisions in this case will lead to
> unfair decisions and erroneous distribution of CPU bandwidth when
> using cgroups.
> On (at least) S390 every CPU has a timer that counts the real execution
> time from IPL. When the hypervisor has scheduled out the CPU, the timer
> is stopped. So it is desirable to use this timer as a source for the
> scheduler's rq runtime calculations.
>
> On SMT systems the consumed runtime of a task might be worth more
> or less depending on whether the task ran alone or not
> during the last delta. The runtime should be scaled based on the
> current CPU utilization.

So we've explicitly never done this before because at the end of the day
it's wall time that people using the computer react to.

Also, once you open this door you can have endless discussions of what
constitutes work. People might want to use instructions retired for
instance, to normalize against pipeline stalls.

Also, if your hypervisor starves its vcpus of compute time; how is that
our problem?

Furthermore, we already have some stealtime accounting in
update_rq_clock_task() for the virt crazies^Wpeople.
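
For readers unfamiliar with that accounting, the idea can be modeled in
user space roughly as follows. This is a simplified sketch and not the
kernel source: struct rq_model, update_clock_task_model() and the field
names are hypothetical stand-ins, and the real code depends on kernel
configuration. It only illustrates how time stolen by the hypervisor can
be subtracted from the wall-clock delta before tasks are charged for it.

```c
#include <stdint.h>

/* Hypothetical user-space model of a runqueue's clocks (all values in ns). */
struct rq_model {
	uint64_t clock;           /* wall-clock time seen at the last update */
	uint64_t clock_task;      /* time actually credited to tasks */
	uint64_t prev_steal_time; /* cumulative steal time already accounted */
};

/*
 * now_ns:         current wall-clock time
 * steal_total_ns: cumulative steal time reported by the hypervisor
 *                 (assumed to be monotonically increasing)
 */
void update_clock_task_model(struct rq_model *rq, uint64_t now_ns,
			     uint64_t steal_total_ns)
{
	uint64_t delta = now_ns - rq->clock;
	uint64_t steal = steal_total_ns - rq->prev_steal_time;

	/* Never subtract more than the elapsed wall time in one update;
	 * any remainder is carried over to the next update. */
	if (steal > delta)
		steal = delta;

	rq->prev_steal_time += steal;
	rq->clock = now_ns;
	rq->clock_task += delta - steal;
}
```

In this model clock keeps following wall time, while clock_task advances
only by the portion of the delta during which the vcpu was not stolen.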