Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

From: Glauber Costa
Date: Wed Jan 26 2011 - 10:43:57 EST


On Wed, 2011-01-26 at 10:57 +0100, Peter Zijlstra wrote:
> On Tue, 2011-01-25 at 19:27 -0200, Glauber Costa wrote:
> > On Tue, 2011-01-25 at 22:07 +0100, Peter Zijlstra wrote:
> > > On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
> > > > On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
> > > > > On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
> > > > >
> > > > > > I fail to see how does clock_task influence cpu power.
> > > > > > If we also have to touch clock_task for better accounting of other
> > > > > > stuff, it is a separate story.
> > > > > > But for cpu_power, I really fail. Please enlighten me.
> > > > >
> > > > > > static void update_rq_clock_task(struct rq *rq, s64 delta)
> > > > > > {
> > > > > > 	s64 irq_delta;
> > > > > >
> > > > > > 	irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
> > > > > >
> > > > > > 	if (irq_delta > delta)
> > > > > > 		irq_delta = delta;
> > > > > >
> > > > > > 	rq->prev_irq_time += irq_delta;
> > > > > > 	delta -= irq_delta;
> > > > > > 	rq->clock_task += delta;
> > > > > >
> > > > > > 	if (irq_delta && sched_feat(NONIRQ_POWER))
> > > > > > 		sched_rt_avg_update(rq, irq_delta);
> > > > > > }
> > > > >
> > > > > its done through that sched_rt_avg_update() (should probably rename
> > > > > that), it computes a floating average of time not spend on fair tasks.
> > > > >
> > > > It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
> > > > This piece of code is simply compiled out if this option is disabled.
> > >
> > > We can pull this bit out and make the common bit also available for
> > > paravirt.
> >
> > scale_rt_power() seems to do the right thing, but all the path leading
> > to it seem to work on rq->clock, rather than rq->clock_task.
>
> Not quite, see how rq->clock_task is irq_delta less than the increment
> to rq->clock? You want it to be your steal-time delta less too.
Yes, but once this delta is subtracted from rq->clock_task, the resulting
value is not what dictates power, unless I am mistaken.

Power is adjusted by scale_rt_power(), which computes it from the values
of rq->rt_avg, rq->age_stamp, and rq->clock.

So whatever I store in rq->clock_task but not in rq->clock (which,
correct me if I'm wrong, is expected to be wall time) will not be used
to adjust cpu power, and adjusting cpu power is what I'm trying to achieve.
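
To make the argument concrete, here is a rough user-space model of that
computation. It is a sketch written from memory of the scheduler code of
this era, not the kernel source: struct rq_model, model_scale_rt_power()
and the period_ns parameter are hypothetical names, and the averaging
period is passed in rather than read from a sysctl. The thing to notice
is that clock_task is never read.

```c
#include <stdint.h>

/* Fixed-point scale for cpu power; 1024 stands for full capacity. */
#define MODEL_LOAD_SHIFT 10
#define MODEL_LOAD_SCALE (1ULL << MODEL_LOAD_SHIFT)

/* Hypothetical stand-in for the few struct rq fields discussed here. */
struct rq_model {
	uint64_t clock;      /* runqueue clock, expected to track wall time (ns) */
	uint64_t clock_task; /* task clock, with irq time removed (ns) */
	uint64_t age_stamp;  /* start of the current averaging window (ns) */
	uint64_t rt_avg;     /* accumulated time not spent on fair tasks (ns) */
};

/*
 * Model of the power scaling step: the fraction of the averaging window
 * that was left for fair tasks, expressed on a 0..1024 scale.
 * Only clock, age_stamp and rt_avg are inputs; clock_task is not.
 */
static uint64_t model_scale_rt_power(const struct rq_model *rq,
				     uint64_t period_ns)
{
	uint64_t total = period_ns + (rq->clock - rq->age_stamp);
	/* Clamp so the result can never go negative. */
	uint64_t available = (total < rq->rt_avg) ? 0 : total - rq->rt_avg;

	/* Keep the divisor at least 1 after the shift below. */
	if (total < MODEL_LOAD_SCALE)
		total = MODEL_LOAD_SCALE;
	total >>= MODEL_LOAD_SHIFT;

	return available / total;
}
```

With an assumed 500 ms window and 125 ms accumulated in rt_avg, the model
returns 768 out of 1024, and changing clock_task alone leaves that result
untouched. Time only affects power here if it ends up in rt_avg.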

> > Although I do can experiment with that as well, could you please
> > elaborate on what are your reasons to prefer this over than variations
> > of the method I proposed?
>
> Because I want rq->clock_task to not include steal-time.
Sure, fair deal. But at this point, those demands seem orthogonal to me.
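
For reference, the variant Peter describes could be modelled roughly as
below. This is only a user-space sketch under stated assumptions, not a
proposed patch: struct rq_clock_model, the prev_steal_time field and the
steal_time_now argument are hypothetical, the NONIRQ_POWER feature check
is left out, and rt_avg is a plain accumulator without the decay that
sched_rt_avg_update() applies.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct rq fields involved. */
struct rq_clock_model {
	uint64_t clock_task;      /* time credited to tasks (ns) */
	uint64_t prev_irq_time;   /* irq time already accounted (ns) */
	uint64_t prev_steal_time; /* steal time already accounted (ns), assumed field */
	uint64_t rt_avg;          /* time not spent on fair tasks (ns) */
};

/*
 * Advance clock_task by delta minus the irq and steal time that elapsed,
 * and record the removed time in rt_avg so it can lower cpu power.
 */
static void model_update_clock_task(struct rq_clock_model *rq, int64_t delta,
				    uint64_t irq_time_now,
				    uint64_t steal_time_now)
{
	int64_t irq_delta = (int64_t)(irq_time_now - rq->prev_irq_time);
	int64_t steal_delta;

	/* Never remove more irq time than the clock advanced. */
	if (irq_delta > delta)
		irq_delta = delta;
	rq->prev_irq_time += irq_delta;
	delta -= irq_delta;

	/* Same clamping for steal time, against what is left of delta. */
	steal_delta = (int64_t)(steal_time_now - rq->prev_steal_time);
	if (steal_delta > delta)
		steal_delta = delta;
	rq->prev_steal_time += steal_delta;
	delta -= steal_delta;

	rq->clock_task += delta;
	rq->rt_avg += (uint64_t)(irq_delta + steal_delta);
}
```

In this model the steal delta does two separate jobs: it is withheld
from clock_task, and it is added to rt_avg. Only the second one feeds
the power calculation.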