Re: [RFC PATCH v5 4/7] sched: bias task wakeups to preferredsemi-idle packages

From: Vaidyanathan Srinivasan
Date: Mon Dec 15 2008 - 07:26:19 EST


* Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> [2008-12-15 14:16:42]:

> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2008-12-15 09:33:04]:
>
> > On Mon, 2008-12-15 at 09:25 +0100, Peter Zijlstra wrote:
> > > On Mon, 2008-12-15 at 12:31 +0530, Balbir Singh wrote:
> > >
> > > > > kernel/sched_fair.c | 17 +++++++++++++++++
> > > > > 1 files changed, 17 insertions(+), 0 deletions(-)
> > > > >
> > > > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > > > > index 98345e4..939f2a1 100644
> > > > > --- a/kernel/sched_fair.c
> > > > > +++ b/kernel/sched_fair.c
> > > > > @@ -1027,6 +1027,23 @@ static int wake_idle(int cpu, struct task_struct *p)
> > > > > cpumask_t tmp;
> > > > > struct sched_domain *sd;
> > > > > int i;
> > > > > + unsigned int chosen_wakeup_cpu;
> > > > > + int this_cpu;
> > > > > +
> > > > > + /*
> > > > > + * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
> > > > > + * are idle and this is not a kernel thread and this task's affinity
> > > > > + * allows it to be moved to preferred cpu, then just move!
> > > > > + */
> > > > > +
> > > > > + this_cpu = smp_processor_id();
> > > > > + chosen_wakeup_cpu =
> > > > > + cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu;
> > > > > +
> > > > > + if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP &&
> > > > > + idle_cpu(cpu) && idle_cpu(this_cpu) && p->mm &&
> > > >
> > > > The p->mm check is racy, it needs to be done under task_lock(). The
> > > > best way to check for a kernel thread is get_task_mm(), followed by
> > > > put_task_mm() is the mm is not NULL. We also need to check to see if
> > > > the task is _hot_ on cpu. We should negate this optimization in case
> > > > chosen_wakeup_cpu is idle, so check for that as well.
> > >
> > > Sure its racy, but so what?
> > >
> > > The worst I can see it that we exclude a dying task from this logic,
> > > which isn't a problem at all, since its dying anyway.
> >
> > At which point I seriously doubt it'd still be on the rq anyway.
> >
>
> I forgot to mention that, the check should be (p->mm && !(p->flags & PF_KTHREAD))

I can check for PF_KTHREAD for now. However, I should reduce the
number of checks since this may slow down wake_idle for sched_mc=2.

We can tolerate p->mm check on a dying process as Peter has suggested,
hence we don't need to protect it. We are not going to access any
contents of the mm struct.

If PF_KTHREAD is only being used by AIO, then I feel we can drop the
check since the threads will not have affinity and they can be moved
to other cpus anyway.

The main reason for skipping kthread is that they may be using per-cpu
variables and sleep/preempted. I did not want the wake_idle() logic
to move them around forcefully. This is not the general case and this
situation should not happen.

Second reason is to optimise on the affinity check since most of the
kthreads have affinity and cannot be moved.

This condition check needs optimisation after getting the framework
functionally correct and useful.

--Vaidy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/