Re: [RFC PATCH 2/4] sched: Adding gang scheduling infrastructure

From: Nikunj A Dadhania
Date: Mon Dec 19 2011 - 20:38:36 EST


On Mon, 19 Dec 2011 16:51:44 +0100, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2011-12-19 at 14:04 +0530, Nikunj A. Dadhania wrote:
>
> > + raw_spin_lock_irqsave(&rq->lock, flags);
> > +
> > + /* Check if the runqueue has runnable tasks */
> > + if (cfs_rq->nr_running) {
> > + /* Favour this task group and set need_resched flag,
> > + * added by following patches */
>
> That's just plain insanity, patch 3 is all of 4 lines, why split that
> and have an incomplete patch here?
>
I will fold that into this patch.

> > + }
> > + raw_spin_unlock_irqrestore(&rq->lock, flags);
> > +}
> > +
> > +#define GANG_SCHED_GRANULARITY 8
>
> Why have this magical number to begin with?
>
We do not want to gang across the complete machine, say 128 CPUs. Instead,
break it into 16 independent gangs (128 / 8), so that the scheme can scale up.

This could instead be a sysctl or an architecture-specific define.
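A minimal userspace sketch of the partitioning described above, assuming gangs are built from contiguous CPU ids and that the lowest-numbered CPU of each gang acts as its leader (the patch does not say how leaders are picked). The helpers nr_gangs(), gang_of(), gang_leader_of(), is_gang_leader() and gang_member_mask() are hypothetical and not part of the patch; with a granularity of 8, a 128-CPU machine splits into 16 gangs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Gang size; 8 matches GANG_SCHED_GRANULARITY in the patch. */
#define GANG_SCHED_GRANULARITY 8

_Static_assert(GANG_SCHED_GRANULARITY <= 32, "member mask is 32 bits wide");

/* Hypothetical: number of gangs needed to cover nr_cpus CPUs (last may be partial). */
static unsigned int nr_gangs(unsigned int nr_cpus)
{
	return (nr_cpus + GANG_SCHED_GRANULARITY - 1) / GANG_SCHED_GRANULARITY;
}

/* Hypothetical: gang index of a CPU, assuming contiguous CPU ids per gang. */
static unsigned int gang_of(unsigned int cpu)
{
	return cpu / GANG_SCHED_GRANULARITY;
}

/* Assumption: the lowest-numbered CPU in each gang is its leader. */
static unsigned int gang_leader_of(unsigned int cpu)
{
	return gang_of(cpu) * GANG_SCHED_GRANULARITY;
}

static bool is_gang_leader(unsigned int cpu)
{
	return gang_leader_of(cpu) == cpu;
}

/*
 * Hypothetical: mask of the gang members the leader would cross-call,
 * relative to the leader (bit 0 is the leader itself and stays clear).
 * Stops early for a partial last gang.
 */
static uint32_t gang_member_mask(unsigned int leader, unsigned int nr_cpus)
{
	uint32_t mask = 0;
	unsigned int i;

	assert(is_gang_leader(leader) && leader < nr_cpus);
	for (i = 1; i < GANG_SCHED_GRANULARITY && leader + i < nr_cpus; i++)
		mask |= UINT32_C(1) << i;
	return mask;
}
```

For example, gang_member_mask(0, 128) covers CPUs 1 to 7 relative to leader 0. Turning the granularity into a sysctl would mean replacing the #define with a runtime variable.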

> > +void gang_sched(struct task_group *tg, struct rq *rq)
> > +{
> > + /* We do not gang sched here */
> > + if (rq->gang_leader == 0 || !tg || tg->gang == 0)
> > + return;
> > +
> > + /* Yes thats the leader */
> > + if (rq->gang_leader == 1) {
> > +
> > + if (!in_interrupt() && !irqs_disabled()) {
>
> How can this ever happen, schedule() can't be called from interrupt
> context and post_schedule() ensures interrupts are enabled.
>
Ah... I thought schedule() could get called from interrupt
context. Some time back I had a crash without this check; let me remove
it and verify.

Also, smp_call_function_many() requires that, hence those conditions. From
its function header:

* You must not call this function with disabled interrupts or from a
* hardware interrupt handler or from a bottom half handler. Preemption
* must be disabled when calling this function.
*/
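The quoted rules can be written down as a small predicate. This is a userspace model only: struct call_ctx and may_cross_call() are hypothetical, and in the kernel the inputs would come from helpers such as in_interrupt(), irqs_disabled() and preempt_count(). It merely restates the documented preconditions; it does not show whether the caller in this patch can ever violate them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling context. */
struct call_ctx {
	bool in_hardirq;     /* running in a hardware interrupt handler */
	bool in_softirq;     /* running in a bottom half handler */
	bool irqs_disabled;  /* local interrupts are off */
	int preempt_count;   /* > 0 means preemption is disabled */
};

/* Encodes the rules quoted from the smp_call_function_many() header. */
static bool may_cross_call(const struct call_ctx *ctx)
{
	assert(ctx->preempt_count >= 0);
	/* Must not be called with interrupts disabled or from hardirq/BH. */
	if (ctx->irqs_disabled || ctx->in_hardirq || ctx->in_softirq)
		return false;
	/* Preemption must be disabled by the caller. */
	return ctx->preempt_count > 0;
}
```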

> > + smp_call_function_many(rq->gang_cpumask,
> > + gang_sched_member, tg, 0);
>
> See this is just not going to happen..
>
Why do you say that? I had trace functions in my debug code, and
gang_sched_member was being hit on the other CPUs.

> > +
> > + for_each_domain(cpu_of(rq), sd) {
> > + count = 0;
> > + for_each_cpu(i, sched_domain_span(sd))
> > + count++;
>
> That's just incompetent; there's cpumask_weight(), also that's called
> sd->span_weight.
>
Let me check those out; I will use them. That will definitely reduce
the code here.
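To illustrate the review comment, here is a userspace comparison of the bit-by-bit counting loop from the patch with a word-at-a-time population count, which is roughly how the kernel's cpumask_weight() works (via bitmap_weight()). weight_by_loop() and weight_by_popcount() are hypothetical names, and __builtin_popcountl() is a GCC/Clang builtin. Inside the scheduler neither is needed, since sd->span_weight already caches the number of CPUs in the domain span.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Bit-by-bit count, mirroring the for_each_cpu() counting loop in the patch. */
static unsigned int weight_by_loop(const unsigned long *mask, unsigned int nbits)
{
	unsigned int i, count = 0;

	for (i = 0; i < nbits; i++)
		if (mask[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			count++;
	return count;
}

/* Word-at-a-time population count, the idea behind cpumask_weight(). */
static unsigned int weight_by_popcount(const unsigned long *mask, unsigned int nbits)
{
	unsigned int w, count = 0;
	unsigned int full = nbits / BITS_PER_WORD;

	assert(mask != NULL);
	for (w = 0; w < full; w++)
		count += __builtin_popcountl(mask[w]);
	/* Mask off bits beyond nbits in a trailing partial word. */
	if (nbits % BITS_PER_WORD)
		count += __builtin_popcountl(mask[full] &
				((1UL << (nbits % BITS_PER_WORD)) - 1));
	return count;
}
```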

Regards
Nikunj
