Re: [RFC][PATCH 07/22] sched: SCHED_DEADLINE push and pull logic

From: Peter Zijlstra
Date: Tue Nov 23 2010 - 09:28:51 EST


On Sun, 2010-11-14 at 10:14 +0100, Raistlin wrote:
> On Fri, 2010-11-12 at 17:17 +0100, Peter Zijlstra wrote:
> > On Fri, 2010-10-29 at 08:32 +0200, Raistlin wrote:
> > > Add dynamic migrations to SCHED_DEADLINE, so that tasks can
> > > be moved among CPUs when necessary. It is also possible to bind a
> > > task to a (set of) CPU(s), thus restricting its capability of
> > > migrating, or forbidding migrations at all.
> > >
> > > The very same approach used in sched_rt is utilised:
> > > - -deadline tasks are kept into CPU-specific runqueues,
> > > - -deadline tasks are migrated among runqueues to achieve the
> > > following:
> > > * on an M-CPU system the M earliest deadline ready tasks
> > > are always running;
> > > * affinity/cpusets settings of all the -deadline tasks are
> > > always respected.
> >
> > I haven't fully digested the patch; I keep getting side-tracked, and
> > it's a large patch...
> >
> BTW, I was thinking about your suggestion of adding a *debugging* knob
> for achieving a "lock everything while I'm migrating" behaviour... :-)
>
> Something like locking the root_domain during pushes and pulls probably
> won't work, since both of them do a double_lock_balance, taking two
> rq->locks, which might race with this new "global" lock.
> For example: we (CPU#1) hold rq1->lock, take rd->lock, and then
> try to take rq2->lock, while CPU#2 holds rq2->lock and tries to take
> rd->lock. Stuck! :-(
> This can happen whenever CPU#1 and CPU#2 are each in a push or a
> pull that involves a task on the other's rq. Do you agree, or am I
> missing something? :-)
>
> Something we could do instead is lock the root_domain for
> _each_and_every_ scheduling decision, with all the rq->locks nesting
> inside the new root_domain->lock. This would emulate a single global
> rq, since even local decisions on one CPU would affect all the others,
> as if they shared one rq... But it's going to be very slow on large
> machines (though I guess we can afford that... it's debugging!), and
> it would probably affect the other scheduling classes as well.
> I'm not sure we want the latter... but maybe it could be useful for
> debugging them too (at least for FIFO/RR, it should be!).
>
> Let me know what you think...

Ugh!.. lock ordering sucks :-)

I think we can cheat since double_rq_lock() and double_lock_balance()
can already unlock both locks, so you can simply: unlock both, lock rd,
then lock both.

