Re: [PATCH 03/19] scheduler: implement workqueue scheduler class

From: Ingo Molnar
Date: Thu Oct 01 2009 - 15:24:34 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 1 Oct 2009, Avi Kivity wrote:
> >
> > Sure, but it would mean that we need a new notifier. sched_out,
> > sched_in, and wakeup (and, return to userspace, with the new
> > notifier).
>
> Ok, see the email I just sent out.
>
> And I don't think we want a new notifier - mainly because I don't
> think we want to walk the list four times (prepare, out, in, final -
> we need to make sure that these things nest properly, so even if "in"
> and "final" happen with the same state, they aren't the same, because
> "in" only happens if "out" was called, while "final" would happen if
> "prepare" was called)
>
> So it would be better to have separate lists, in order to avoid
> walking the lists four times just because there was a single notifier
> that just wanted to be called for the inner (or outer) cases.

Sounds a bit like perf events with callbacks, triggered at those places
(allowing arbitrary permutations of the callbacks).

But ... it would need some work to shape it in precisely that way.
Primarily, struct perf_event would need to be split/slimmed down, so
that the callback properties can be separated out for in-kernel users
that are interested only in the callbacks, not in the other abstractions.
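
To illustrate the kind of split meant here, a rough userspace sketch
follows. It is not the real struct perf_event layout: struct event_cb,
struct full_event, cb_to_full() and dispatch() are all hypothetical names
made up for the example. The idea is only that a slim callback core can
be embedded in a heavier object, so that a callback-only user carries
just the core:

```c
#include <stddef.h>

/* Hypothetical slim core: all that a callback-only user needs. */
struct event_cb {
	void (*handler)(struct event_cb *cb);
	struct event_cb *next;
};

/* Hypothetical full object: embeds the core plus heavier state. */
struct full_event {
	struct event_cb cb;
	unsigned long long count;
	unsigned long long sample_period;
};

/* Recover the enclosing full_event from its embedded core. */
#define cb_to_full(ptr) \
	((struct full_event *)((char *)(ptr) - offsetof(struct full_event, cb)))

/* Handler for the heavy user: needs the surrounding object. */
static void full_event_handler(struct event_cb *cb)
{
	cb_to_full(cb)->count++;
}

static int light_hits;

/* Handler for the callback-only user: never touches full_event. */
static void light_handler(struct event_cb *cb)
{
	(void)cb;
	light_hits++;
}

/* The dispatch path only ever sees the slim core. */
static void dispatch(struct event_cb *head)
{
	for (struct event_cb *cb = head; cb; cb = cb->next)
		cb->handler(cb);
}
```

In this sketch a light user allocates only a struct event_cb, while the
heavier user reaches its own state through cb_to_full(), in the spirit
of the kernel's container_of() pattern.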

But it looks straightforward and useful ... the kind of useful work
interested parties would be able to complete by the next merge window
;-)

Other places could use this too - we really want just one callback
facility for certain system events - be that in-kernel use for other
kernel facilities, or external instrumentation injected by user-space.
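
As a rough illustration of the "separate lists" point in the quoted
text, here is a minimal userspace sketch of a single callback facility
that keeps one list per event point. Firing an event then walks only the
callbacks registered for that event. Everything in it (enum sched_event,
struct sched_cb, cb_register(), cb_fire()) is a hypothetical name chosen
for the example, not an existing kernel interface, and locking is left
out entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event points, named after the four cases quoted above. */
enum sched_event { EV_PREPARE, EV_OUT, EV_IN, EV_FINAL, EV_MAX };

struct sched_cb {
	void (*fn)(struct sched_cb *cb, enum sched_event ev);
	struct sched_cb *next;
	int hits;
};

/* One list head per event point, instead of one shared list. */
static struct sched_cb *cb_lists[EV_MAX];

/* Add a callback to the list of exactly one event point. */
static void cb_register(enum sched_event ev, struct sched_cb *cb)
{
	assert((int)ev < EV_MAX);
	cb->next = cb_lists[ev];
	cb_lists[ev] = cb;
}

/* Walk only the list for this event; return how many entries ran. */
static int cb_fire(enum sched_event ev)
{
	int walked = 0;

	for (struct sched_cb *cb = cb_lists[ev]; cb; cb = cb->next) {
		cb->fn(cb, ev);
		walked++;
	}
	return walked;
}

/* Example callback: just counts how often it was invoked. */
static void count_hit(struct sched_cb *cb, enum sched_event ev)
{
	(void)ev;
	cb->hits++;
}
```

With a callback registered only for EV_OUT, firing EV_PREPARE, EV_IN or
EV_FINAL walks nothing, which is the cost the separate lists are meant
to avoid.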

> > btw, I've been thinking we should extend concurrency managed
> > workqueues to userspace. Right now userspace can spawn a massive
> > amount of threads, hoping to hide any waiting by making more work
> > available to the scheduler. That has the drawback of increasing
> > latency due to involuntary preemption. Or userspace can use one
> > thread per cpu, hope it's the only application on the machine, and
> > go all-aio.
>
> This is what the whole next-gen AIO was supposed to do with the
> threadlets, ie avoid doing a new thread if it could do the IO all
> cached and without being preempted.

Yeah. That scheme was hobbled by signal semantics: it looked hard to do
the 'flip a reserve thread with a blocked thread' trick in the scheduler
while still keeping all the signal details in place.

Ingo