Re: [announce] [patch] ultra-scalable O(1) SMP and UP scheduler

From: Ingo Molnar (
Date: Sat Jan 05 2002 - 22:41:05 EST

On Sat, 5 Jan 2002, Linus Torvalds wrote:

> At that point I don't think O(1) matters all that much, but it
> certainly won't hurt. UNLESS it causes bad choices to be made. Which
> we can only guess at right now.

i have an escape path for the MM-goodness issue: we can make it
O(nr_running_threads) if need to be, by registering MM users into a per-MM
'MM runqueue', and scanning this runqueue for potentially better
candidates if it's non-empty. In the process case this falls back to O(1),
in the threaded case it would be scanning the whole (local) runqueue in

And George Anzinger has a nice idea to help those platforms which have
slow bitsearch functions, we can keep a floating pointer of the highest
priority queue which can be made NULL if the last task from a priority
level was used up or can be increased if a higher priority task is added,
this pointer will be correct in most of the time, and we can fall back to
the bitsearch if it's NULL.

so while i think that the O(1) design indeed stretches things a bit and
reduces our moving space, it's i think worth a try. Switching an existing
per-CPU queue design for a full-search design shouldnt be too hard. The
load-balancing parts wont be lost whichever path we chose later on, and
that is the most important bit i think.

> Just out of interest, where have the bugs crept up? I think we could
> just try out the thing and see what's up, but I know that at least
> some versions of bash are buggy and _will_ show problems due to the
> "run child first" behaviour. Remember: we actually tried that for a
> while in 2.4.x.

i've disabled the 'run child first' behavior in the latest patch at:

the crash bug i've been hunting all day, but i cannot reproduce it which
appears to show it's either something really subtle or something really
stupid. But the patch certainly does not have any of the scarier
strace/ptrace related bug scenarios or 'two CPUs run the same task' bugs,
i've put lots of testing into that part. The only crash remaining in -B1
is a clear NULL pointer dereference in wakeup(), which is a direct bug not
some side-effect, i hope to be able to find it soon.

right now there are a number of very helpful people who are experiencing
those crashes and are ready to run bzImages i compile for them (so that
compiler and build environment is out of the picture) - Pawel Kot for
example. (thanks Pawel!)

> [ In 2.5.x it's fine to break broken programs, though, so this isn't that
> much of an issue any more. From the reports I've seen the thing has
> shown more bugs than that, though. I'd happily test a buggy scheduler,
> but I don't want to mix bio problems _and_ scheduler problems, so I'm
> not ready to switch over until either the scheduler patch looks stable,
> or the bio stuff has finalized more. ]

i've disabled it to reduce the number of variables - we can still try and
enable it later on once things have proven to be stable.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Mon Jan 07 2002 - 21:00:30 EST