Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks fromcond_resched*()

From: Linus Torvalds
Date: Wed Jul 08 2009 - 12:15:06 EST




On Wed, 8 Jul 2009, Peter Zijlstra wrote:

> On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
>
> > /*
> > * It is valid to assume CPU-locality during early bootup:
> > */
> > if (system_state != SYSTEM_RUNNING)
> > goto out;
> >
> > this doesn't look right, smp_init() is called before we set
> > SYSTEM_RUNNING.
>
> The thing is, there's also ton's of code that might end up calling
> cond_resched() and co before the scheduler is fully initialized. Doing
> so would indeed mess things up.

I forget what triggered me to add some of them, but we definitely had bugs
with the scheduler being called early, and having some really nasty oopses
happen early in the boot (and that early, they don't get saved, so the
reporting percentage goes down to something very low).

So I think that the patch Anton posted is probably the RightThing(tm) to
do, and in that sense I like it - but they make me very nervous. There's a
lot of "cond_resched()"s sprinkled about in helper routines, and if they
are ever called early, you're basically screwed.

> Also, by definition we'd have to call smp_init() before SYSTEM_RUNNING,
> because you simply cannot declare a system up and running when your core
> functionality isn't initialized.

Now, that part I'm not sure about.

We can consider a UP system to be running, and brining up the other CPU's
to be a matter of CPU hotplug.

It's not what we do _now_, of course. We don't force hotplug support on
people just because they want SMP, and we don't want to go through the
whole "rewrite locks" things twice etc.

> So I'd really rather preserve these checks -- I can even remember
> running into some of these things a while back, but memory isn't
> providing specific cases.

Yes. I get nervous too about the patch.

That said, I do agree that maybe SYSTEM_RUNNING isn't the right check.
Testing that the scheduler is initialized may be the more correct one. I
think the SYSTEM_RUNNING one just comes from that being used for other
debug issues.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/