Re: [PATCH v4] sched: automated per session task groups

From: Colin Walters
Date: Sun Dec 05 2010 - 17:47:57 EST

Next message: Jesper Juhl: "Re: [PATCH 5/8] Add yaffs2 file system: mtd and flash handling code"
Previous message: Andy Walls: "[PATCH for 2.6.37 REGRESSION] cx25840: Prevent device probe failure due to volume control ERANGE error"
In reply to: Linus Torvalds: "Re: [PATCH v4] sched: automated per session task groups"
Next in thread: Jesper Juhl: "Re: [PATCH v4] sched: automated per session task groups"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, Dec 5, 2010 at 3:47 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The semantics of "nice" are not - and have never been - to put things
> into process scheduling groups of their own.

Again, I obviously understand that - the point is to explore the space
of changes here and consider what would (and wouldn't) break. And
actually, what would improve.

> This is very much documented. People rely on it.

Well, we established my Fedora 14 system doesn't. You said "no one"
uses "nice" interactively. So...that leaves - who? It would be one
thing if you were saying something like "I know Yahoo has some code in
their data centers which uses a range of nice values; if we made this
change, all of a sudden they'd get more CPU contention...", or "I'm
pretty sure Maemo uses very low nice values for some UI code". But so
far you haven't really done that; it's just been (mostly)
assertions/handwaving. Now, you obviously have a lot more experience
that gives those assertions and handwaving a lot of credibility - but
all we need is one concrete example to shut me up =)

Playing around with Google code search a bit, hits for "nice" were
almost all duplicates of various C library headers/implementations.
"setpriority" was a bit more interesting; it appears Chromium has some
code to bump up the nice value by 5 for "background" processes:

http://google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/base/process_linux.cc&q=setpriority&exact_package=chromium&l=21

But all my Chrome-related processes here are at nice 0, so who knows
what that's used for. There are also hits for Chromium's copy of
embedded Cygwin+Perl...terrifying. I assume (hope, desperately) that
Cygwin+Perl is just used for building...

Another hit here in some random X screensaver code:
http://google.com/codesearch/p?hl=en#tJJawb1IJ20/driver/exec.c&q=setpriority%20file:.*.c&l=218

But I can't find a place where it's setting a non-zero value for that.

So...ah, here's one in Android's "development" git:
http://google.com/codesearch/p?hl=en#CRBM04-7BoA/simulator/wrapsim/Init.c&q=setpriority%20file:.*.c&l=91

Except it appears to be unused =/

Oh! Here we go, one in the Android UI code:
http://google.com/codesearch/p?hl=en#uX1GffpyOZk/libs/rs/rsContext.cpp&q=setpriority%20file:.*.c&sa=N&cd=29&ct=rc
Pasting this one so people don't have to follow the link:

void * Context::threadProc(void *vrsc)
{
    ...
    setpriority(PRIO_PROCESS, rsc->mNativeThreadId, ANDROID_PRIORITY_DISPLAY);

}

Where ANDROID_PRIORITY_DISPLAY = -4. Actually the whole enum is interesting:
http://google.com/codesearch/p?hl=en#uX1GffpyOZk/include/utils/threads.h&q=ANDROID_PRIORITY_DISPLAY&l=39

One interesting bit here is that they renice UI that the user is
presently interacting with:

/* threads currently running a UI that the user is interacting with */
ANDROID_PRIORITY_FOREGROUND = -2,

(Something "we" (and by "we" I mean GNOME) don't do; I believe Windows
does, though.) Honestly, though, I could whip up a
gnome-settings-daemon plugin to do this in about 10 minutes. Maybe
after dinner.

So...we've established that important released operating systems do
use negative nice values (not surprising). I can't offhand find any
uses of e.g. ANDROID_PRIORITY_BACKGROUND (i.e. a positive nice value)
in the "base" sources though.
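
For anyone who hasn't used the call these snippets are built on, here is
a minimal C sketch of reading and raising the caller's nice value with
getpriority()/setpriority(). The helper names read_nice and be_nicer_by
are made up for illustration. It only ever raises the nice value, since
lowering it (as the Android code does with -4) normally needs
CAP_SYS_NICE or a suitable RLIMIT_NICE.

```c
#include <errno.h>
#include <sys/resource.h>

/* Read the caller's nice value into *nice_out.
 * getpriority() can legitimately return -1, so errno is the only
 * reliable error signal. Returns 0 on success, -1 on failure. */
int read_nice(int *nice_out)
{
    errno = 0;
    int value = getpriority(PRIO_PROCESS, 0);   /* 0 = the caller */
    if (value == -1 && errno != 0)
        return -1;
    *nice_out = value;
    return 0;
}

/* Hypothetical helper: raise the caller's nice value by `delta`
 * (delta >= 0), capped at 19, and report the resulting value.
 * Returns 0 on success, -1 on failure. */
int be_nicer_by(int delta, int *new_nice_out)
{
    int current;

    if (delta < 0 || read_nice(&current) != 0)
        return -1;

    int target = current + delta;
    if (target > 19)
        target = 19;                            /* highest valid nice value */

    if (setpriority(PRIO_PROCESS, 0, target) != 0)
        return -1;

    /* Read it back rather than assuming the request took effect. */
    return read_nice(new_nice_out);
}
```

Something like be_nicer_by(5, &n) would be the rough equivalent of the
Chromium "background" bump mentioned earlier. As far as I know, on Linux
the nice value is effectively per-thread, which would explain why the
Android code can hand a thread id to PRIO_PROCESS.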

> Different nice levels shouldn't get group scheduled together - they
> should be scheduled *less*.

But it seems obvious (right?) that putting them in one group *will*
ensure they get scheduled less, since that one group has to contend
with all other processes.

> And it's not about "make", since nobody
> really ever uses nice on make anyway, it's about things like
> pulseaudio (that wants higher priorities)

Note that pulse is actually using the RT scheduling class, so (I
think) its actual nice value is irrelevant.
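
A quick way to check that claim for any given task is to ask the kernel
for its scheduling policy. This is only a sketch: is_realtime_policy and
task_is_realtime are hypothetical helper names, and the fallback
definition of SCHED_RESET_ON_FORK is there in case the libc headers
don't expose it without _GNU_SOURCE.

```c
#include <sched.h>
#include <sys/types.h>

/* Linux may OR this flag into the policy value; fall back to the
 * kernel's value if the libc headers did not define it. */
#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000
#endif

/* Non-zero if `policy` is one of the real-time classes, whose tasks are
 * ordered by static priority rather than by nice value. */
int is_realtime_policy(int policy)
{
    policy &= ~SCHED_RESET_ON_FORK;
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Returns 1 if task `pid` runs under SCHED_FIFO or SCHED_RR, 0 if it
 * does not, and -1 if the policy could not be read. pid 0 means the
 * caller. */
int task_is_realtime(pid_t pid)
{
    int policy = sched_getscheduler(pid);
    if (policy == -1)
        return -1;
    return is_realtime_policy(policy);
}
```

If task_is_realtime() returns 1 for the pulseaudio threads, their nice
value shouldn't matter for this discussion.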

Again using F14, the only things using negative nice besides pulse are
udev and auditd.
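
For anyone who wants to repeat that survey on their own system, the nice
value of each task is field 19 of /proc/<pid>/stat. Below is a sketch of
the parsing step only; parse_stat_nice is a hypothetical helper, and
walking /proc and reading the files is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Extract the nice value (field 19) from the contents of one
 * /proc/<pid>/stat file. Returns 0 on success, -1 on a malformed line. */
int parse_stat_nice(const char *line, int *nice_out)
{
    /* Field 2 (comm) is parenthesised and may itself contain spaces or
     * ')', so resume parsing after the last ')' in the line. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Fields 3..18 come before nice: skip 16 space-separated tokens. */
    for (int skipped = 0; skipped < 16; skipped++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1;
        while (*p != ' ' && *p != '\0')
            p++;
    }
    while (*p == ' ')
        p++;

    char *end;
    long value = strtol(p, &end, 10);
    if (end == p)
        return -1;              /* no digits where nice should be */

    *nice_out = (int)value;
    return 0;
}
```

Any task for which this yields a value below zero is one of the
"negative nice" users.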

> Not very much (because they are mostly useless), but there really are
> people who use it.

Still trying to extract specific examples of "people who use it" from you...

> Do you *really* think that the person who niced the filesystem indexer
> down wants the indexer to get 50% of the CPU, just because it's
> scheduled separately from the parallel make?

Finally, an example! I can work with this. So let's assume I'm using
some JavaScript-intensive website in Firefox in GNOME on an otherwise
idle system, and tracker-miner-fs kicks in after noticing I just saved
a Word document I want to look at later. You're suggesting that
tracker-miner-fs would now be using a lot more CPU if it was in a
group of its own than it would have before?

That does seem likely to be true. But would it be a *problem*? I
don't know, it's not obvious to me offhand. Especially on any
hardware that's dual-core, where SpiderMonkey can be burning one core
(since that's all it will use, modulo Web Workers), and tracker on
another.
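
To put rough numbers on the indexer-vs-make scenario, here is a
back-of-the-envelope sketch in C. The assumptions are mine: a single
CPU, every task always runnable, CFS load weights of 1024 for nice 0 and
15 for nice 19 (the values in the kernel's prio_to_weight table), and
groups that all carry equal weight. The function names are hypothetical.

```c
/* Share of one CPU that a task of weight `weight` gets when it competes
 * in a single flat run queue with `n_others` tasks of weight
 * `other_weight`, all of them always runnable. */
double flat_share(int weight, int other_weight, int n_others)
{
    double total = (double)weight + (double)other_weight * n_others;
    return (double)weight / total;
}

/* Share of one CPU that each group gets when `n_groups` equally
 * weighted groups are all runnable; a task alone in its group receives
 * the whole group share regardless of its nice value. */
double equal_group_share(int n_groups)
{
    return 1.0 / n_groups;
}
```

Under those assumptions, flat_share(15, 1024, 8) says a nice-19 indexer
competing with a hypothetical 8-job make gets well under 1% of the CPU,
while equal_group_share(2) gives it 50% once it sits alone in its own
group next to the make group. That is the jump being argued about; on
more than one core the picture is less stark.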

Anyways, I don't have the kernel-fu to make a patch myself here,
especially since the scheduler is probably one of the hardest parts of
the OS. So ultimately I guess, if you just totally disagree, fine.
But I wasn't satisfied with the response - my engineering intuition is
to work through problems and try to really understand what would be
wrong. It's hard to accept "just trust me, that's stupid".
