Re: SCO: "thread creation is about a thousand times faster than on native Linux"

From: Andi Kleen (ak@suse.de)
Date: Thu Aug 24 2000 - 07:01:55 EST

Next message: Daniel Stone: "[BLARG] Re:"
Previous message: jsvec@sitel.cz: "(no subject)"
In reply to: Linus Torvalds: "Re: SCO: "thread creation is about a thousand times faster than on native Linux""
Next in thread: Stephen C. Tweedie: "Re: SCO: "thread creation is about a thousand times faster than on native Linux""
Reply: Stephen C. Tweedie: "Re: SCO: "thread creation is about a thousand times faster than on native Linux""
Reply: Linus Torvalds: "Re: SCO: "thread creation is about a thousand times faster than on native Linux""
Reply: yodaiken@fsmlabs.com: "Re: SCO: "thread creation is about a thousand times faster than on native Linux""

On Wed, Aug 23, 2000 at 09:54:55PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 24 Aug 2000, Albert D. Cahalan wrote:
> >
> > Nobody will send you a sane patch without you at least hinting
> > at what you might like to see. I'm sure many of us would be happy
> > to write the code, but not under the expectation that it will
> > be rejected.
>
> Acceptable solution:
> - add "tgid" (thread group ID) to "struct task_struct"
> - CLONE_PID will leave tgid unchanged.
> - non-CLONE_PID will set "tgid" to the same as pid
> - get_pid() checks that the new pid is not the tgid of any process.
>
> Basically, the above creates a new "session pid" for the collection of
> threads. This is nothing new: the above is basically _exactly_ the same as
> p->pgrp and p->session, so it fits quite well into the whole pid notion.
>
> It also means that "current->pid" basically becomes a traditional "thread
> ID", while "current->tgid" effectively becomes what pthreads calls a
> "pid". Except Linux does it the right way around, ie the same way we've
> done sessions and process groups. Because, after all, this _is_ just a
> process group extension.
>
> Now, once you have a "tgid" for each process, you can add system calls to
> - sys_gettgid(): get the thread ID
> - sys_tgkill(): do a pthreads-like "send signal to thread group" (or
> extend on the current sys_kill())

Wouldn't it make more sense to extend the current process group concept?
A process could be in two groups: the thread group, and the process group
with the pid of the session group leader.
You could then just extend the current kill() "kill group on negative value"
semantics by passing in -tid (this assumes tid does not collide with anything
in the pid/tid space, but you already seem to want to enforce that).
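
For reference, the existing "negative value" semantics look like this from
user space. This is only a sketch of what kill() already does for process
groups today; passing -tid is the proposal and is not shown. The helper name
signal_group_demo() is made up for illustration, and it should only be
called with a signal whose default action terminates the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child into its own process group, signal the whole group with
 * kill(-pgid, sig), and return the signal that terminated the child
 * (or -1 on failure). Hypothetical helper, for illustration only.
 */
int signal_group_demo(int sig)
{
    int status;
    pid_t child;

    assert(sig > 0);
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        setpgid(0, 0);          /* child becomes leader of a new group */
        for (;;)
            pause();            /* wait until a signal kills us */
    }

    /* Set it from the parent too, so the group exists before kill(). */
    setpgid(child, child);

    /* Negative first argument: deliver to every process in the group. */
    if (kill(-child, sig) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With thread groups in the same ID space, the same call shape would address
a thread group instead of a process group.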

Ugly part is that you would have two group ids per thread already. Do you
want just two or do you want N?

>
> Now, the problem is that the thread group kill thing for true POSIX
> threads signal behaviour probably has to do some strange magic to get the
> pthreads signal semantics right. I don't even know the exact details here,
> so somebody who _really_ knows pthreads needs to look long and hard at
> this (efficiency here may require that we have a circular list of each
> "thread ID group" - ie that we add the proper process pointer list that
> gets updated at fork() and exit() so that we can easily walk every process
> in the process group list).

POSIX wants to send the signal to the first thread in the group that
doesn't have it blocked.

Several signals are special cased in POSIX, e.g. SIGSTOP, and need to
be handled by all threads in the group.
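
A rough user-space sketch of the "first thread that doesn't have it blocked"
rule, walking a circular per-group list like the one Linus mentions. The
struct and the function name are invented for illustration; this is not
kernel code, and the special-cased signals are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state; not the real task_struct. */
struct thread {
    int tid;
    unsigned long blocked;   /* bit (sig - 1) set => sig is blocked */
    struct thread *next;     /* circular list of the thread group */
};

/*
 * Return the first thread, starting at head, that does not block sig.
 * Returns NULL if every thread blocks it (the signal would then have
 * to stay pending for the group).
 */
struct thread *pick_target(struct thread *head, int sig)
{
    unsigned long bit;
    struct thread *t = head;

    assert(sig >= 1 && sig <= 32);
    bit = 1UL << (sig - 1);
    if (head == NULL)
        return NULL;
    do {
        if (!(t->blocked & bit))
            return t;
        t = t->next;
    } while (t != head);
    return NULL;
}
```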

>
> Discussion welcome. Basically, it needs somebody who knows pthreads well,
> but actually has good taste despite that fact. Such people seem to be in
> short supply ;)
>
>
Here is my braindump. I would appreciate any comments.

For good behaviour you need a shared sigprocmask(). (I just ran into a
situation where shared signal blocking would have been very useful on Linux).
You basically want to protect data structures that signal handlers can
access against signals sent to any thread; otherwise sigprocmask() is
pretty useless.
->blocked etc. would probably need to move into a shared struct. Moving
it into signal_struct would break programs that depend on the old Linux
signal semantics, so it would need to be either a new, separately
reference-counted structure or another level of indirection in
task_struct [pointers that point either to a local field in task_struct or
into the shared signal_struct].

To complicate things, the Single Unix spec is vague here: the use of
sigprocmask() in a multithreaded process is unspecified. I think it only
makes sense to have it shared by all threads in the group, otherwise you
simply cannot use it for locking. There is also pthread_sigmask(), which
only manipulates the signal mask of the calling thread. The result is that
you need a thread-local signal mask, and an optional shared signal mask
that points to the shared signal struct.
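
Roughly what I have in mind, as a user-space sketch. All names here are
invented, and combining the two masks with OR is my assumption about how
delivery would check them; a task with no shared mask keeps the old
per-task Linux behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; not the real task_struct or signal_struct. */
struct shared_sigmask {
    unsigned long blocked;           /* bit (sig - 1) set => blocked */
};

struct task {
    unsigned long blocked;           /* thread-local mask */
    struct shared_sigmask *shared;   /* NULL => old per-task semantics */
};

static unsigned long sig_bit(int sig)
{
    assert(sig >= 1 && sig <= 32);
    return 1UL << (sig - 1);
}

/* Assumption: a signal is blocked if either mask blocks it. */
int task_blocks(const struct task *t, int sig)
{
    unsigned long mask = t->blocked;

    if (t->shared != NULL)
        mask |= t->shared->blocked;
    return (mask & sig_bit(sig)) != 0;
}

/* What a group-wide sigprocmask(SIG_BLOCK, ...) would boil down to. */
void group_block(struct shared_sigmask *s, int sig)
{
    s->blocked |= sig_bit(sig);
}
```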

Some programs want waitpid() to return something consistent when the last
thread goes away. In your case that would be the tid; it could be set
e.g. via prctl(), similar to PR_SET_PDEATHSIG.

On the topic of waitpid: One reason why LinuxThreads currently uses that
wasteful ThreadManager-does-the-clone construction is that there is no
easy way to redirect the waitpid notification to arbitrary processes.
LinuxThreads needs to see thread deaths, though, and must not miss them when
the creating thread died earlier. With tids it would be best if waitpid()
could be told (e.g. again via prctl) to just notify any process in a tid
group, preferably with some ordering [check first if any thread is hanging
in waitpid(); if yes, notify it; if not, choose the first which does not
have the death signal blocked].

[Earlier there were proposals to add a CLONE_WAITPID for that, but I think
controlling it via the tid and prctl would be more elegant and flexible]
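
The ordering in brackets above could look something like this. It is only
a sketch with invented names: an array stands in for the tid group and two
flags stand in for the real per-thread state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread state for the sketch. */
struct thread {
    int tid;
    int in_waitpid;          /* currently sleeping in waitpid() */
    int death_sig_blocked;   /* has the death signal blocked */
};

/*
 * Pick which thread of the group learns about a thread death.
 * Returns an index into grp, or -1 if nobody can take it right now.
 */
int pick_death_waiter(const struct thread *grp, size_t n)
{
    size_t i;

    assert(grp != NULL || n == 0);

    /* First choice: a thread already hanging in waitpid(). */
    for (i = 0; i < n; i++)
        if (grp[i].in_waitpid)
            return (int)i;

    /* Otherwise: the first thread with the death signal unblocked. */
    for (i = 0; i < n; i++)
        if (!grp[i].death_sig_blocked)
            return (int)i;

    return -1;
}
```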

Another thing would be shared credentials. I'm sure there are ported
programs that have security bugs on Linux because they expect setuid() to be
process global, and it is local. Unfortunately that is uglier to get right:
you would need separate reference-counted credentials structures to get
atomic behaviour for system calls (they must not see half-changed
credentials or eat credential changes after sleeping).
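
A minimal sketch of the reference-counted idea, assuming a copy-and-swap
scheme: a system call takes a reference on entry and keeps a consistent
snapshot, while setuid() installs a fresh structure for the group. The names
are invented, and the plain int refcount is only good enough for a
single-threaded illustration; real code would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared credentials; not an existing kernel struct. */
struct cred {
    int refcount;
    unsigned int uid;
    unsigned int gid;
};

struct thread_group {
    struct cred *cred;       /* current credentials of the group */
};

/* Take a snapshot reference, e.g. on system call entry. */
struct cred *cred_get(struct thread_group *g)
{
    g->cred->refcount++;
    return g->cred;
}

/* Drop a reference; free the structure with the last one. */
void cred_put(struct cred *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0)
        free(c);
}

/* Group-wide setuid(): copy, modify the copy, then swap the pointer. */
int group_setuid(struct thread_group *g, unsigned int uid)
{
    struct cred *old = g->cred;
    struct cred *new_cred = malloc(sizeof(*new_cred));

    if (new_cred == NULL)
        return -1;
    *new_cred = *old;
    new_cred->refcount = 1;  /* the group's own reference */
    new_cred->uid = uid;
    g->cred = new_cred;
    cred_put(old);           /* existing snapshots keep old alive */
    return 0;
}
```

A caller holding a snapshot never sees a half-changed structure, because
the old one is not modified after it has been published.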

To solve the problem of system management tools (top etc.) counting a single
shared mm_struct multiple times [threaded staroffice looks really funny in
gtop], I proposed earlier to add a vmid to /proc. The vmid would just be
the current->mm pointer [I do not think it is worth adding another pid-like
space for it, that would be bloat; a "cookie" like the pointer is enough]
and could be used by the programs to notice shared vm. The tid could in
theory be used for that too, but it would break again for top when someone
wants shared vm without shared tid, so I think having a separate vmid is
better.
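
How a tool like top could use such a cookie, as a sketch. The struct and
field names are made up, and how the vmid would actually be read out of
/proc is left open; the point is only that tasks with the same vmid get
counted once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task record a monitoring tool might build. */
struct task_info {
    int pid;
    unsigned long vmid;      /* opaque cookie; equal => shared vm */
    unsigned long rss_kb;
};

/*
 * Sum memory over all tasks, counting each distinct vmid only once.
 * Quadratic scan; fine for a sketch, a real tool would hash on vmid.
 */
unsigned long total_rss_kb(const struct task_info *t, size_t n)
{
    unsigned long total = 0;
    size_t i, j;

    assert(t != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)
            if (t[j].vmid == t[i].vmid)
                break;
        if (j == i)          /* first task seen with this vmid */
            total += t[i].rss_kb;
    }
    return total;
}
```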

If you agree to the vmid concept I'll send you a patch for it.

-Andi


This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:13 EST