Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

From: Eric W. Biederman
Date: Mon Jul 16 2018 - 14:02:24 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior. Since no one relies on it we
>> can change it without creating a regression.
>
> Maybe.
>
> However, possibly not.
>
> The thing is, glibc wasn't the original or only use of our threads. In
> fact, there are people out there that use clone() directly, without
> using it for posix threading. And Oleg was right to notice this,
> because the traditional model was literally to just use "kill()" on
> the pid returned from clone().

I completely agree that Oleg was right to notice this, and I was
definitely wrong to overlook it, both in my description and otherwise.

I also think the semantic change needs to happen in its own separate
patch so that any regression can be tracked down.

I really don't think anyone uses this, but it is not smart to hold the
rest of the changes hostage to my belief. So I am thinking about how
to rework this.

> So the semantics of Linux kill() really is to kill the thread, not the
> group leader. glibc's implementation of pthreads is not the only model
> out there.

There are two questions.
a) Can we use the pid of a thread to find the thread group?
b) Will the signal be queued in the thread group?

> Now, it is possible that at none of the legacy uses use CLONE_THREAD
> and thus aren't affected (because tgid will always be pid). So maybe
> nobody notices.

That is what I expect. I don't think legacy is a good description.
Calling other uses of CLONE_THREAD non-glibc seems better. The old
LinuxThreads did not use CLONE_THREAD because it did not exist.

>
> But we really have three different 'kill' system calls:
>
> - the original 'kill' system call (#37 on x86-32).
>
> This looks up the thread ID, but signals the *group*.
>
> - tkill (#238)
>
> This looks up the thread, and signals the specific thread.
>
> - tgkill (#270)
>
> This looks up the tgid, and signals the group.

No. tgkill is a less racy version of tkill and verifies that the
thread it signals is in the proper thread group.

> Modern glibc will not even use the original 'kill()' at all, I think.
> But it's the legacy behavior.

No. Modern glibc definitely still uses kill, as kill is the only one
of the three that exports the POSIX kill API.

> So I do think Oleg is right. We should be careful. You'll not notice
> breakage on a modern distro, but you might easily break old code.

Yes. We definitely need to be careful. At the same time, since this
isn't something the old LinuxThreads had to cope with, we can probably
clean it up. But as that is not my focus, it should probably be pushed out.

Eric