Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).

From: Linus Torvalds
Date: Wed Nov 30 2022 - 06:34:28 EST


On Wed, Nov 30, 2022 at 1:58 AM Sebastian Andrzej Siewior
<bigeasy@xxxxxxxxxxxxx> wrote:
>
> That one case which was not covered by the sighand lock was clearly an
> oversight. It should have been under the sighand. The commit description
> said so.

Well, I think it was more than an oversight - it looks like it came
from basically starting from that old known-bad patch, which did the
same thing, with __sigqueue_free() intentionally called outside the
lock.

Which really is why I'm unhappy about that patch. We _know_ there were
some (possibly not entirely debugged) problems with the original
patch, and from a quick look I could find at least one remaining
oddity, and several remaining ugly cases.

Who knows - that issue with sigqueue_free() calling __sigqueue_free()
outside the lock may be what was the problem originally too, and that
caused the revert in the first place (I'm not sure we ever actually
had it ever debugged more than "this is reported to cause problems").

I suspect it would be best to not have that "do the cache case in the
normal alloc/free path" exactly because the normal alloc/free patch is
used in situations where we just don't hold that lock. Either because
of the 'sigqueue_flags' issue on the allocation path, or because - as
here - the freeing path was done separately after having dropped the
lock (to avoid lock contention, I assume).

Maybe just having the cache alloc/free be a separate function would solve it?

Or maybe that sigqueue_free() locking optimization should just be
undone, and maybe that fixes everything. Maybe that was always the
only actual bug in the original code. But I really dislike those other
games it plays too.

> You need set sigqueue_cache to NULL in copy_process().

Ack.

> The SIGQUEUE_PREALLOC signals work slightly different.

Yeah, I think that SIGQUEUE_PREALLOC hack of mine was a mistake, it
looked like a good idea in that sigqueue_free() path that I started
looking at, but it's probably a horrible idea in general.

But that "please don't use a known-broken patch as the starting point"
argument remains.

Linus