Re: [PATCH 0/3] signal: requeuing undeliverable signals

From: Marko Mäkelä
Date: Thu Nov 18 2021 - 01:12:39 EST


On Wed, Nov 17, 2021 at 6:51 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Kyle Huey <me@xxxxxxxxxxxx> writes:
>
> > On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> >>
> >>
> >> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> >> ptrace_signal from delivering a signal, as the kernel sets up a signal
> >> frame for a signal that rr did not have a chance to observe with ptrace.
> >>
> >> In looking into it I found a couple of bugs and a quality of
> >> implementation issue.
> >>
> >> - The test for signal_group_exit should be inside the for loop in get_signal.
> >> - Signals should be requeued on the same queue they were dequeued from.
> >> - When a fatal signal is pending, ptrace_signal should not return another
> >>   signal for delivery.
> >>
> >> Kyle Huey has verified[2] an earlier version of this change.
> >>
> >> I have reworked things one more time to completely fix the issues
> >> raised, and to keep the code maintainable long term.
> >>
> >> I have smoke-tested this code and, combined with a careful review, I
> >> expect it to work fine. Kyle, if you can double-check that
> >> my last round of changes still works for rr, I would appreciate it.
> >
> > This still fixes the race we reported.
> >
> > Tested-by: Kyle Huey <khuey@xxxxxxxxxxxx>
>
> Thank you very much for retesting.
>
> Eric

Thank you, Kyle and Eric, for reporting and fixing the root cause of this race.

Meanwhile, following Kyle's suggestion, I will disable the crash
handlers in the tracee whenever it is being traced.
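
For illustration only, here is a minimal sketch of one way a process could
skip installing its crash handlers when it is being traced. It assumes
Linux and reads the TracerPid field from /proc/self/status once at
startup. The function names are hypothetical and this is not the actual
MariaDB code; the message does not say how tracing is detected there.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: extract the "TracerPid:" value from the text of a
 * /proc/<pid>/status file. Returns the tracer's PID, 0 if the process is
 * not traced, or -1 if the field is missing or malformed. */
long parse_tracer_pid(const char *status_text)
{
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "TracerPid:", 10) == 0) {
            long pid;
            return sscanf(line + 10, "%ld", &pid) == 1 ? pid : -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/self/status (Linux-specific) and report the tracer's PID. */
long current_tracer_pid(void)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf);
}

/* Placeholder crash handler: report, then re-raise with the default action. */
void crash_handler(int sig)
{
    static const char msg[] = "fatal signal caught\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)ignored;
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handlers only when no tracer is attached.
 * Returns 1 if handlers were installed, 0 if they were skipped. */
int install_crash_handlers_unless_traced(void)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGABRT };
    struct sigaction sa;
    size_t i;

    if (current_tracer_pid() > 0)
        return 0;   /* traced: keep the default signal dispositions */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof signals / sizeof signals[0]; i++)
        sigaction(signals[i], &sa, NULL);
    return 1;
}
```

A check like this only sees a tracer that is already attached when it
runs; a tracer that attaches later would not be noticed.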

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation