Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse

From: Peter Zijlstra
Date: Tue Oct 04 2022 - 13:22:15 EST


On Tue, Oct 04, 2022 at 07:09:15PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 28, 2022 at 04:55:46PM +0200, Marco Elver wrote:
> > On Wed, Sep 28, 2022 at 12:06PM +0200, Marco Elver wrote:
> >
> > > My second idea about introducing something like irq_work_raw_sync().
> > > Maybe it's not that crazy if it is actually safe. I expect this case
> > > where we need the irq_work_raw_sync() to be very very rare.
> >
> > The previous irq_work_raw_sync() forgot about irq_work_queue_on(). Alas,
> > I might still be missing something obvious, because "it's never that
> > easy". ;-)
> >
> > And for completeness, the full perf patch of what it would look like
> > together with irq_work_raw_sync() (consider it v1.5). It's already
> > survived some shorter stress tests and fuzzing.
>
> So.... I don't like it. But I cooked up the below, which _almost_ works :-/
>
> For some raisin it sometimes fails with 14999 out of 15000 events
> delivered and I've not yet figured out where it goes sideways. I'm
> currently thinking it's that sigtrap clear on OFF.

Oh Urgh, this is of course the case where an IPI races with a migration
and we lose the race with return to user, effectively giving the signal
skid vs the hardware event.

Bah.. I really hate having one CPU wait for another... Let me see if I
can find another way to close that hole.