Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

From: Dmitry Vyukov
Date: Wed Feb 16 2022 - 06:54:33 EST


On Wed, 16 Feb 2022 at 12:47, John Garry <john.garry@xxxxxxxxxx> wrote:
>
> Hi Will,
>
> > Sorry, I haven't had time to look at this (or the thousands of other mails
> > in my inbox) lately.
> >
>
> Thanks
>
> > I don't recall all of the details, but basically hw_breakpoint really
> > doesn't work well on arm/arm64 -- the sticking points are around handling
> > the stepping and whether to step into or over exceptions. Sadly, our ptrace
> > interface (which is what is used by GDB) is built on top of hw_breakpoint,
> > so we can't just rip it out and any significant changes are pretty risky.
> >
> > What I would like to happen is that we rework our debug exception handling
> > as outlined by [1] so that kernel debug is better defined and the ptrace
> > interface can interact directly with the debug architecture instead of being
> > funnelled through hw_breakpoint. Once we have that, I think we could try to
> > improve hw_breakpoint much more comfortably (or at least defeature it
> > considerably without having to worry about breaking GDB). I started this a
> > couple of years ago, but I haven't found time to get back to it for ages.
> >
> > Anyway, to this specific test...
> >
> > When we hit a break/watchpoint the faulting PC points at the instruction
> > which faulted and the exception is reported before the instruction has had
> > any other side-effects (e.g. if a watchpoint triggers on a store, then
> > memory will not have been updated when the watchpoint handler runs), so if
> > we were to return as usual after reporting the exception to perf then we
> > would just hit the same break/watchpoint again and we'd get stuck. GDB
> > handles stepping over the faulting instruction, but for perf (and assumedly
> > these tests), the kernel is expected to handle the step. This handling
> > amounts to disabling the break/watchpoint which we think we hit and then
> > attempting a hardware single-step. During the step we could run into more
> > break/watchpoints on the same instruction, so we'll keep disabling things
> > until we eventually manage to complete the step, which is signalled by a
> > specific type of debug exception. At this point, we re-enable the
> > break/watchpoints and we're good.
> >
> > Signals make this messy, as the step logic will step _into_ the signal
> > handler -- we have to do this, otherwise we would miss break/watchpoints
> > triggered by the signal handler if we had disabled them for the step.
> > However, it means that when we return back from the signal handler we will
> > run back into the break/watchpoint which we initially stepped over. When
> > perf uses SIGTRAP to notify userspace that we hit a break/watchpoint,
> > then we'll get stuck because we'll step into the handler every time.
> >
> > Hopefully that clears things up a bit. Ideally, the kernel wouldn't
> > pretend to handle this stepping at all for arm64, as it adds a bunch of
> > complexity and overhead to our context switch, and I don't think the
> > current behaviour is particularly useful.
> >
>
> Right, so what I am hearing altogether is that for now we should just
> skip this test.
>
> And since the kernel does not seem to advertise this capability, we need
> to disable it for specific architectures.

It does, and FWIW I am just trying to use it. So far, things only work on x86.