Re: Test 73 Sig_trap fails on arm64 (was Re: [PATCH] perf test: Test 73 Sig_trap fails on s390)

From: Will Deacon
Date: Tue Feb 15 2022 - 09:35:12 EST


On Tue, Feb 15, 2022 at 11:16:16AM +0000, John Garry wrote:
> On 24/01/2022 09:19, John Garry wrote:
>
> Hi Will,
>
> Have you had a chance to check this or the mail from Dmitry?
>
> https://lore.kernel.org/linux-perf-users/CACT4Y+YVyJcqbR5j2fsSQ+C5hy78X+aobrUHaZKghFf0_NMv=A@xxxxxxxxxxxxxx/
>
> If we can't make progress then we just need to skip the test on arm64 for
> now.

Sorry, I haven't had time to look at this (or the thousands of other mails
in my inbox) lately.

I don't recall all of the details, but basically hw_breakpoint really
doesn't work well on arm/arm64 -- the sticking points are around handling
the stepping and whether to step into or over exceptions. Sadly, our ptrace
interface (which is what GDB uses) is built on top of hw_breakpoint, so we
can't just rip it out, and any significant change there is pretty risky.

What I would like to do is rework our debug exception handling as outlined
in [1], so that kernel debug is better defined and the ptrace interface can
interact directly with the debug architecture instead of being funnelled
through hw_breakpoint. Once we have that, I think we could try to improve
hw_breakpoint much more comfortably (or at least defeature it considerably
without having to worry about breaking GDB). I started this a couple of
years ago, but I haven't found time to get back to it for ages.

Anyway, to this specific test...

When we hit a break/watchpoint, the faulting PC points at the instruction
which faulted, and the exception is reported before that instruction has
had any side-effects (e.g. if a watchpoint triggers on a store, memory will
not yet have been updated when the watchpoint handler runs). So if we were
to return as usual after reporting the exception to perf, we would just hit
the same break/watchpoint again and get stuck. GDB handles stepping over
the faulting instruction itself, but for perf (and presumably these tests)
the kernel is expected to handle the step. This handling amounts to
disabling the break/watchpoint which we think we hit and then attempting a
hardware single-step. During the step we could run into more
break/watchpoints on the same instruction, so we keep disabling things
until we eventually manage to complete the step, which is signalled by a
specific type of debug exception (the software step exception). At that
point, we re-enable the break/watchpoints and we're good.
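The step-over sequence above can be sketched as a toy Python model. It is
not kernel code: the `Cpu` class, the slot numbering and `execute()` are
hypothetical, each slot is assumed to watch a single address, and the
handler is assumed to know exactly which slot fired (the real code only
disables the one "we think we hit").

```python
"""Toy model of stepping over break/watchpoints (hypothetical, not kernel code)."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cpu:
    # Hypothetical: hardware slot number -> address that slot watches.
    slots: dict[int, int]
    enabled: set[int] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # All slots start out armed.
        self.enabled = set(self.slots)

    def hits(self, accessed: set[int]) -> list[int]:
        """Armed slots that trigger on an instruction touching `accessed`."""
        return sorted(s for s in self.enabled if self.slots[s] in accessed)


def execute(cpu: Cpu, accessed: set[int]) -> None:
    """Run one instruction, stepping over any break/watchpoints it hits."""
    disabled: set[int] = set()
    # Each hit is taken before the instruction has any side-effects, so the
    # PC still points at it. Disable the slot that fired and retry under a
    # single-step; more hits on the same instruction disable more slots.
    while hit := cpu.hits(accessed):
        cpu.log.append(f"hit slot {hit[0]}")
        cpu.enabled.discard(hit[0])
        disabled.add(hit[0])
    cpu.log.append("retired")
    if disabled:
        # The step completed (a dedicated debug exception on real hardware),
        # so re-arm everything that was turned off for it.
        cpu.log.append("step complete, re-enable")
        cpu.enabled |= disabled
```

For example, with `Cpu(slots={0: 0x1000, 1: 0x1000, 2: 0x2000})`, calling
`execute(cpu, {0x1000})` logs both hits on slots 0 and 1 before the
instruction retires, and all three slots are armed again afterwards.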

Signals make this messy, as the step logic will step _into_ the signal
handler -- we have to do this, otherwise we would miss break/watchpoints
triggered by the signal handler while they were disabled for the step.
However, it means that when we return from the signal handler we will run
back into the break/watchpoint which we initially stepped over. When perf
uses SIGTRAP to notify userspace that we hit a break/watchpoint, we get
stuck: each hit queues another SIGTRAP, the step lands in its handler
again, and the faulting instruction never completes.
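The livelock can be illustrated with another toy Python model, again not
kernel code: `run()`, its `max_hits` bound and the trace strings are
hypothetical. It assumes a single watchpoint on the faulting instruction
and that perf queues a SIGTRAP on every hit; `max_hits` exists only so the
model terminates.

```python
"""Toy model of the SIGTRAP livelock (hypothetical, not kernel code)."""
from __future__ import annotations


def run(sigtrap: bool, max_hits: int = 3) -> tuple[bool, list[str]]:
    """Return (instruction_retired, trace) for one faulting instruction."""
    trace: list[str] = []
    for _ in range(max_hits):
        # The watchpoint is armed, so the instruction faults before retiring.
        trace.append("hit")
        # The kernel disables the watchpoint and starts a single-step.
        if not sigtrap:
            # Nothing pending: the step retires the instruction, the step
            # completes and the watchpoint is re-armed.
            trace.append("retired")
            return True, trace
        # SIGTRAP is pending, so the step lands on the first instruction of
        # the handler. The step counts as complete and the watchpoint is
        # re-armed; sigreturn then resumes at the original PC.
        trace += ["handler", "sigreturn"]
    return False, trace
```

Without a signal, `run(False)` retires the instruction after one hit; with
`sigtrap=True` every iteration goes hit, handler, sigreturn, and the
instruction never retires however large `max_hits` is.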

Hopefully that clears things up a bit. Ideally, the kernel wouldn't
pretend to handle this stepping at all for arm64, as it adds a lot of
complexity and overhead to our context-switch path, and I don't think the
current behaviour is particularly useful.

Will

[1] https://lore.kernel.org/all/20200626095551.GA9312@willie-the-truck/