Re: [RFC PATCH v5 1/2] arm64: Introduce stack trace reliability checks in the unwinder

From: Josh Poimboeuf
Date: Tue Jun 29 2021 - 12:47:48 EST


On Thu, Jun 24, 2021 at 03:40:21PM +0100, Mark Rutland wrote:
> Hi Madhavan,
>
> On Wed, May 26, 2021 at 04:49:16PM -0500, madvenka@xxxxxxxxxxxxxxxxxxx wrote:
> > From: "Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx>
> >
> > The unwinder should check for the presence of various features and
> > conditions that can render the stack trace unreliable and mark the
> > stack trace as unreliable for the benefit of the caller.
> >
> > Introduce the first reliability check - If a return PC is not a valid
> > kernel text address, consider the stack trace unreliable. It could be
> > some generated code.
> >
> > Other reliability checks will be added in the future.
> >
> > Signed-off-by: Madhavan T. Venkataraman <madvenka@xxxxxxxxxxxxxxxxxxx>
>
> At a high-level, I'm on-board with keeping track of this per unwind
> step, but if we do that then I want to be able to use this during
> regular unwinds (e.g. so that we can have a backtrace indicate when a
> step is not reliable, like x86 does with '?'), and to do that we need to
> be a little more accurate.

On x86, the '?' entries don't come from the unwinder's determination of
whether a frame is reliable. (And the x86 unwinder doesn't track
reliable-ness on a per-frame basis anyway; it keeps a per-unwind global
error state.)

The stack dumping code blindly scans the stack for kernel text
addresses, in lockstep with calls to the unwinder. Any text addresses
which aren't also reported by the unwinder are prefixed with '?'.

The point is two-fold:

a) failsafe in case the unwinder fails or skips a frame;

b) showing breadcrumbs from previous execution contexts, which can
help with debugging more difficult scenarios.
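
To make the lockstep scan concrete, here is a rough userspace sketch of
the idea. It is not the kernel code (the real thing lives in
arch/x86/kernel/dumpstack.c); is_text_addr(), scan_stack(), the text
range and the "frames" array are all made up for illustration. The toy
unwinder is just a sorted list of stack slot indices it claims hold
return addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical text range, standing in for a kernel text address check. */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL

static bool is_text_addr(unsigned long addr)
{
	return addr >= TEXT_START && addr < TEXT_END;
}

/*
 * Scan every word on the stack.  'frames' holds the stack slot indices
 * that the (toy) unwinder reports as return addresses, in ascending
 * order.  For each text address found, store ' ' in 'marks' if the
 * unwinder also reported that slot, or '?' if it did not.
 *
 * 'marks' must have room for 'nwords' entries.  Returns the number of
 * text addresses found.
 */
static size_t scan_stack(const unsigned long *stack, size_t nwords,
			 const size_t *frames, size_t nframes,
			 char *marks)
{
	size_t next = 0, n = 0;

	assert(stack && marks);

	for (size_t i = 0; i < nwords; i++) {
		/* Drop unwinder entries for slots we have already passed. */
		while (next < nframes && frames[next] < i)
			next++;

		if (!is_text_addr(stack[i]))
			continue;

		if (next < nframes && frames[next] == i) {
			marks[n++] = ' ';	/* unwinder agrees */
			next++;			/* advance in lockstep */
		} else {
			marks[n++] = '?';	/* only the scan found it */
		}
	}

	return n;
}
```

With a stack of { 0x10, 0x1234, 0x20, 0x1abc, 0x1def, 0x30 } and the
toy unwinder reporting slots 1 and 4, the text address in slot 3 is the
only one that gets a '?'. If the unwinder bails out early, everything
after that point still shows up, just with '?' in front of it.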

--
Josh