Re: [GIT pull] sched/core for v5.16-rc1

From: Mark Rutland
Date: Wed Nov 03 2021 - 09:53:37 EST


On Tue, Nov 02, 2021 at 09:41:26AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 01, 2021 at 02:27:49PM -0700, Linus Torvalds wrote:
> > On Mon, Nov 1, 2021 at 2:01 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Unwinders that need locks because they can do bad things if they are
> > > working on unstable data are EVIL and WRONG.
> >
> > Note that this is fundamental: if you can fool an unwinder into doing
> > something bad just because the data isn't stable, then the unwinder is
> > truly horrendously buggy, and not usable.
>
> From what I've been led to believe, quite a few of our arch unwinders
> seem to fall in that category. They're mostly only happy when unwinding
> self and don't have many guardrails otherwise.
>
> > It could be a user process doing bad things to the user stack frame
> > from another thread when profiling is enabled.
>
> Most of the unwinders seem to only care about the kernel stack. Not the
> user stack.

Yup; there are usually separate unwinders for user/kernel, since there
are different constraints (and potentially different ABIs for unwinding).

> > It could be debug code unwinding without locks for random reasons.
> >
> > So I really don't like "take a lock for unwinding". It's a pretty bad
> > bug if the lock is required.
>
> Fair enough; the x86 unwinder is pretty robust in this regard, but it
> seems to be one of few :/

FWIW, the arm64 kernel unwinder also shouldn't blow up (so long as the
target stack is pinned via try_get_task_stack() or similar).

However, depending on how the task reuses the stack, the results can be
entirely bogus rather than just stale, since data on the stack can look
like a kernel pointer (even if that's fairly unlikely). I'm happy to
believe that we don't care about that for wchan, but it's not something
I'd like to see spread.
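
To make "pinned" concrete, the shape is roughly the below (a sketch
only: remote_unwind_sketch() is a made-up name, but try_get_task_stack(),
put_task_stack() and stack_trace_save_tsk() are the real helpers):

#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>

/*
 * Remote unwind with the target's stack pinned so it cannot be freed
 * out from under us. The entries may still be stale (or bogus, per the
 * above) if the task is concurrently running, but the walk itself
 * shouldn't fault.
 */
static unsigned int remote_unwind_sketch(struct task_struct *tsk,
					 unsigned long *entries,
					 unsigned int max_entries)
{
	unsigned int n;

	/* Fails if the task has exited and its stack was already freed. */
	if (!try_get_task_stack(tsk))
		return 0;

	n = stack_trace_save_tsk(tsk, entries, max_entries, 0);

	put_task_stack(tsk);
	return n;
}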

> > The "Link" in the commit also is entirely useless, pointing back to
> > the emailed submission of the patch, rather than any useful discussion
> > about why the patch happened.
>
> So the initial discussion started here:
>
> https://lkml.kernel.org/r/20210923233105.4045080-1-keescook@xxxxxxxxxxxx
>
> A later thread that might also be of interest is:
>
> https://lkml.kernel.org/r/YWgyy+KvNLQ7eMIV@xxxxxxxxxxxxxxxxxxxxx
>
> Also, an even later thread proposes to push that lock into more stack
> unwinding functions (anything doing remote unwinds):
>
> https://lkml.kernel.org/r/20211022150933.883959987@xxxxxxxxxxxxx
>
> But it seems to me you're thinking that's fundamentally buggered and
> people should instead invest in fixing their unwinders already.
>
> Now, as is, this stuff is user exposed through /proc/$pid/{wchan,stack}
> and as such I think it *can* do with a few extra guardrails in generic
> code. OTOH, /proc/$pid/stack is root only.
>
> Also, the remote stack-trace code is hooked into bpf (because
> kitchen-sink) and while I didn't look too hard, I can imagine it could
> be used to trigger crashes on our less robust architectures if prodded
> just right.

I do worry that remote unwinds from BPF are just silently generating
junk, but it's not clear to me what they're actually used for and how
much that matters. I don't understand why a remote unwind is necessary
at all.
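
On the guardrail point above: for wchan specifically, a generic wrapper
could plausibly look like the below (a sketch, not necessarily what any
tree does; __get_wchan() stands in for the arch-specific unwind, while
task_is_running(), try_get_task_stack() and put_task_stack() exist
today):

unsigned long get_wchan(struct task_struct *p)
{
	unsigned long ip = 0;

	/* Pin the stack so it can't be freed while we walk it. */
	if (!try_get_task_stack(p))
		return 0;

	/* A running task's stack is live; don't try to unwind it. */
	if (!task_is_running(p))
		ip = __get_wchan(p);

	put_task_stack(p);
	return ip;
}

That avoids both the use-after-free and the unstable-data case without
the arch unwinders having to take a lock.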

> Should I care about all this from a generic code PoV, or simply let the
> architectures that got it 'wrong' deal with it?

FWIW I'm happy either way. There are some upcoming improvements to the
arm64 unwinder that currently conflict with this, and I need to know
whether to wait and rebase, or to assume we take those first.

Thanks,
Mark.