Re: [GIT pull] sched/core for v5.16-rc1

From: Linus Torvalds
Date: Wed Nov 03 2021 - 12:23:35 EST


On Tue, Nov 2, 2021 at 1:41 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > It could be a user process doing bad things to the user stack frame
> > from another thread when profiling is enabled.
>
> Most of the unwinders seem to only care about the kernel stack. Not the
> user stack.

Note that it very much happens for a kernel stack too.

There the reason isn't some active attack, but simply stack
corruption, or - not uncommonly - missing or incomplete debug notes
that the unwinder crazily depends on.

If an unwinder isn't robust enough to deal with stack corruption, it
damn well should be deleted immediately - it will only cause even
*more* problems when some nasty bug happens, and suddenly the unwinder
means that you don't get a proper oops report.
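
To make "robust" concrete, here is a minimal userspace sketch of a
frame-pointer walk that refuses to trust anything it reads. It is not
the kernel's unwinder: the frame layout, the names (struct frame,
unwind_robust, MAX_DEPTH) and the "stack grows down" assumption are all
hypothetical and chosen for illustration only. The point is that every
frame pointer is checked against the stack bounds, checked for
alignment, and required to make forward progress, so a corrupted stack
produces a short trace instead of a crash or an endless loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame record: saved caller frame pointer, then return address. */
struct frame {
    uintptr_t next_fp;
    uintptr_t ret_addr;
};

/* Hard cap on the walk so a corrupted chain can never run away. */
#define MAX_DEPTH 64

/*
 * Walk the frame chain starting at fp, storing return addresses in out[].
 * Returns the number of entries written. Stops quietly at the first frame
 * that fails validation instead of dereferencing it.
 */
size_t unwind_robust(const unsigned char *stack, size_t stack_size,
                     uintptr_t fp, uintptr_t *out, size_t out_len)
{
    uintptr_t lo = (uintptr_t)stack;
    uintptr_t hi = lo + stack_size;
    size_t n = 0;

    if (stack_size < sizeof(struct frame))
        return 0;

    while (n < out_len && n < MAX_DEPTH) {
        struct frame f;

        /* The whole frame record must lie inside the stack and be aligned. */
        if (fp < lo || fp > hi - sizeof(f) || fp % sizeof(uintptr_t) != 0)
            break;

        memcpy(&f, (const void *)fp, sizeof(f));
        out[n++] = f.ret_addr;

        /*
         * Assumed downward-growing stack: each caller frame sits at a
         * strictly higher address. Anything else is the end of the chain
         * or corruption (including a cycle), so stop.
         */
        if (f.next_fp <= fp)
            break;
        fp = f.next_fp;
    }
    return n;
}
```

With a chain of three frames this returns three entries; if a saved
frame pointer is overwritten with a wild or backwards-pointing value,
the walk simply ends early with whatever it had collected so far.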

And yes, I feel strongly about this, because we very much used to have
that situation on x86 too a long time ago. I spent a year fighting
buggy unwinders, and then removed the unbelievable garbage in the end
because the maintainer of said thing refused to admit that there was a
problem.

So I really think that the solution to "unwinder is not robust" is
absolutely not to take more locks. Because that's literally just
hiding the much bigger and more serious problem.

The fact that the lock in question is a fairly critical one (and needs
to use "raw_spin_lock()" and friends) is just another argument against
it.

I've obviously pulled this on Monday already, and I'm not going to
start reverting those commits unless they cause problems, but I do
think they were seriously misguided.

Linus