Re: [PATCH 1/2] x86: profiling: remove lock functions hack for !FRAME_POINTER

From: Chen Zhongjin
Date: Wed Apr 12 2023 - 03:02:02 EST


On 2023/4/11 3:34, Dave Hansen wrote:
On 4/9/23 19:22, Chen Zhongjin wrote:
Syzbot has been reporting the problem of stack-out-of-bounds in
profile_pc for a long time:
https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3

profile_pc tries to get pc if current regs is inside lock function. For
!CONFIG_FRAME_POINTER it used a hack way to get the pc from stack, which
is not work with ORC. It makes profile_pc read illeagal address, return
wrong result, and frequently triggers KASAN.

Since lock profiling can be handled with much better other tools, It's
reasonable to remove lock functions hack for !FRAME_POINTER kernel.
OK, so let me make sure I understand what's going on:

1. This whole issue is limited to kernel/profile.c which is what drives
readprofile(8) and /proc/profile
2. This is removing code that got added in 2006:
0cb91a229364 ("[PATCH] i386: Account spinlocks to the caller during
profiling for !FP kernels")
3. This was an OK hack back in the day, but it outright breaks today
in some situations. KASAN also didn't exist in 2006.
Yes, and whether KASAN is enabled it can make problem.
Some lock_function will save registers on stack (this may not happen in 2006).
These registers can be recorded by profile and be read outside of kernel,
which is risky theoretically.
4. !CONFIG_FRAME_POINTER is probably even more rare today than it was in
2006
No. !CONFIG_FRAME_POINTER is more common today because of UNWINDER_ORC.
And that is why the bug is triggered more frequently.
5. Lock function caller information is available at _least_ from perf,
maybe other places too?? (What "much better other tools" are there?)

Yes, it's basically about perf function graph.

Given all that, this patch suggests that we can remove the stack peeking
hack. The downside is that /proc/profile users will see their profiles
pointing to the spinlock functions like they did in 2005. The upside is
that we won't get any more KASAN reports.

If anyone complains, I assume we're just going to tell them to run 'perf
--call-graph' and to go away (which also probably didn't exist in 2006).

If I got all that right, the end result seems sane to me. It would be
_nice_ if you could make a more coherent changelog out of that and
resend. Also, considering that your two "profile" issues are quite
independent, you can probably just resend the two patches separately.
Thanks for review and I'll send another version to provide better details.