Re: CONFIG_ORC_UNWINDER=y breaks get_wchan()?

From: Andy Lutomirski
Date: Mon Oct 04 2021 - 20:51:40 EST


On Tue, Sep 21, 2021, at 8:30 PM, Qi Zheng wrote:
> On 9/22/21 8:15 AM, Josh Poimboeuf wrote:
>> On Tue, Sep 21, 2021 at 12:32:49PM -0700, Vito Caputo wrote:
>>> Is this an oversight of the ORC_UNWINDER implementation? It's
>>> arguably a regression to completely break wchans for tools like `ps -o
>>> wchan` and `top`, or my window manager and its separate monitoring
>>> utility. Presumably there are other tools out there sampling wchans
>>> for monitoring as well; there's also an internal use of get_wchan() in
>>> kernel/sched/fair.c for sleep profiling.
>>>
>>> I've occasionally seen when monitoring at a high sample rate (60hz) on
>>> something churny like a parallel kernel or systemd build, there's a
>>> spurious non-zero sample coming out of /proc/[pid]/wchan containing a
>>> hexadecimal address like 0xffffa9ebc181bcf8. This all smells broken,
>>> is get_wchan() occasionally spitting out random junk here that kallsyms
>>> can't resolve, because get_wchan() is completely ignorant of
>>> ORC_UNWINDER's effects?
>>
>> Hi Vito,
>>
>> Thanks for reporting this. Does this patch fix your issue?
>>
>> https://lkml.kernel.org/r/20210831083625.59554-1-zhengqi.arch@xxxxxxxxxxxxx
>>
>> Though, considering wchan has been silently broken for four years, I do
>> wonder what the impact would be if we were to just continue to show "0"
>> (and change frame pointers to do the same).
>
> Agreed. Or remove get_wchan() directly.

I agree. wchan is a hack that may or may not do anything useful. We certainly should not be reporting things derived from the stack trace to unprivileged tasks. And it's probably just as racy as /proc/.../stack.
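
For anyone who wants to reproduce the symptom from the original report, here is a rough userspace sketch of sampling /proc/[pid]/wchan and sorting each sample into "0", a resolved symbol, or a raw address that kallsyms apparently could not resolve. The helper names (classify_wchan, read_wchan) and the hex-matching rule are assumptions made for illustration; they are not taken from the kernel or from any existing tool, and the 60 Hz rate is only the figure mentioned in the report.

```python
import re
import time
from typing import Optional

# Assumption: an unresolved sample looks like a bare hex address,
# with or without a leading "0x" (e.g. 0xffffa9ebc181bcf8).
_HEX_ADDR = re.compile(r"^(0x)?[0-9a-fA-F]{8,16}$")


def classify_wchan(text: str) -> str:
    """Classify one wchan sample as 'none', 'address', or 'symbol'."""
    value = text.strip()
    if value in ("", "0"):
        return "none"      # task not sleeping, or unwinder reported nothing
    if _HEX_ADDR.match(value):
        return "address"   # looks like an address rather than a symbol name
    return "symbol"        # anything else is treated as a symbol name


def read_wchan(pid: int) -> Optional[str]:
    """Return the raw contents of /proc/<pid>/wchan, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/wchan", "r") as f:
            return f.read()
    except OSError:
        # Task exited, permission denied, or not running on Linux.
        return None


def sample(pid: int, count: int = 60, hz: float = 60.0) -> dict:
    """Take `count` samples at roughly `hz` and tally the classifications."""
    tally = {"none": 0, "address": 0, "symbol": 0, "unreadable": 0}
    for _ in range(count):
        raw = read_wchan(pid)
        key = "unreadable" if raw is None else classify_wchan(raw)
        tally[key] += 1
        time.sleep(1.0 / hz)
    return tally


if __name__ == "__main__":
    import os
    print(sample(os.getpid(), count=10))
```

A nonzero "address" count would correspond to the spurious samples described in the report. Like wchan itself, the sampling is inherently racy, so the tally is only indicative.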