Re: [PATCH] ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead

From: Yang Jihong
Date: Wed Aug 17 2022 - 22:32:25 EST


Hello,

On 2022/8/18 10:14, Steven Rostedt wrote:
On Thu, 18 Aug 2022 09:50:40 +0800
Yang Jihong <yangjihong1@xxxxxxxxxx> wrote:

Thanks for the detailed explanation.
If panic_on_warn is not set, FTRACE_WARN_ON{_ONCE} only sets
ftrace_disabled, but will not reboot.

Correct. But whenever there's a WARN_ON() the administrator of the machine
should think about rebooting it ASAP. That's because all WARN_ON()s are
supposed to only happen when the system does something that was not
expected, putting it into an inconsistent state, possibly a dangerous
one. This is why all WARN_ON()s that are triggered are considered bugs and
must be fixed.


I think this is to limit the problem to ftrace itself so that it does not
spread to other subsystems (I don't know if that's right. If it's not
right, please correct me).

Yes, ftrace_disabled being set means that ftrace just found itself in a
situation that it does not understand, and nothing can be trusted. As ftrace
modifies kernel code, it basically stops everything and WARNs about it,
because anything else it does can make things worse.

is_ftrace_trampoline is a common, public interface that is called in many
places in the kernel.
If the is_ftrace_trampoline interface is not restricted (for example, by
just returning true when ftrace_disabled is set), the preceding Syzkaller
scenario may be triggered whenever this interface is called.

If ftrace_disabled is set, then any operations should fail, and any tests
should fail with it.
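
A minimal user-space sketch of that rule, not the kernel code: once a
"disabled" flag is set, every entry point checks it first and fails. The
names tracer_disabled, tracer_kill() and tracer_register() are hypothetical
stand-ins, and returning -ENODEV is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a global "subsystem is dead" flag. */
static bool tracer_disabled;

/* Hypothetical: called when the tracer reaches a state it cannot explain. */
static void tracer_kill(void)
{
	tracer_disabled = true;
	fprintf(stderr, "WARNING: tracer disabled, reboot advised\n");
}

/* Every operation checks the flag before doing anything else. */
static int tracer_register(int id)
{
	if (tracer_disabled)
		return -ENODEV;	/* assumed error code for this sketch */

	(void)id;		/* real registration work would go here */
	return 0;
}
```

The point of the pattern is that nothing past the check runs after the flag
is set, so a confused subsystem cannot do further damage.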


Therefore, my idea is to restrict is_ftrace_trampoline, or to roll back via
_unregister_ftrace_function when ftrace_disabled is set, so that the
interface can be invoked normally. Or keep the current code unmodified.

Once ftrace_disabled is set, none of its interfaces should perform
normally.

But you reported that you could hit a NULL pointer from
is_ftrace_trampoline(), caused by a failure to add the dynamic trampoline,
which leaves the ops on the list even though it is later freed.

My suggestion above is to just call _unregister_ftrace_function(ops) to
take it off the list and prevent the NULL pointer.

Doesn't that fix the bug?
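
To illustrate the idea (a user-space sketch only, not the ftrace code):
if registration puts an ops on a global list and a later setup step fails,
unlinking the ops before returning means a later lookup never walks over an
entry the caller may free. struct ops, register_ops(), is_trampoline() and
the setup_fails parameter are all hypothetical names for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops descriptor with an optional trampoline range. */
struct ops {
	struct ops *next;
	unsigned long tramp;		/* trampoline start, 0 if none */
	unsigned long tramp_size;
};

static struct ops *ops_list;

static void list_add_ops(struct ops *ops)
{
	ops->next = ops_list;
	ops_list = ops;
}

static void list_remove_ops(struct ops *ops)
{
	/* Walk the links so the head needs no special case. */
	for (struct ops **p = &ops_list; *p; p = &(*p)->next) {
		if (*p == ops) {
			*p = ops->next;
			ops->next = NULL;
			return;
		}
	}
}

/* setup_fails simulates the trampoline setup step going wrong. */
static int register_ops(struct ops *ops, bool setup_fails)
{
	list_add_ops(ops);
	if (setup_fails) {
		/* Take it off the list, but do not undo anything else. */
		list_remove_ops(ops);
		return -1;
	}
	return 0;
}

/* Lookup that walks the list; safe only if every entry is still valid. */
static bool is_trampoline(unsigned long addr)
{
	for (const struct ops *op = ops_list; op; op = op->next) {
		if (op->tramp && addr >= op->tramp &&
		    addr < op->tramp + op->tramp_size)
			return true;
	}
	return false;
}
```

Note that the failure path only unlinks the entry. It deliberately does not
try to free or tear down the trampoline.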

I don't want to totally roll it back and free the trampoline, because those
actions could cause further damage, depending on the failed state ftrace is
in.

OK, I understand. I will modify it this way in the next version.

Thanks,
Yang