Re: [PATCH -tip v2] x86/kprobes: Drop removed INT3 handling code

From: Matthieu Baerts
Date: Sun Jan 21 2024 - 10:23:52 EST


Hi Masami, Steven,

On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote:
> On Sun, 21 Jan 2024 11:28:52 +0900
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
>
>> On Sat, 20 Jan 2024 17:05:17 -0500
>> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>
>>> On Sat, 20 Jan 2024 18:44:38 +0100
>>> Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>>>
>>>>
>>>> I'm sorry to reply on a patch that is more than one year old, but in
>>>
>>> No problem, I've done the same.
>>
>> Yeah, thanks for reporting! I realized the problem.

Thank you both for your quick reply, very useful explanations, analysis
and patch!

(...)

> So this another solution is already done. I think we need to add the
> ghost INT3 check in the exc_int3() as follows;
>
> Thank you,
>
> From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001
> From: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
> Date: Sun, 21 Jan 2024 17:16:50 +0900
> Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled
>
> INT3 is used not only for software breakpoint, but also self modifying
> code on x86 in the kernel. For example, jump_label, function tracer etc.
> Those may not handle INT3 after removing it but not waiting for
> synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by
> anyone because they think it has been removed already.
> Recheck there is INT3 on the exception address and if not, ignore it.
>
> Note that previously kprobes does the same thing by itself, but that is
> not a good location to do that because INT3 is commonly used. Do it at
> the common place so that it can handle all 'ghost' INT3.

I just tested it, and I was able to run pings for 3h without any issues!

While at it, and just to be on the safe side, I also re-run the tests
after having added a "pr_warn()" -- I know, using printk(), especially
when talking to you... but I was not sure what was safe to use at this
place in the code :) -- before returning "true" in the new function you
added, and we can see that the crash is avoided thanks to the new code:

[ 27.422518] traps: crash avoided, addr=18446744071882050317
[ 27.426182] traps: crash avoided, addr=18446744071882050317

[ 370.483208] traps: crash avoided, addr=18446744071882075656
[ 370.485066] traps: crash avoided, addr=18446744071882075656
[ 370.485084] traps: crash avoided, addr=18446744071882075656

[ 592.866416] traps: crash avoided, addr=18446744071882075656
[ 592.867937] traps: crash avoided, addr=18446744071882075656

[ 980.988342] traps: crash avoided, addr=18446744071882050317
[ 980.989866] traps: crash avoided, addr=18446744071882050317

(from my VM running with 2 CPU cores)


Again, thank you for the fix!

(Just in case you need it:)

Tested-by: Matthieu Baerts <matttbe@xxxxxxxxxx>

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.