Re: [PATCH -tip v2] x86/kprobes: Drop removed INT3 handling code

From: Matthieu Baerts
Date: Sun Jan 21 2024 - 11:20:38 EST


Hi Steven,

On 21/01/2024 16:31, Steven Rostedt wrote:
> On Sun, 21 Jan 2024 16:23:35 +0100
> Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
>> Hi Masami, Steven,
>>
>> On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote:
>>> On Sun, 21 Jan 2024 11:28:52 +0900
>>> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
>>>
>>>> On Sat, 20 Jan 2024 17:05:17 -0500
>>>> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>>>>
>>>>> On Sat, 20 Jan 2024 18:44:38 +0100
>>>>> Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>>>>>
>>>>>>
>>>>>> I'm sorry to reply on a patch that is more than one year old, but in
>>>>>
>>>>> No problem, I've done the same.
>>>>
>>>> Yeah, thanks for reporting! I realized the problem.
>>
>> Thank you both for your quick reply, very useful explanations, analysis
>> and patch!
>>
>> (...)
>>
>>> So this another solution is already done. I think we need to add the
>>> ghost INT3 check in the exc_int3() as follows;
>>>
>>> Thank you,
>>>
>>> From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001
>>> From: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
>>> Date: Sun, 21 Jan 2024 17:16:50 +0900
>>> Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled
>>>
>>> INT3 is used not only for software breakpoint, but also self modifying
>>> code on x86 in the kernel. For example, jump_label, function tracer etc.
>>> Those may not handle INT3 after removing it but not waiting for
>>> synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by
>>> anyone because they think it has been removed already.
>>> Recheck there is INT3 on the exception address and if not, ignore it.
>>>
>>> Note that previously kprobes does the same thing by itself, but that is
>>> not a good location to do that because INT3 is commonly used. Do it at
>>> the common place so that it can handle all 'ghost' INT3.
>>
>> I just tested it, and I was able to run pings for 3h without any issues!
>>
>> While at it, and just to be on the safe side, I also re-run the tests
>> after having added a "pr_warn()" -- I know, using printk(), especially
>> when talking to you... but I was not sure what was safe to use at this
>> place in the code :) -- before returning "true" in the new function you
>> added, and we can see that the crash is avoided thanks to the new code:
>>
>> [ 27.422518] traps: crash avoided, addr=18446744071882050317
>> [ 27.426182] traps: crash avoided, addr=18446744071882050317
>>
>> [ 370.483208] traps: crash avoided, addr=18446744071882075656
>> [ 370.485066] traps: crash avoided, addr=18446744071882075656
>> [ 370.485084] traps: crash avoided, addr=18446744071882075656
>>
>> [ 592.866416] traps: crash avoided, addr=18446744071882075656
>> [ 592.867937] traps: crash avoided, addr=18446744071882075656
>>
>> [ 980.988342] traps: crash avoided, addr=18446744071882050317
>> [ 980.989866] traps: crash avoided, addr=18446744071882050317
>>
>> (from my VM running with 2 CPU cores)
>>
>>
>> Again, thank you for the fix!
>>
>> (Just in case you need it:)
>>
>> Tested-by: Matthieu Baerts <matttbe@xxxxxxxxxx>
>
> The thing is, the bug is with qemu and *not* the kernel. Masami's patch
> just paper's over the real bug, and worse, if the kernel has a bug
> that's not doing proper synchronization, the patch will keep it from
> being detected. So no, I do not think this is the proper solution.
>
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.

Thank you for the explanation!

For QEmu, I'm currently not using a recent version: v6.2.0, while the
latest one is v8.2.0. I was already suspecting that QEmu could be
responsible for this issue -- we don't have the issue with KVM, only TCG
-- but it looks like it is not that easy to upgrade it: for the CI, we
use virtme, which doesn't support newer versions. We will switch to
virtme-ng, upgrade QEmu to a version that is still supported, try to
reproduce the issue without Masami's patch, and report that to QEmu
community.

For the time being, the CI will run the tests with Masami's patch.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.