Re: [PATCH -tip v2] x86/kprobes: Drop removed INT3 handling code

From: Google
Date: Sun Jan 21 2024 - 20:17:26 EST


On Sun, 21 Jan 2024 17:20:26 +0100
Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:

> Hi Steven,
>
> On 21/01/2024 16:31, Steven Rostedt wrote:
> > On Sun, 21 Jan 2024 16:23:35 +0100
> > Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
> >
> >> Hi Masami, Steven,
> >>
> >> On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote:
> >>> On Sun, 21 Jan 2024 11:28:52 +0900
> >>> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> >>>
> >>>> On Sat, 20 Jan 2024 17:05:17 -0500
> >>>> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >>>>
> >>>>> On Sat, 20 Jan 2024 18:44:38 +0100
> >>>>> Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>>
> >>>>>> I'm sorry to reply on a patch that is more than one year old, but in
> >>>>>
> >>>>> No problem, I've done the same.
> >>>>
> >>>> Yeah, thanks for reporting! I realized the problem.
> >>
> >> Thank you both for your quick reply, very useful explanations, analysis
> >> and patch!
> >>
> >> (...)
> >>
> >>> So this another solution is already done. I think we need to add the
> >>> ghost INT3 check in the exc_int3() as follows;
> >>>
> >>> Thank you,
> >>>
> >>> From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001
> >>> From: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
> >>> Date: Sun, 21 Jan 2024 17:16:50 +0900
> >>> Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled
> >>>
> >>> INT3 is used not only for software breakpoint, but also self modifying
> >>> code on x86 in the kernel. For example, jump_label, function tracer etc.
> >>> Those may not handle INT3 after removing it but not waiting for
> >>> synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by
> >>> anyone because they think it has been removed already.
> >>> Recheck there is INT3 on the exception address and if not, ignore it.
> >>>
> >>> Note that previously kprobes does the same thing by itself, but that is
> >>> not a good location to do that because INT3 is commonly used. Do it at
> >>> the common place so that it can handle all 'ghost' INT3.
> >>
> >> I just tested it, and I was able to run pings for 3h without any issues!
> >>
> >> While at it, and just to be on the safe side, I also re-run the tests
> >> after having added a "pr_warn()" -- I know, using printk(), especially
> >> when talking to you... but I was not sure what was safe to use at this
> >> place in the code :) -- before returning "true" in the new function you
> >> added, and we can see that the crash is avoided thanks to the new code:
> >>
> >> [ 27.422518] traps: crash avoided, addr=18446744071882050317
> >> [ 27.426182] traps: crash avoided, addr=18446744071882050317
> >>
> >> [ 370.483208] traps: crash avoided, addr=18446744071882075656
> >> [ 370.485066] traps: crash avoided, addr=18446744071882075656
> >> [ 370.485084] traps: crash avoided, addr=18446744071882075656
> >>
> >> [ 592.866416] traps: crash avoided, addr=18446744071882075656
> >> [ 592.867937] traps: crash avoided, addr=18446744071882075656
> >>
> >> [ 980.988342] traps: crash avoided, addr=18446744071882050317
> >> [ 980.989866] traps: crash avoided, addr=18446744071882050317
> >>
> >> (from my VM running with 2 CPU cores)
> >>
> >>
> >> Again, thank you for the fix!
> >>
> >> (Just in case you need it:)
> >>
> >> Tested-by: Matthieu Baerts <matttbe@xxxxxxxxxx>
> >
> > The thing is, the bug is with qemu and *not* the kernel. Masami's patch
> > just paper's over the real bug, and worse, if the kernel has a bug
> > that's not doing proper synchronization, the patch will keep it from
> > being detected. So no, I do not think this is the proper solution.
> >
> > The real problem is that qemu does not seem to be honoring the memory
> > barriers of an interrupt. The reason the code does the ipi's is to
> > force a full memory barrier across all CPUs so that they all see the
> > same memory before going forward to the next step.
> >
> > My guess is that qemu does not treat the IPI being sent as a memory
> > barrier, and then the CPUs do not see a consistent memory view after
> > the IPIs are sent. That's a bug in qemu!
> >
> > This should be reported to the qemu community and should be fixed
> > there. In the mean time, feel free to use Masami's patch in your local
> > repo until qemu is fixed, but it should not be added to Linux mainline.
>
> Thank you for the explanation!
>
> For QEmu, I'm currently not using a recent version: v6.2.0, while the
> latest one is v8.2.0. I was already suspecting that QEmu could be
> responsible for this issue -- we don't have the issue with KVM, only TCG

Ah, I missed this point. If KVM works well, I agree that this is a qemu
TCG's bug. I guess TCG implementation forgets to serialize CPU when the
IPI comes.

> -- but it looks like it is not that easy to upgrade it: for the CI, we
> use virtme, which doesn't support newer versions. We will switch to
> virtme-ng, upgrade QEmu to a version that is still supported, try to
> reproduce the issue without Masami's patch, and report that to QEmu
> community.

Thanks, and if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
Self- and Cross-Modifying Code" said that what the other CPU needs to
do is "Execute serializing instruction; (* For example, CPUID
instruction *)" for cross-modifying code. that has been done in
do_sync_core(). Thus this bug should not happen.

Thank you!

>
> For the time being, the CI will run the tests with Masami's patch.
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.


--
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>