Re: [PATCH -tip v2] x86/kprobes: Drop removed INT3 handling code

From: Steven Rostedt
Date: Sun Jan 21 2024 - 10:31:57 EST


On Sun, 21 Jan 2024 16:23:35 +0100
Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:

> Hi Masami, Steven,
>
> On 21/01/2024 10:05, Masami Hiramatsu (Google) wrote:
> > On Sun, 21 Jan 2024 11:28:52 +0900
> > Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> wrote:
> >
> >> On Sat, 20 Jan 2024 17:05:17 -0500
> >> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >>
> >>> On Sat, 20 Jan 2024 18:44:38 +0100
> >>> Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
> >>>
> >>>>
> >>>> I'm sorry to reply on a patch that is more than one year old, but in
> >>>
> >>> No problem, I've done the same.
> >>
> >> Yeah, thanks for reporting! I realized the problem.
>
> Thank you both for your quick reply, very useful explanations, analysis
> and patch!
>
> (...)
>
> > So this another solution is already done. I think we need to add the
> > ghost INT3 check in the exc_int3() as follows;
> >
> > Thank you,
> >
> > From add8cf7da99cdb096a0d6765b3dc5de9a3ea3019 Mon Sep 17 00:00:00 2001
> > From: "Masami Hiramatsu (Google)" <mhiramat@xxxxxxxxxx>
> > Date: Sun, 21 Jan 2024 17:16:50 +0900
> > Subject: [PATCH] x86: Fixup from the removed INT3 if it is unhandled
> >
> > INT3 is used not only for software breakpoint, but also self modifying
> > code on x86 in the kernel. For example, jump_label, function tracer etc.
> > Those may not handle INT3 after removing it but not waiting for
> > synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by
> > anyone because they think it has been removed already.
> > Recheck there is INT3 on the exception address and if not, ignore it.
> >
> > Note that previously kprobes does the same thing by itself, but that is
> > not a good location to do that because INT3 is commonly used. Do it at
> > the common place so that it can handle all 'ghost' INT3.
>
> I just tested it, and I was able to run pings for 3h without any issues!
>
> While at it, and just to be on the safe side, I also re-run the tests
> after having added a "pr_warn()" -- I know, using printk(), especially
> when talking to you... but I was not sure what was safe to use at this
> place in the code :) -- before returning "true" in the new function you
> added, and we can see that the crash is avoided thanks to the new code:
>
> [ 27.422518] traps: crash avoided, addr=18446744071882050317
> [ 27.426182] traps: crash avoided, addr=18446744071882050317
>
> [ 370.483208] traps: crash avoided, addr=18446744071882075656
> [ 370.485066] traps: crash avoided, addr=18446744071882075656
> [ 370.485084] traps: crash avoided, addr=18446744071882075656
>
> [ 592.866416] traps: crash avoided, addr=18446744071882075656
> [ 592.867937] traps: crash avoided, addr=18446744071882075656
>
> [ 980.988342] traps: crash avoided, addr=18446744071882050317
> [ 980.989866] traps: crash avoided, addr=18446744071882050317
>
> (from my VM running with 2 CPU cores)
>
>
> Again, thank you for the fix!
>
> (Just in case you need it:)
>
> Tested-by: Matthieu Baerts <matttbe@xxxxxxxxxx>

The thing is, the bug is with qemu and *not* the kernel. Masami's patch
just paper's over the real bug, and worse, if the kernel has a bug
that's not doing proper synchronization, the patch will keep it from
being detected. So no, I do not think this is the proper solution.

The real problem is that qemu does not seem to be honoring the memory
barriers of an interrupt. The reason the code does the ipi's is to
force a full memory barrier across all CPUs so that they all see the
same memory before going forward to the next step.

My guess is that qemu does not treat the IPI being sent as a memory
barrier, and then the CPUs do not see a consistent memory view after
the IPIs are sent. That's a bug in qemu!

This should be reported to the qemu community and should be fixed
there. In the mean time, feel free to use Masami's patch in your local
repo until qemu is fixed, but it should not be added to Linux mainline.

-- Steve