Re: [PATCH 1/4] jump label - make init_kernel_text() global

From: Mathieu Desnoyers
Date: Wed Oct 07 2009 - 09:36:23 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> On Tue, 2009-10-06 at 22:32 -0400, Mathieu Desnoyers wrote:
>
> >
> > Hi Steven,
> >
> > OK, I'll make the explanation as straightforward as possible. I'll use a
> > race example to illustrate what we try to avoid by using the
> > breakpoint+ipi scheme. After that, I present the same scenario with the
> > breakpoint+ipi in place.
> >
> > Each step shows what is executed and what memory values are seen by
> > the CPU. CPU A is doing the code patching, CPU B is executing the code.
> > (I intentionally left out some sfence required on CPU A for simplicity.)
> >
> > Initially, let's say we have:
> > (1) (2)
> > 0xeb 0xe5 (jmp to offset 0xe5)
> >
> > And we want to change this to:
> > (1) (2)
> > 0xeb 0xf0 (jmp to offset 0xf0)
> >
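
For reference, 0xeb is the x86 short jump (jmp rel8) opcode, and its
one-byte operand is a signed displacement counted from the end of the
2-byte instruction. A small Python sketch of that arithmetic follows; the
function name and the example address 0x1000 are made up for illustration
and do not come from the patch set.

```python
def short_jmp_target(insn_addr: int, operand: int) -> int:
    """Target of a 2-byte `jmp rel8` (opcode 0xeb) located at insn_addr."""
    # Sign-extend the 8-bit operand: 0x80..0xff encode negative displacements.
    rel8 = operand - 0x100 if operand & 0x80 else operand
    # The displacement is relative to the address of the next instruction.
    return insn_addr + 2 + rel8


# Hypothetical address, only to make the numbers concrete.
old_target = short_jmp_target(0x1000, 0xE5)  # displacement -27
new_target = short_jmp_target(0x1000, 0xF0)  # displacement -16
print(hex(old_target), hex(new_target))
```

So both the old and the new operand are backward jumps; the patch only
changes how far back.
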
> > (scenario "buggy")
> >
> > CPU A                   |       CPU B  (this is about as far as my ascii-art skills go ;)
> > -------------------------
> > 0xeb 0xe5                       0xeb 0xe5
> > 0:                              CPU B instruction pointer is earlier than (1)
> >                                 CPU B pipeline speculatively predicts branches,
> >                                 prefetches data, calculates speculated values.
> > 1:                              CPU B loads 0xeb
> > 2:                              CPU B loads 0xe5
> > 3:
> > Write to (2)
> > 0xeb 0xf0                       0xeb 0xf0
> > 4:                              CPU B instruction pointer gets to (1), needs to validate
> >                                 all the pipeline speculation.
> >                                 But ! The CPU does not expect code to change underneath.
> >                                 General protection fault (or any other fault.. random..)
> >
> >
> > Now with the breakpoint+ipi/mb() scheme:
> > (scenario A: CPU B does not hit the breakpoint)
> >
> > CPU A                   |       CPU B
> > -------------------------
> > 0xeb 0xe5                       0xeb 0xe5
> > 0:                              CPU B instruction pointer is earlier than (1)
> >                                 CPU B pipeline speculatively predicts branches,
> >                                 prefetches data, calculates speculated values.
> > 1:                              CPU B loads 0xeb
> > 2:                              CPU B loads 0xe5
> > 3:
> > Write to (1)
> > 0xcc 0xe5                       0xcc 0xe5 # breakpoint inserted
> > 4: send IPI
> > 5:                              mfence # serializing instruction. Flushes CPU B's
> >                                        # pipeline
> > 6:
> > Write to (2)
> > 0xcc 0xf0                       0xcc 0xf0
> > 7:
> > Write to (1)
> > 0xeb 0xf0                       0xeb 0xf0
> > 8:                              CPU B instruction pointer gets to (1), needs to validate
> >                                 all the pipeline speculation. Because we flushed any
> >                                 speculation prior to the mfence, we're ok.
> >
> >
> > Now, I'll show why just using the breakpoint, without IPI, is
> > problematic:
> >
> > CPU A                   |       CPU B
> > -------------------------
> > 0xeb 0xe5                       0xeb 0xe5
> > 0:                              CPU B instruction pointer is earlier than (1)
> >                                 CPU B pipeline speculatively predicts branches,
> >                                 prefetches data, calculates speculated values.
> > 1:                              CPU B loads 0xeb
> > 2:                              CPU B loads 0xe5
> > 3:
> > Write to (1)
> > 0xcc 0xe5                       0xcc 0xf0 # breakpoint inserted
> > 4:
> > Write to (2)
> > 0xcc 0xf0                       0xeb 0xf0 # Silly CPU B. Did not see nor use the breakpoint.
> >                                           # Same problem as scenario "buggy".
> > 5:
> > Write to (1)
> > 0xeb 0xf0                       0xeb 0xf0
> > 6:                              CPU B instruction pointer gets to (1), needs to validate
> >                                 all the pipeline speculation.
> >                                 But ! The CPU does not expect code to change underneath.
> >                                 General protection fault (or any other fault.. random..)
> >
> > So, basically, we ensure that the only transitions CPU B will see are
> > either:
> >
> > 0xeb 0xe5 -> 0xcc 0xe5 : OK, adding breakpoint
> > 0xcc 0xe5 -> 0xcc 0xf0 : OK, not using the operand anyway, it's a
> > breakpoint!
> > 0xcc 0xf0 -> 0xeb 0xf0 : OK, removing breakpoint
> >
> > *but*, the transition we guarantee that CPU B will *never* see without
> > having an mfence executed between the old and the new version is:
> >
> > 0xeb 0xe5 -> 0xeb 0xf0 <----- buggy.
> >
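
The write ordering above can be sketched as a toy model in Python. This is
not kernel code: memory is a plain bytearray, CPU pipelines are not
modelled at all, and the names patch_short_jmp and sync_all_cpus are
hypothetical. The callback only marks where the IPI plus serializing
instruction would go on the patching CPU.

```python
from typing import Callable, List, Tuple

BREAKPOINT = 0xCC  # x86 int3 opcode


def patch_short_jmp(code: bytearray, new_operand: int,
                    sync_all_cpus: Callable[[], None]) -> List[Tuple[int, ...]]:
    """Retarget the 2-byte jmp in `code`; return every state memory went through."""
    opcode = code[0]
    states = [tuple(code)]

    code[0] = BREAKPOINT      # write to (1): insert the breakpoint
    states.append(tuple(code))
    sync_all_cpus()           # stand-in for "send IPI" + serializing instruction

    code[1] = new_operand     # write to (2): the operand is unused under int3
    states.append(tuple(code))

    code[0] = opcode          # write to (1): remove the breakpoint
    states.append(tuple(code))
    return states


syncs = []
states = patch_short_jmp(bytearray([0xEB, 0xE5]), 0xF0,
                         lambda: syncs.append("ipi"))
for before, after in zip(states, states[1:]):
    print(" ".join(f"{b:#04x}" for b in before), "->",
          " ".join(f"{b:#04x}" for b in after))
```

The model only checks the ordering of the stores: memory never goes
directly from 0xeb 0xe5 to 0xeb 0xf0, and the synchronization point sits
between the breakpoint insertion and the operand write.
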
> > Hope the explanation helps,
>
> Thanks Mathieu,
>
> This does help explain a lot.
>
> So, basically the IPI is to make sure the int3 is seen by other CPUs

- I might add: and that the other CPU's instruction trace caches are
flushed with a core serializing instruction -

> before you modify the jump. Otherwise you risk setting up the int3 and
> the other CPU does not see it but still executes the change to the jmp
> destination.

Yep.

>
> I'm assuming that the int3 handler will make the process on CPU B jump
> to the next op (one not being modified).

Indeed.
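
As a rough illustration of that handler decision, here is a toy Python
model. It is not the actual handler: resume_address, patch_addr and
insn_len are hypothetical names, and it assumes, as described above, that
skipping to the next instruction is acceptable at the patched site. It
relies on the fact that the instruction pointer saved by an int3 trap
points just past the one-byte int3.

```python
from typing import Optional

INT3_LEN = 1  # int3 (0xcc) is a one-byte instruction


def resume_address(trap_ip: int, patch_addr: int, insn_len: int) -> Optional[int]:
    """Where to resume after a breakpoint trap, or None if it is not ours.

    trap_ip is the saved instruction pointer, which points just past the
    int3 byte that trapped.
    """
    if trap_ip - INT3_LEN != patch_addr:
        return None               # some other breakpoint; leave it alone
    return patch_addr + insn_len  # continue at the op after the patched one


# Hypothetical patch site at 0x1000 holding the 2-byte jmp.
print(resume_address(0x1001, 0x1000, 2))  # trap came from our int3
print(resume_address(0x2001, 0x1000, 2))  # unrelated breakpoint
```
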

>
> Now we must get from Intel and AMD that it is OK to remove the int3.

Yep, that's what hpa is trying to get them to tell us.

Thanks,

Mathieu

>
> -- Steve
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68