Re: [PATCH 1/4] jump label - make init_kernel_text() global

From: Masami Hiramatsu
Date: Tue Oct 06 2009 - 23:09:54 EST

Next message: Mark Lord: "Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040"
Previous message: Paul Mackerras: "Re: [PATCH 1/2] tracing: Check total refcount before releasing bufs in profile_enable failure"
In reply to: Mathieu Desnoyers: "Re: [PATCH 1/4] jump label - make init_kernel_text() global"
Next in thread: Mathieu Desnoyers: "Re: [PATCH 1/4] jump label - make init_kernel_text() global"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
>> On Sat, 2009-10-03 at 08:39 -0400, Mathieu Desnoyers wrote:
>>
>>> I might be missing a bit of context here, I just want to make sure we
>>> are on the same page: patching a jmp instruction is safe on UP, safe
>>> with stop_machine(), is very likely safe with the breakpoint-ipi
>>
>> Hi Mathieu,
>>
>> I've been reading through these threads (both this one and the immediate
>> one) and I'm still a bit confused. I really want to understand this in a
>> simple way, thus make sure everyone else understands it too.
>>
>> >From what Arjan said here:
>>
>> http://lkml.org/lkml/2009/9/25/98
>>
>> The issue is going back from the int3 to the old value. How does the
>> breakpoint-ipi work?
>>
>> Supposedly, we can add an int3 to the code without any worry. If another
>> CPU at that same time hits that code path, it will either run the old
>> code, or take the interrupt. The breakpoint interrupt handler, will
>> handle that code path, and the execution continues.
>>
>> Now what is the issues with removing the int3 and placing back the old
>> (or new) value. Is there an issue if another CPU is about to execute
>> that code path as we remove the int3? If so, how does sending an IPI
>> help the matter without adding more races?
>>
>> Is there only an issue if we change the old value with something else,
>> and you just need to send the IPI after you modify the old code and
>> before removing the int3?
>>
>> I may just be totally confused, which I usually am. But when I'm not
>> confused, I feel that the code is practical ;-)
>>
>
> Hi Steven,
>
> OK, I'll make the explanation as straightforward as possible. I'll use a
> race example to illustrate what we try to avoid by using the
> breakpoint+ipi scheme. After that, I present the same scenario with the
> breakpoint+ipi in place.
>
> Each step shows what is executed, and what is the memory values seen by
> the CPU. CPU A is doing the code patching, CPU B executing the code.
> I intentionally left out some sfence required on CPU A for simplicity.)
>
> Initially, let's say we have:
> (1) (2)
> 0xeb 0xe5 (jmp to offset 0xe5)
>
> And we want to change this to:
> (1) (2)
> 0xeb 0xf0 (jmp to offset 0xf0)
>
> (scenario "buggy")
>
> CPU A | CPU B (this is about as far as my ascii-art skills go)
> ------------------------- ;)
> 0xeb 0xe5 0xeb 0xe5
> 0: CPU B instruction pointer is earlier than (1)
> CPU B pipeline speculatively predicts branches,
> prefetches data, calculates speculated values.
> 1: CPU B loads 0xeb
> 2: CPU B loads 0xe5
> 3:
> Write to (2)
> 0xeb 0xf0 0xeb 0xf0
> 4: CPU B instruction pointer gets to (1), needs to validate
> all the pipeline speculation.
> But ! The CPU does not expect code to change underneath.
> General protection fault (or any other fault.. random..)
>
>
> Now with the breakpoint+ipi/mb() scheme:
> (scenario A: CPU B does not hit the breakpoint)
>
> CPU A | CPU B
> -------------------------
> 0xeb 0xe5 0xeb 0xe5
> 0: CPU B instruction pointer is earlier than (1)
> CPU B pipeline speculatively predicts branches,
> prefetches data, calculates speculated values.
> 1: CPU B loads 0xeb
> 2: CPU B loads 0xe5
> 3:
> Write to (1)
> 0xcc 0xe5 0xcc 0xe5 # breakpoint inserted
> 4: send IPI
> 5: mfence # serializing instruction. Flushes CPU B's
> # pipeline
> 6:
> Write to (2)
> 0xcc 0xf0 0xcc 0xf0
> 7:
> Write to (1)
> 0xeb 0xf0 0xeb 0xf0
> 8: CPU B instruction pointer gets to (1), needs to validate
> all the pipeline speculation. Because we flushed any
> speculation prior to the mfence, we're ok.
>
>
> Now, I'll show why just using the breakpoint, without IPI, is
> problematic:
>
> CPU A | CPU B
> -------------------------
> 0xeb 0xe5 0xeb 0xe5
> 0: CPU B instruction pointer is earlier than (1)
> CPU B pipeline speculatively predicts branches,
> prefetches data, calculates speculated values.
> 1: CPU B loads 0xeb
> 2: CPU B loads 0xe5
> 3:
> Write to (1)
> 0xcc 0xe5 0xcc 0xf0 # breakpoint inserted
> 4:
> Write to (2)
> 0xcc 0xf0 0xeb 0xf0 # Silly CPU B. Did not see nor use the breakpoint.
> # Same problem as scenario "buggy".
> 5:
> Write to (1)
> 0xeb 0xf0 0xeb 0xf0
> 4: CPU B instruction pointer gets to (1), needs to validate
> all the pipeline speculation.
> But ! The CPU does not expect code to change underneath.
> General protection fault (or any other fault.. random..)
>
> So, basically, we ensure that the only transitions CPU B will see are
> either:
>
> 0xeb 0xe5 -> 0xcc 0xe5 : OK, adding breakpoint
> 0xcc 0xe5 -> 0xcc 0xf0 : OK, not using the operand anyway, it's a
> breakpoint!
> 0xcc 0xf0 -> 0xeb 0xf0 : OK, removing breakpoint
>
> *but*, the transition we guarantee that CPU B will *never* see without
> having a mfence executed between the old and the new version is:
>
> 0xeb 0xe5 -> 0xeb 0xf0 <----- buggy.
>
> Hope the explanation helps,

Thanks for explanation.
One thing I'd like to know is why you are using mfence
instead of cpuid (a.k.a. sync_core()). I assume that old
processor doesn't have mfence and is that OK?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@xxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mark Lord: "Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040"
Previous message: Paul Mackerras: "Re: [PATCH 1/2] tracing: Check total refcount before releasing bufs in profile_enable failure"
In reply to: Mathieu Desnoyers: "Re: [PATCH 1/4] jump label - make init_kernel_text() global"
Next in thread: Mathieu Desnoyers: "Re: [PATCH 1/4] jump label - make init_kernel_text() global"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]