Re: [RFC PATCH 2/8] jump label v4 - x86: Introduce generic jumppatching without stop_machine

From: Mathieu Desnoyers
Date: Wed Jan 13 2010 - 09:31:00 EST


* H. Peter Anvin (hpa@xxxxxxxxx) wrote:
> On 01/12/2010 06:06 PM, Mathieu Desnoyers wrote:
> > * H. Peter Anvin (hpa@xxxxxxxxx) wrote:
> >> On 01/12/2010 08:26 AM, Jason Baron wrote:
> >>> Add text_poke_fixup(), which takes a fixup address to which a processor
> >>> jumps if it hits the address being modified while the modification is in
> >>> progress. text_poke_fixup() performs the following steps for this purpose:
> >>>
> >>> 1. Set up an int3 handler for the fixup.
> >>> 2. Put a breakpoint (int3) on the first byte of the region being modified,
> >>> and synchronize code on all CPUs.
> >>> 3. Modify the other bytes of the region, and synchronize code on all CPUs.
> >>> 4. Modify the first byte of the region, and synchronize code
> >>> on all CPUs.
> >>> 5. Clear the int3 handler.
> >>>
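A rough userspace model of the five quoted steps, for illustration only. It
assumes a plain byte buffer stands in for kernel text and that each
"synchronize code on all CPUs" point can be modeled as recording the bytes
every CPU is guaranteed to see from then on; it does not model concurrent
instruction fetch. All names here (struct poke_model, sync_cores_model(),
text_poke_fixup_model(), state_is_safe()) are hypothetical, not the patch's
code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INT3       0xCC /* x86 breakpoint opcode */
#define MAX_INSN   16
#define MAX_STATES 8

/* Hypothetical model: "text" stands in for kernel text; each sync point
 * snapshots what every CPU is guaranteed to see afterwards. */
struct poke_model {
	unsigned char text[MAX_INSN];
	size_t len;
	unsigned char seen[MAX_STATES][MAX_INSN];
	int nr_seen;
	bool fixup_armed; /* stands in for "int3 handler set up" */
};

static void sync_cores_model(struct poke_model *m)
{
	if (m->nr_seen < MAX_STATES)
		memcpy(m->seen[m->nr_seen++], m->text, m->len);
}

static void text_poke_fixup_model(struct poke_model *m,
				  const unsigned char *new_insn)
{
	m->fixup_armed = true;                         /* step 1 */
	m->text[0] = INT3;                             /* step 2 */
	sync_cores_model(m);
	memcpy(m->text + 1, new_insn + 1, m->len - 1); /* step 3 */
	sync_cores_model(m);
	m->text[0] = new_insn[0];                      /* step 4 */
	sync_cores_model(m);
	m->fixup_armed = false;                        /* step 5 */
}

/* A visible state is safe if it traps (int3 first) or is a whole
 * old or new instruction, i.e. never a torn mix that can execute. */
static bool state_is_safe(const unsigned char *s, size_t len,
			  const unsigned char *old_insn,
			  const unsigned char *new_insn)
{
	return s[0] == INT3 || memcmp(s, old_insn, len) == 0 ||
	       memcmp(s, new_insn, len) == 0;
}
```

For example, replacing a 5-byte jmp (E9 plus rel32) with the 5-byte NOP
0F 1F 44 00 00 yields three recorded states: two starting with int3 and the
final new instruction, none of which is a torn, executable mix.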
> >>
> >> We (Intel OTC) have been able to get an *unofficial* answer as to the
> >> validity of this procedure; specifically as it applies to Intel hardware
> >> (obviously). We are working on getting an officially approved answer,
> >> but as far as we currently know, the procedure as outlined above should
> >> work on all Intel hardware. In fact, we believe the synchronization in
> >> step 3 is unnecessary (as the synchronization in step 4 provides a
> >> sufficient guard).
> >
> > Hi Peter,
> >
> > This is great news! Thanks to Intel OTC and yourself for looking into
> > this. In the immediate values patches, I am doing the synchronization at
> > the end of step (3) to ensure that all remote CPUs issue read memory
> > barriers, so the stores to the instruction are done in this order:
> >
> > spin lock
> > store int3 to 1st byte
> > smp_wmb()
> > sync all cores
> > store new instruction in all but 1st byte
> > smp_wmb()
> > issue smp_rmb() on all cores (a sync all cores has this effect)
> > store new instruction to 1st byte
> > send IPI to all cores (or call synchronize_sched()) to wait for all
> > breakpoint handlers to complete.
> > spin unlock
> >
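The ordering listed above, sketched as userspace C. This assumes C11
release fences as stand-ins for smp_wmb(), an atomic_flag spin lock for the
kernel spin lock, and a hypothetical sync_all_cores_stub() that only counts
calls where the real code would IPI all cores and wait. The "issue
smp_rmb() on all cores" step is kept as a full sync, as the list describes.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define INT3 0xCC /* x86 breakpoint opcode */

static atomic_flag poke_lock = ATOMIC_FLAG_INIT; /* ~ spin lock */
static int nr_syncs;

/* Hypothetical placeholder for "IPI all cores and wait"; only counted. */
static void sync_all_cores_stub(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	nr_syncs++;
}

static void poke_ordered_model(volatile unsigned char *site,
			       const unsigned char *new_insn, size_t len)
{
	while (atomic_flag_test_and_set_explicit(&poke_lock,
						 memory_order_acquire))
		; /* spin lock */
	site[0] = INT3;                            /* int3 to 1st byte */
	atomic_thread_fence(memory_order_release); /* ~ smp_wmb() */
	sync_all_cores_stub();
	for (size_t i = 1; i < len; i++)           /* all but 1st byte */
		site[i] = new_insn[i];
	atomic_thread_fence(memory_order_release); /* ~ smp_wmb() */
	sync_all_cores_stub();                     /* the remote smp_rmb() step */
	site[0] = new_insn[0];                     /* 1st byte last */
	sync_all_cores_stub();                     /* wait for #BP handlers */
	atomic_flag_clear_explicit(&poke_lock, memory_order_release);
}
```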
> > So the question is: are these wmb/rmb pairs actually needed? As the
> > instruction fetch is not performed by instructions per se, I doubt a
> > rmb() will have any effect on them. I always prefer to stay on the safe
> > side, but it wouldn't hurt to know.
> >
>
> I don't think the smp_rmb() has any function.

OK, that's good to know.

>
> However, you're being quite inconsistent in your terminology here. The
> assumption above is that the "synchronize code on all CPUs" step is
> sending an IPI to all cores and waiting for it to return, so that each
> core has executed IPI/IRET before continuation.

To be strictly correct, we cannot assume that the IPI handler issues IRET
before signaling its completion. It's rather the other way around.
This is why I add an smp_mb() in the IPI handler for the "synchronize
code on all CPUs" step.
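A userspace sketch of that handler ordering, assuming POSIX threads play the
remote CPUs and a C11 seq_cst fence stands in for smp_mb(). The point it
shows: the handler acknowledges completion before it would reach IRET, so
the barrier has to come before the acknowledgement. The names
(ipi_handler_model(), sync_all_cores_model(), MODEL_NR_CPUS) are
hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS 4

static atomic_int ipi_acks;

/* Each thread plays a remote CPU running the IPI handler. */
static void *ipi_handler_model(void *unused)
{
	(void)unused;
	atomic_thread_fence(memory_order_seq_cst); /* ~ smp_mb() */
	atomic_fetch_add_explicit(&ipi_acks, 1, memory_order_release);
	/* A real handler would only reach IRET after this ack. */
	return NULL;
}

/* Returns the number of acknowledgements seen (MODEL_NR_CPUS on success). */
static int sync_all_cores_model(void)
{
	pthread_t cpu[MODEL_NR_CPUS];
	int started = 0;

	atomic_store(&ipi_acks, 0);
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		if (pthread_create(&cpu[started], NULL,
				   ipi_handler_model, NULL) == 0)
			started++;
	/* Like the IPI sender, wait until every "CPU" has acked. */
	while (atomic_load_explicit(&ipi_acks, memory_order_acquire) < started)
		;
	for (int i = 0; i < started; i++)
		pthread_join(cpu[i], NULL);
	return atomic_load(&ipi_acks);
}
```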

>
> It is *not* necessary to wait for the breakpoint handlers to return, as
> long as they will get to IRET eventually, since IRET is a jump and a
> serializing instruction.

Ah, I see. So the added smp_mb() would not be needed then, as long as we
know that the other CPUs either are currently running the IPI handler or
have executed it. IOW: they will execute IRET very soon, or they already
executed it after the int3 was written. I am a bit concerned about
NMIs arriving in this race window, but since they would have to start after
we put the breakpoint in place, that should be OK. (Note: entry_*.S
modifications are needed to support nesting breakpoint handlers in NMIs.)

>
> > Hrm. Assuming we have a spinlock protecting all this, given that we
> > synchronize all cores at step (4) _after_ removing the breakpoint, and
> > given that the breakpoint handler is an interrupt gate (thus executes
> > with interrupts off), I am inclined to think that sending the IPIs at
> > the end of step (4) (and waiting for them to complete) should be enough
> > to ensure that all in-flight breakpoint handlers for this site have
> > completed their execution. This would mean that we only have to keep
> > track of a single site at a time. Or am I missing something?
>
> Yes: the whole point was that you can omit the synchronization in step 4
> if you leave the breakpoint handler in place (I said "omit step 5", but
> that wasn't really what I meant.)
>
> That means that at the cost of two compares in the standard #BP handler,
> we can get away with only one IPI per atomic instruction poke.

OK. That makes sense now.
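A minimal userspace model of a #BP hook that stays installed, using one
plausible reading of "two compares": one check that a poke site is armed,
one that the trapping address matches it. It assumes a single tracked site
armed under the poke lock and overwritten by the next poke rather than
cleared, so late traps on the old site are still fixed up. struct
regs_model, poke_arm_site() and poke_int3_handler_model() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct regs_model {
	uintptr_t ip; /* saved instruction pointer */
};

static uintptr_t poke_site;  /* 0: no poke has been armed yet */
static uintptr_t poke_fixup; /* where a trapped CPU resumes */

/* Called under the poke lock; never cleared, only overwritten. */
static void poke_arm_site(uintptr_t site, uintptr_t fixup)
{
	poke_fixup = fixup;
	poke_site = site;
}

/* int3 is a trap, so the saved ip points just past the 0xCC byte. */
static bool poke_int3_handler_model(struct regs_model *regs)
{
	if (poke_site == 0)            /* compare 1: any site armed? */
		return false;
	if (regs->ip != poke_site + 1) /* compare 2: is it our site? */
		return false;
	regs->ip = poke_fixup;         /* resume at the fixup address */
	return true;
}
```

Any trap that fails either compare falls through to the normal #BP path,
which is the only cost the hook adds to unrelated breakpoints.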

Thanks,

Mathieu

>
> -hpa
>
>
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68