Re: [RFC PATCH 2/6] jump label v3 - x86: Introduce generic jumppatching without stop_machine

From: Paul E. McKenney
Date: Wed Nov 18 2009 - 19:58:46 EST


On Wed, Nov 18, 2009 at 07:28:26PM -0500, Mathieu Desnoyers wrote:
> * Jason Baron (jbaron@xxxxxxxxxx) wrote:
> > Add text_poke_fixup() which takes a fixup address to where a processor
> > jumps if it hits the modifying address while code modifying.
> > text_poke_fixup() does following steps for this purpose.
> >
> > 1. Setup int3 handler for fixup.
> > 2. Put a breakpoint (int3) on the first byte of modifying region,
> > and synchronize code on all CPUs.
> > 3. Modify other bytes of modifying region, and synchronize code on all CPUs.
> > 4. Modify the first byte of modifying region, and synchronize code
> > on all CPUs.
> > 5. Clear int3 handler.
> >
>
> Hi Masami,
>
> I like the approach and the API is clean. I have intersped comments
> below.
>
> Ingo: I raise a question about text_mutex vs on_each_cpu hangs I
> experienced recently in the message below. Might be worth having a look,
> I suspect this might have caused the hangs Paul McKenney had with his
> past TREE RCU callback migration. I think he did take a mutex in the cpu
> hotplug callbacks and might have used IPIs within that same lock.

Hello, Mathieu,

By "mutex", do you mean "mutex_lock()"? If so, I don't do that from
within the CPU hotplug notifiers. I do spin_lock_irqsave().

People have been known to invoke synchronize_rcu() from CPU hotplug
notifiers, however -- and this does block. Is that what you are
getting at?

I do invoke smp_send_reschedule() with irqs disabled, which did arouse my
suspicions of late. But this seems to bypass the smp_call_function()
code that is most definitely illegal to invoke with irqs disabled,
so no smoking gun. All that aside, if invoking smp_send_reschedule()
with irqs disabled is in any way a bad idea, please let me know so I
can rearrange the code appropriately.

RCU is currently running reasonably well with the set of patches I have
submitted. It the kernel is now withstanding a level of punishment that
would have reduced 2.6.28 (for example) to a steaming pile of bits, with
or without the help of rcutorture. And I am now hitting the occasional
non-RCU bug, so I am starting to feel like RCU is approaching stability.
Approaching, but not there yet -- a few suspicious areas remain. Time
to crank the testing up another notch or two. ;-)

Thanx, Paul

> > Thus, if some other processor execute modifying address when step2 to step4,
> > it will be jumped to fixup code.
> >
> > This still has many limitations for modifying multi-instructions at once.
> > However, it is enough for 'a 5 bytes nop replacing with a jump' patching,
> > because;
> > - Replaced instruction is just one instruction, which is executed atomically.
> > - Replacing instruction is a jump, so we can set fixup address where the jump
> > goes to.
> >
> > Signed-off-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> > ---
> > arch/x86/include/asm/alternative.h | 12 ++++
> > arch/x86/include/asm/kprobes.h | 1 +
> > arch/x86/kernel/alternative.c | 120 ++++++++++++++++++++++++++++++++++++
> > kernel/kprobes.c | 2 +-
> > 4 files changed, 134 insertions(+), 1 deletions(-)
> >
>
> [snip snip]
>
> > index de7353c..af47f12 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -4,6 +4,7 @@
> > #include <linux/list.h>
> > #include <linux/stringify.h>
> > #include <linux/kprobes.h>
> > +#include <linux/kdebug.h>
> > #include <linux/mm.h>
> > #include <linux/vmalloc.h>
> > #include <linux/memory.h>
> > @@ -552,3 +553,122 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> > local_irq_restore(flags);
> > return addr;
> > }
> > +
> > +/*
> > + * On pentium series, Unsynchronized cross-modifying code
> > + * operations can cause unexpected instruction execution results.
> > + * So after code modified, we should synchronize it on each processor.
> > + */
> > +static void __kprobes __local_sync_core(void *info)
> > +{
> > + sync_core();
> > +}
> > +
> > +void __kprobes sync_core_all(void)
> > +{
> > + on_each_cpu(__local_sync_core, NULL, 1);
>
> OK, so you rely on the fact that on_each_cpu has memory barriers and
> executes the remote "__local_sync_core" with the appropriate memory
> barriers underneath, am I correct ?
>
> > +}
> > +
> > +/* Safely cross-code modifying with fixup address */
> > +static void *patch_fixup_from;
> > +static void *patch_fixup_addr;
> > +
> > +static int __kprobes patch_exceptions_notify(struct notifier_block *self,
> > + unsigned long val, void *data)
> > +{
> > + struct die_args *args = data;
> > + struct pt_regs *regs = args->regs;
> > +
> > + if (likely(!patch_fixup_from))
> > + return NOTIFY_DONE;
> > +
> > + if (val != DIE_INT3 || !regs || user_mode_vm(regs) ||
> > + (unsigned long)patch_fixup_from != regs->ip)
> > + return NOTIFY_DONE;
> > +
> > + args->regs->ip = (unsigned long)patch_fixup_addr;
> > + return NOTIFY_STOP;
> > +}
> > +
> > +/**
> > + * text_poke_fixup() -- cross-modifying kernel text with fixup address.
> > + * @addr: Modifying address.
> > + * @opcode: New instruction.
> > + * @len: length of modifying bytes.
> > + * @fixup: Fixup address.
> > + *
> > + * Note: You must backup replaced instructions before calling this,
> > + * if you need to recover it.
> > + * Note: Must be called under text_mutex.
> > + */
> > +void *__kprobes text_poke_fixup(void *addr, const void *opcode, size_t len,
> > + void *fixup)
> > +{
> > + static const unsigned char int3_insn = BREAKPOINT_INSTRUCTION;
> > + static const int int3_size = sizeof(int3_insn);
> > +
> > + /* Replacing 1 byte can be done atomically. */
>
> I'm sure it can be done atomically, but can it be done safely though ?
>
> (disclaimer: we're still waiting for the official answer on this
> statement): Assuming instruction trace cache effects of SMP cross-code
> modification, and that only int3 would be safe to patch, then even
> changing 1 single byte could only be done by going to an intermediate
> int3 and synchronizing all other cores.
>
> > + if (unlikely(len <= 1))
> > + return text_poke(addr, opcode, len);
> > +
> > + /* Preparing */
> > + patch_fixup_addr = fixup;
> > + wmb();
>
> hrm, missing comment ?
>
> > + patch_fixup_from = (u8 *)addr + int3_size; /* IP address after int3 */
> > +
> > + /* Cap by an int3 */
> > + text_poke(addr, &int3_insn, int3_size);
> > + sync_core_all();
> > +
> > + /* Replace tail bytes */
> > + text_poke((char *)addr + int3_size, (const char *)opcode + int3_size,
> > + len - int3_size);
> > + sync_core_all();
> > +
> > + /* Replace int3 with head byte */
> > + text_poke(addr, opcode, int3_size);
> > + sync_core_all();
> > +
> > + /* Cleanup */
> > + patch_fixup_from = NULL;
> > + wmb();
>
> missing comment here too.
>
> > + return addr;
>
> Little quiz question:
>
> When patch_fixup_from is set to NULL, what ensures that the int3
> handlers have completed their execution ?
>
> I think it's probably OK, because the int3 is an interrupt gate, which
> therefore disables interrupts as soon as it runs, and executes the
> notifier while irqs are off. When we run sync_core_all() after replacing
> the int3 by the new 1st byte, we only return when all other cores have
> executed an interrupt, which implies that all int3 handlers previously
> running should have ended. Is it right ? It looks to me as if this 3rd
> sync_core_all() is only needed because of that. Probably that adding a
> comment would be good.
>
>
> Another thing: I've recently noticed that the following locking seems to
> hang the system with doing stress-testing concurrently with cpu
> hotplug/hotunplug:
>
> mutex_lock(&text_mutex);
> on_each_cpu(something, NULL, 1);
>
> The hang seems to be caused by the fact that alternative.c has:
>
> within cpu hotplug (cpu hotplug lock held)
> mutex_lock(&text_mutex);
>
> It might also be caused by the interaction with the stop_machine()
> performed within the cpu hotplug lock. I did not find the root cause of
> the problem, but this probably calls for lockdep improvements.
>
> In any case, given you are dealing with the exact same locking scenario
> here, I would recomment the following stress-test:
>
> in a loop, use text_poke_jump()
> in a loop, hotunplug all cpus, then hotplug all cpus
>
> I had to fix this temporarily by taking get/put_online_cpus() about the
> text_mutex.
>
> [snip snip]
>
> > +static struct notifier_block patch_exceptions_nb = {
> > + .notifier_call = patch_exceptions_notify,
> > + .priority = 0x7fffffff /* we need to be notified first */
> > +};
> > +
> > +static int __init patch_init(void)
> > +{
> > + return register_die_notifier(&patch_exceptions_nb);
> > +}
> > +
> > +arch_initcall(patch_init);
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index e5342a3..43a30d8 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -898,7 +898,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
> >
> > static struct notifier_block kprobe_exceptions_nb = {
> > .notifier_call = kprobe_exceptions_notify,
> > - .priority = 0x7fffffff /* we need to be notified first */
> > + .priority = 0x7ffffff0 /* High priority, but not first. */
>
> It would be good to keep all these priorities in a centralized header.
>
> Thanks,
>
> Mathieu
>
> > };
> >
> > unsigned long __weak arch_deref_entry_point(void *entry)
> > --
> > 1.6.5.1
> >
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/