Re: [RFC PATCH 2/6] jump label v3 - x86: Introduce generic jumppatching without stop_machine

From: Mathieu Desnoyers
Date: Wed Nov 18 2009 - 20:58:23 EST


* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Wed, Nov 18, 2009 at 07:28:26PM -0500, Mathieu Desnoyers wrote:
> > * Jason Baron (jbaron@xxxxxxxxxx) wrote:
> > > Add text_poke_fixup() which takes a fixup address to where a processor
> > > jumps if it hits the modifying address while code modifying.
> > > text_poke_fixup() does following steps for this purpose.
> > >
> > > 1. Setup int3 handler for fixup.
> > > 2. Put a breakpoint (int3) on the first byte of modifying region,
> > > and synchronize code on all CPUs.
> > > 3. Modify other bytes of modifying region, and synchronize code on all CPUs.
> > > 4. Modify the first byte of modifying region, and synchronize code
> > > on all CPUs.
> > > 5. Clear int3 handler.
> > >
> >
> > Hi Masami,
> >
> > I like the approach and the API is clean. I have intersped comments
> > below.
> >
> > Ingo: I raise a question about text_mutex vs on_each_cpu hangs I
> > experienced recently in the message below. Might be worth having a look,
> > I suspect this might have caused the hangs Paul McKenney had with his
> > past TREE RCU callback migration. I think he did take a mutex in the cpu
> > hotplug callbacks and might have used IPIs within that same lock.
>
> Hello, Mathieu,
>
> By "mutex", do you mean "mutex_lock()"? If so, I don't do that from
> within the CPU hotplug notifiers. I do spin_lock_irqsave().
>
> People have been known to invoke synchronize_rcu() from CPU hotplug
> notifiers, however -- and this does block. Is that what you are
> getting at?

Hi Paul,

What I had in mind is more like a N-way deadlock involving the cpu
hotplug mutex, on_each_cpu, and possibly stop_machine interaction.
However I did not push the lockup analysis far enough to know for sure,
but I feel like it's good to let others know. I am not sure if blocking
primitives could be affected by this.

>
> I do invoke smp_send_reschedule() with irqs disabled, which did arouse my
> suspicions of late. But this seems to bypass the smp_call_function()
> code that is most definitely illegal to invoke with irqs disabled,
> so no smoking gun. All that aside, if invoking smp_send_reschedule()
> with irqs disabled is in any way a bad idea, please let me know so I
> can rearrange the code appropriately.
>
> RCU is currently running reasonably well with the set of patches I have
> submitted. It the kernel is now withstanding a level of punishment that
> would have reduced 2.6.28 (for example) to a steaming pile of bits, with
> or without the help of rcutorture. And I am now hitting the occasional
> non-RCU bug, so I am starting to feel like RCU is approaching stability.
> Approaching, but not there yet -- a few suspicious areas remain. Time
> to crank the testing up another notch or two. ;-)

Nice :) I've been going through the same "stabilization" phase in the
past weeks for LTTng. It's good to see things stabilize even under heavy
cpu hotplug stress-tests.

Thanks,

Mathieu

>
> Thanx, Paul
>
> > > Thus, if some other processor execute modifying address when step2 to step4,
> > > it will be jumped to fixup code.
> > >
> > > This still has many limitations for modifying multi-instructions at once.
> > > However, it is enough for 'a 5 bytes nop replacing with a jump' patching,
> > > because;
> > > - Replaced instruction is just one instruction, which is executed atomically.
> > > - Replacing instruction is a jump, so we can set fixup address where the jump
> > > goes to.
> > >
> > > Signed-off-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> > > ---
> > > arch/x86/include/asm/alternative.h | 12 ++++
> > > arch/x86/include/asm/kprobes.h | 1 +
> > > arch/x86/kernel/alternative.c | 120 ++++++++++++++++++++++++++++++++++++
> > > kernel/kprobes.c | 2 +-
> > > 4 files changed, 134 insertions(+), 1 deletions(-)
> > >
> >
> > [snip snip]
> >
> > > index de7353c..af47f12 100644
> > > --- a/arch/x86/kernel/alternative.c
> > > +++ b/arch/x86/kernel/alternative.c
> > > @@ -4,6 +4,7 @@
> > > #include <linux/list.h>
> > > #include <linux/stringify.h>
> > > #include <linux/kprobes.h>
> > > +#include <linux/kdebug.h>
> > > #include <linux/mm.h>
> > > #include <linux/vmalloc.h>
> > > #include <linux/memory.h>
> > > @@ -552,3 +553,122 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> > > local_irq_restore(flags);
> > > return addr;
> > > }
> > > +
> > > +/*
> > > + * On pentium series, Unsynchronized cross-modifying code
> > > + * operations can cause unexpected instruction execution results.
> > > + * So after code modified, we should synchronize it on each processor.
> > > + */
> > > +static void __kprobes __local_sync_core(void *info)
> > > +{
> > > + sync_core();
> > > +}
> > > +
> > > +void __kprobes sync_core_all(void)
> > > +{
> > > + on_each_cpu(__local_sync_core, NULL, 1);
> >
> > OK, so you rely on the fact that on_each_cpu has memory barriers and
> > executes the remote "__local_sync_core" with the appropriate memory
> > barriers underneath, am I correct ?
> >
> > > +}
> > > +
> > > +/* Safely cross-code modifying with fixup address */
> > > +static void *patch_fixup_from;
> > > +static void *patch_fixup_addr;
> > > +
> > > +static int __kprobes patch_exceptions_notify(struct notifier_block *self,
> > > + unsigned long val, void *data)
> > > +{
> > > + struct die_args *args = data;
> > > + struct pt_regs *regs = args->regs;
> > > +
> > > + if (likely(!patch_fixup_from))
> > > + return NOTIFY_DONE;
> > > +
> > > + if (val != DIE_INT3 || !regs || user_mode_vm(regs) ||
> > > + (unsigned long)patch_fixup_from != regs->ip)
> > > + return NOTIFY_DONE;
> > > +
> > > + args->regs->ip = (unsigned long)patch_fixup_addr;
> > > + return NOTIFY_STOP;
> > > +}
> > > +
> > > +/**
> > > + * text_poke_fixup() -- cross-modifying kernel text with fixup address.
> > > + * @addr: Modifying address.
> > > + * @opcode: New instruction.
> > > + * @len: length of modifying bytes.
> > > + * @fixup: Fixup address.
> > > + *
> > > + * Note: You must backup replaced instructions before calling this,
> > > + * if you need to recover it.
> > > + * Note: Must be called under text_mutex.
> > > + */
> > > +void *__kprobes text_poke_fixup(void *addr, const void *opcode, size_t len,
> > > + void *fixup)
> > > +{
> > > + static const unsigned char int3_insn = BREAKPOINT_INSTRUCTION;
> > > + static const int int3_size = sizeof(int3_insn);
> > > +
> > > + /* Replacing 1 byte can be done atomically. */
> >
> > I'm sure it can be done atomically, but can it be done safely though ?
> >
> > (disclaimer: we're still waiting for the official answer on this
> > statement): Assuming instruction trace cache effects of SMP cross-code
> > modification, and that only int3 would be safe to patch, then even
> > changing 1 single byte could only be done by going to an intermediate
> > int3 and synchronizing all other cores.
> >
> > > + if (unlikely(len <= 1))
> > > + return text_poke(addr, opcode, len);
> > > +
> > > + /* Preparing */
> > > + patch_fixup_addr = fixup;
> > > + wmb();
> >
> > hrm, missing comment ?
> >
> > > + patch_fixup_from = (u8 *)addr + int3_size; /* IP address after int3 */
> > > +
> > > + /* Cap by an int3 */
> > > + text_poke(addr, &int3_insn, int3_size);
> > > + sync_core_all();
> > > +
> > > + /* Replace tail bytes */
> > > + text_poke((char *)addr + int3_size, (const char *)opcode + int3_size,
> > > + len - int3_size);
> > > + sync_core_all();
> > > +
> > > + /* Replace int3 with head byte */
> > > + text_poke(addr, opcode, int3_size);
> > > + sync_core_all();
> > > +
> > > + /* Cleanup */
> > > + patch_fixup_from = NULL;
> > > + wmb();
> >
> > missing comment here too.
> >
> > > + return addr;
> >
> > Little quiz question:
> >
> > When patch_fixup_from is set to NULL, what ensures that the int3
> > handlers have completed their execution ?
> >
> > I think it's probably OK, because the int3 is an interrupt gate, which
> > therefore disables interrupts as soon as it runs, and executes the
> > notifier while irqs are off. When we run sync_core_all() after replacing
> > the int3 by the new 1st byte, we only return when all other cores have
> > executed an interrupt, which implies that all int3 handlers previously
> > running should have ended. Is it right ? It looks to me as if this 3rd
> > sync_core_all() is only needed because of that. Probably that adding a
> > comment would be good.
> >
> >
> > Another thing: I've recently noticed that the following locking seems to
> > hang the system with doing stress-testing concurrently with cpu
> > hotplug/hotunplug:
> >
> > mutex_lock(&text_mutex);
> > on_each_cpu(something, NULL, 1);
> >
> > The hang seems to be caused by the fact that alternative.c has:
> >
> > within cpu hotplug (cpu hotplug lock held)
> > mutex_lock(&text_mutex);
> >
> > It might also be caused by the interaction with the stop_machine()
> > performed within the cpu hotplug lock. I did not find the root cause of
> > the problem, but this probably calls for lockdep improvements.
> >
> > In any case, given you are dealing with the exact same locking scenario
> > here, I would recomment the following stress-test:
> >
> > in a loop, use text_poke_jump()
> > in a loop, hotunplug all cpus, then hotplug all cpus
> >
> > I had to fix this temporarily by taking get/put_online_cpus() about the
> > text_mutex.
> >
> > [snip snip]
> >
> > > +static struct notifier_block patch_exceptions_nb = {
> > > + .notifier_call = patch_exceptions_notify,
> > > + .priority = 0x7fffffff /* we need to be notified first */
> > > +};
> > > +
> > > +static int __init patch_init(void)
> > > +{
> > > + return register_die_notifier(&patch_exceptions_nb);
> > > +}
> > > +
> > > +arch_initcall(patch_init);
> > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > index e5342a3..43a30d8 100644
> > > --- a/kernel/kprobes.c
> > > +++ b/kernel/kprobes.c
> > > @@ -898,7 +898,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
> > >
> > > static struct notifier_block kprobe_exceptions_nb = {
> > > .notifier_call = kprobe_exceptions_notify,
> > > - .priority = 0x7fffffff /* we need to be notified first */
> > > + .priority = 0x7ffffff0 /* High priority, but not first. */
> >
> > It would be good to keep all these priorities in a centralized header.
> >
> > Thanks,
> >
> > Mathieu
> >
> > > };
> > >
> > > unsigned long __weak arch_deref_entry_point(void *entry)
> > > --
> > > 1.6.5.1
> > >
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/