Re: [PATCH 8/9] Kprobes: Use RCU for (un)register synchronization - base changes

From: Paul E. McKenney
Date: Tue Oct 18 2005 - 11:31:15 EST


On Tue, Oct 18, 2005 at 10:43:23AM -0400, Ananth N Mavinakayanahalli wrote:
> On Mon, Oct 17, 2005 at 10:44:36PM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 10, 2005 at 10:47:20AM -0400, Ananth N Mavinakayanahalli wrote:
> > > From: Ananth N Mavinakayanahalli <ananth@xxxxxxxxxx>
> > >
> > > Changes to the base kprobes infrastructure to use RCU for synchronization
> > > during kprobe registration and unregistration. These changes coupled with
> > > the arch kprobe changes (next in series):
> > >
> > > a. serialize registration and unregistration of kprobes.
> > > b. enable lockless execution of handlers. Handlers can now run in parallel.
> >
> > A couple of questions interspersed below... probably just my limited
> > knowledge of kprobes, but I have to ask.  Search for the empty lines
> > to find them.
>
> Paul,
>
> Thanks for the review... replies inlined...
>
> Ananth
>
> >
> > Thanx, Paul
> >
> > > Signed-off-by: Ananth N Mavinakayanahalli <ananth@xxxxxxxxxx>
> > > Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@xxxxxxxxx>
> > > ---
> > >
> > > include/linux/kprobes.h |    9 +---
> > > kernel/kprobes.c        |  101 +++++++++++++++++++-----------------------------
> > > 2 files changed, 45 insertions(+), 65 deletions(-)
> > >
> > > Index: linux-2.6.14-rc3/include/linux/kprobes.h
> > > ===================================================================
> > > --- linux-2.6.14-rc3.orig/include/linux/kprobes.h 2005-10-07 21:40:43.000000000 -0400
> > > +++ linux-2.6.14-rc3/include/linux/kprobes.h 2005-10-07 21:40:49.000000000 -0400
> > > @@ -34,6 +34,8 @@
> > > #include <linux/notifier.h>
> > > #include <linux/smp.h>
> > > #include <linux/percpu.h>
> > > +#include <linux/spinlock.h>
> > > +#include <linux/rcupdate.h>
> > >
> > > #include <asm/kprobes.h>
> > >
> > > @@ -146,10 +148,7 @@ struct kretprobe_instance {
> > > };
> > >
> > > #ifdef CONFIG_KPROBES
> > > -/* Locks kprobe: irq must be disabled */
> > > -void lock_kprobes(void);
> > > -void unlock_kprobes(void);
> > > -
> > > +extern spinlock_t kretprobe_lock;
> > > extern int arch_prepare_kprobe(struct kprobe *p);
> > > extern void arch_copy_kprobe(struct kprobe *p);
> > > extern void arch_arm_kprobe(struct kprobe *p);
> > > @@ -160,7 +159,7 @@ extern void show_registers(struct pt_reg
> > > extern kprobe_opcode_t *get_insn_slot(void);
> > > extern void free_insn_slot(kprobe_opcode_t *slot);
> > >
> > > -/* Get the kprobe at this addr (if any). Must have called lock_kprobes */
> > > +/* Get the kprobe at this addr (if any) - called under a rcu_read_lock() */
> >
> > Since the update side uses synchronize_kernel() (in patch 9/9, right?),
>
> Yes, synchronize_sched() to be precise.

Good, am not going blind yet. ;-)

> > shouldn't the above "rcu_read_lock()" instead be preempt_disable()?
>
> My reasoning was that rcu_read_(un)lock() is #defined to
> preempt_(dis/en)able().  But I agree, since we use synchronize_sched(),
> this can just be preempt_disable()...
> Suggestions?

In some realtime-friendly RCU implementations, the RCU read-side critical
sections do not disable preemption.  So, if you are using synchronize_sched()
on the update side, you need to use preempt_disable() (or some other
primitive that disables preemption, for example, any of the primitives
that disable hardware interrupts) on the read side.

From Documentation/RCU/whatisRCU.txt, the correspondence between the
update-side primitive and the read-side primitives needs to be as follows:

        Defer                   Protect

   a.   synchronize_rcu()       rcu_read_lock() / rcu_read_unlock()
        call_rcu()

   b.   call_rcu_bh()           rcu_read_lock_bh() / rcu_read_unlock_bh()

   c.   synchronize_sched()     preempt_disable() / preempt_enable()
                                local_irq_save() / local_irq_restore()
                                hardirq enter / hardirq exit
                                NMI enter / NMI exit
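
Concretely, for this patch the pairing would look something like the
following (a rough sketch only -- I am reusing names from your patch, but
the surrounding code is illustrative, not the actual implementation):

	/* Read side: pairs with synchronize_sched() on the update side. */
	preempt_disable();
	p = get_kprobe(addr);			/* lockless hash lookup */
	if (p && p->pre_handler)
		p->pre_handler(p, regs);
	preempt_enable();

	/* Update side, as in unregister_kprobe(): */
	spin_lock_irqsave(&kprobe_lock, flags);
	hlist_del_rcu(&p->hlist);		/* unlink from hash table */
	spin_unlock_irqrestore(&kprobe_lock, flags);
	synchronize_sched();	/* wait for preempt-disabled readers */
	arch_remove_kprobe(p);	/* now safe to tear down the probe */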

> > Also, some of the code in the earlier patches seems to do
> > preempt_disable().  For example, kprobe_handler() in
> > arch/i386/kernel/kprobes.c.
>
> Well, the convolution is due to the way kprobes work:
> - breakpoint hit; execute the pre_handler
> - single-step on a copy of the original instruction; execute the
>   post_handler
>
> We don't want to be preempted from the time of the breakpoint exception
> until after the post_handler is run. I've taken care to ensure the
> preempt_ calls from the various codepaths are balanced.

OK, I hadn't followed that -- thank you for the explanation!  Does this
sequence of events happen on each execution of the kprobe handler, or is
there some optimization for subsequent invocations of the same kprobe
handler?

If the former, then perhaps there is some way to make the preempt_disable()
that protects the single stepping also protect the probe lookup.
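
Something like the following, perhaps (hand-waving sketch only, loosely
patterned after kprobe_handler() in arch/i386/kernel/kprobes.c -- I have
not tried this, and the helper names are from my possibly-faulty memory
of that file):

	static int kprobe_handler(struct pt_regs *regs)
	{
		struct kprobe *p;

		/* One preempt_disable() covers lookup and single-step. */
		preempt_disable();
		p = get_kprobe((void *)(regs->eip - 1));
		if (!p) {
			/* Not a kprobe breakpoint; let others handle it. */
			preempt_enable_no_resched();
			return 0;
		}
		if (p->pre_handler && p->pre_handler(p, regs))
			return 1;	/* pre_handler changed execution */
		prepare_singlestep(p, regs);
		/*
		 * Return with preemption still disabled; the debug-trap
		 * handler runs the post_handler and does preempt_enable().
		 */
		return 1;
	}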

> > In realtime kernels, synchronize_sched() does -not- guarantee that all
> > rcu_read_lock() critical sections have finished, only that any
> > preempt_disable() code sequences (explicit or implicit) have completed.
>
> Are we assured that the preempt_ depth is also taken care of? If that is
> the case, I think we are safe, since the matching preempt_enable() calls
> are _after_ the post_handler is executed.

Seems so to me -- though if I understand you correctly, you should then
be able to simply remove the rcu_read_lock() and rcu_read_unlock() calls,
since that code sequence would be covered by the preempt_disable().

Or am I missing something?
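
(For reference, preempt_disable() nests via the preempt count, so the
extra rcu_read_lock()/rcu_read_unlock() pair is harmless, just redundant.
A rough sketch of the depths, assuming I am reading your code path
correctly:

	preempt_disable();	/* breakpoint hit: count 0 -> 1 */
	rcu_read_lock();	/* currently: count 1 -> 2 */
	p = get_kprobe(addr);
	rcu_read_unlock();	/* count 2 -> 1 */
	/* ... single-step, run the post_handler ... */
	preempt_enable();	/* count 1 -> 0, preemption legal again */

Dropping the rcu_read_lock()/rcu_read_unlock() pair leaves the lookup
covered by the outer preempt_disable() the whole way.)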

> > > struct kprobe *get_kprobe(void *addr);
> > > struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);
> > >
> > > Index: linux-2.6.14-rc3/kernel/kprobes.c
> > > ===================================================================
> > > --- linux-2.6.14-rc3.orig/kernel/kprobes.c 2005-10-07 21:40:43.000000000 -0400
> > > +++ linux-2.6.14-rc3/kernel/kprobes.c 2005-10-07 21:41:11.000000000 -0400
> > > @@ -32,7 +32,6 @@
> > > * <prasanna@xxxxxxxxxx> added function-return probes.
> > > */
> > > #include <linux/kprobes.h>
> > > -#include <linux/spinlock.h>
> > > #include <linux/hash.h>
> > > #include <linux/init.h>
> > > #include <linux/module.h>
> > > @@ -48,8 +47,8 @@
> > > static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
> > > static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
> > >
> > > -unsigned int kprobe_cpu = NR_CPUS;
> > > -static DEFINE_SPINLOCK(kprobe_lock);
> > > +static DEFINE_SPINLOCK(kprobe_lock); /* Protects kprobe_table */
> > > +DEFINE_SPINLOCK(kretprobe_lock); /* Protects kretprobe_inst_table */
> > > static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
> > >
> > > /*
> > > @@ -152,41 +151,6 @@ void __kprobes free_insn_slot(kprobe_opc
> > > }
> > > }
> > >
> > > -/* Locks kprobe: irqs must be disabled */
> > > -void __kprobes lock_kprobes(void)
> > > -{
> > > - unsigned long flags = 0;
> > > -
> > > - /* Avoiding local interrupts to happen right after we take the kprobe_lock
> > > - * and before we get a chance to update kprobe_cpu, this to prevent
> > > - * deadlock when we have a kprobe on ISR routine and a kprobe on task
> > > - * routine
> > > - */
> > > - local_irq_save(flags);
> > > -
> > > - spin_lock(&kprobe_lock);
> > > - kprobe_cpu = smp_processor_id();
> > > -
> > > - local_irq_restore(flags);
> > > -}
> > > -
> > > -void __kprobes unlock_kprobes(void)
> > > -{
> > > - unsigned long flags = 0;
> > > -
> > > - /* Avoiding local interrupts to happen right after we update
> > > - * kprobe_cpu and before we get a a chance to release kprobe_lock,
> > > - * this to prevent deadlock when we have a kprobe on ISR routine and
> > > - * a kprobe on task routine
> > > - */
> > > - local_irq_save(flags);
> > > -
> > > - kprobe_cpu = NR_CPUS;
> > > - spin_unlock(&kprobe_lock);
> > > -
> > > - local_irq_restore(flags);
> > > -}
> > > -
> > > /* We have preemption disabled.. so it is safe to use __ versions */
> > > static inline void set_kprobe_instance(struct kprobe *kp)
> > > {
> > > @@ -198,14 +162,19 @@ static inline void reset_kprobe_instance
> > > __get_cpu_var(kprobe_instance) = NULL;
> > > }
> > >
> > > -/* You have to be holding the kprobe_lock */
> > > +/*
> > > + * This routine is called either:
> > > + * - under the kprobe_lock spinlock - during kprobe_[un]register()
> > > + * OR
> > > + * - under an rcu_read_lock() - from arch/xxx/kernel/kprobes.c
> >
> > Same here -- shouldn't the rcu_read_lock() be preempt_disable()?
>
> This is the same routine referred to in the comment above.

OK, same responses apply. ;-)

> > > + */
> > > struct kprobe __kprobes *get_kprobe(void *addr)
> > > {
> > > struct hlist_head *head;
> > > struct hlist_node *node;
> > >
> > > head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
> > > - hlist_for_each(node, head) {
> > > + hlist_for_each_rcu(node, head) {
> > > struct kprobe *p = hlist_entry(node, struct kprobe, hlist);
> > > if (p->addr == addr)
> > > return p;
> > > @@ -221,7 +190,7 @@ static int __kprobes aggr_pre_handler(st
> > > {
> > > struct kprobe *kp;
> > >
> > > - list_for_each_entry(kp, &p->list, list) {
> > > + list_for_each_entry_rcu(kp, &p->list, list) {
> > > if (kp->pre_handler) {
> > > set_kprobe_instance(kp);
> > > if (kp->pre_handler(kp, regs))
> > > @@ -237,7 +206,7 @@ static void __kprobes aggr_post_handler(
> > > {
> > > struct kprobe *kp;
> > >
> > > - list_for_each_entry(kp, &p->list, list) {
> > > + list_for_each_entry_rcu(kp, &p->list, list) {
> > > if (kp->post_handler) {
> > > set_kprobe_instance(kp);
> > > kp->post_handler(kp, regs, flags);
> > > @@ -276,6 +245,7 @@ static int __kprobes aggr_break_handler(
> > > return ret;
> > > }
> > >
> > > +/* Called with kretprobe_lock held */
> > > struct kretprobe_instance __kprobes *get_free_rp_inst(struct kretprobe *rp)
> > > {
> > > struct hlist_node *node;
> > > @@ -285,6 +255,7 @@ struct kretprobe_instance __kprobes *get
> > > return NULL;
> > > }
> > >
> > > +/* Called with kretprobe_lock held */
> > > static struct kretprobe_instance __kprobes *get_used_rp_inst(struct kretprobe
> > > *rp)
> > > {
> > > @@ -295,6 +266,7 @@ static struct kretprobe_instance __kprob
> > > return NULL;
> > > }
> > >
> > > +/* Called with kretprobe_lock held */
> > > void __kprobes add_rp_inst(struct kretprobe_instance *ri)
> > > {
> > > /*
> > > @@ -313,6 +285,7 @@ void __kprobes add_rp_inst(struct kretpr
> > > hlist_add_head(&ri->uflist, &ri->rp->used_instances);
> > > }
> > >
> > > +/* Called with kretprobe_lock held */
> > > void __kprobes recycle_rp_inst(struct kretprobe_instance *ri)
> > > {
> > > /* remove rp inst off the rprobe_inst_table */
> > > @@ -346,13 +319,13 @@ void __kprobes kprobe_flush_task(struct
> > > struct hlist_node *node, *tmp;
> > > unsigned long flags = 0;
> > >
> > > - spin_lock_irqsave(&kprobe_lock, flags);
> > > + spin_lock_irqsave(&kretprobe_lock, flags);
> > > head = kretprobe_inst_table_head(current);
> > > hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
> > > if (ri->task == tk)
> > > recycle_rp_inst(ri);
> > > }
> > > - spin_unlock_irqrestore(&kprobe_lock, flags);
> > > + spin_unlock_irqrestore(&kretprobe_lock, flags);
> > > }
> > >
> > > /*
> > > @@ -363,9 +336,12 @@ static int __kprobes pre_handler_kretpro
> > > struct pt_regs *regs)
> > > {
> > > struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> > > + unsigned long flags = 0;
> > >
> > > /*TODO: consider to only swap the RA after the last pre_handler fired */
> > > + spin_lock_irqsave(&kretprobe_lock, flags);
> > > arch_prepare_kretprobe(rp, regs);
> > > + spin_unlock_irqrestore(&kretprobe_lock, flags);
> > > return 0;
> > > }
> > >
> > > @@ -396,13 +372,13 @@ static int __kprobes add_new_kprobe(stru
> > > struct kprobe *kp;
> > >
> > > if (p->break_handler) {
> > > - list_for_each_entry(kp, &old_p->list, list) {
> > > + list_for_each_entry_rcu(kp, &old_p->list, list) {
> > > if (kp->break_handler)
> > > return -EEXIST;
> > > }
> > > - list_add_tail(&p->list, &old_p->list);
> > > + list_add_tail_rcu(&p->list, &old_p->list);
> > > } else
> > > - list_add(&p->list, &old_p->list);
> > > + list_add_rcu(&p->list, &old_p->list);
> > > return 0;
> > > }
> > >
> > > @@ -420,18 +396,18 @@ static inline void add_aggr_kprobe(struc
> > > ap->break_handler = aggr_break_handler;
> > >
> > > INIT_LIST_HEAD(&ap->list);
> > > - list_add(&p->list, &ap->list);
> > > + list_add_rcu(&p->list, &ap->list);
> > >
> > > INIT_HLIST_NODE(&ap->hlist);
> > > - hlist_del(&p->hlist);
> > > - hlist_add_head(&ap->hlist,
> > > + hlist_del_rcu(&p->hlist);
> > > + hlist_add_head_rcu(&ap->hlist,
> > > &kprobe_table[hash_ptr(ap->addr, KPROBE_HASH_BITS)]);
> > > }
> > >
> > > /*
> > > * This is the second or subsequent kprobe at the address - handle
> > > * the intricacies
> > > - * TODO: Move kcalloc outside the spinlock
> > > + * TODO: Move kcalloc outside the spin_lock
> > > */
> > > static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
> > > struct kprobe *p)
> > > @@ -457,7 +433,7 @@ static int __kprobes register_aggr_kprob
> > > static inline void cleanup_kprobe(struct kprobe *p, unsigned long flags)
> > > {
> > > arch_disarm_kprobe(p);
> > > - hlist_del(&p->hlist);
> > > + hlist_del_rcu(&p->hlist);
> > > spin_unlock_irqrestore(&kprobe_lock, flags);
> > > arch_remove_kprobe(p);
> > > }
> > > @@ -465,11 +441,10 @@ static inline void cleanup_kprobe(struct
> > > static inline void cleanup_aggr_kprobe(struct kprobe *old_p,
> > > struct kprobe *p, unsigned long flags)
> > > {
> > > - list_del(&p->list);
> > > - if (list_empty(&old_p->list)) {
> > > + list_del_rcu(&p->list);
> > > + if (list_empty(&old_p->list))
> > > cleanup_kprobe(old_p, flags);
> > > - kfree(old_p);
> > > - } else
> > > + else
> > > spin_unlock_irqrestore(&kprobe_lock, flags);
> > > }
> > >
> > > @@ -492,9 +467,9 @@ int __kprobes register_kprobe(struct kpr
> > > if ((ret = arch_prepare_kprobe(p)) != 0)
> > > goto rm_kprobe;
> > >
> > > + p->nmissed = 0;
> > > spin_lock_irqsave(&kprobe_lock, flags);
> > > old_p = get_kprobe(p->addr);
> > > - p->nmissed = 0;
> > > if (old_p) {
> > > ret = register_aggr_kprobe(old_p, p);
> > > goto out;
> > > @@ -502,7 +477,7 @@ int __kprobes register_kprobe(struct kpr
> > >
> > > arch_copy_kprobe(p);
> > > INIT_HLIST_NODE(&p->hlist);
> > > - hlist_add_head(&p->hlist,
> > > + hlist_add_head_rcu(&p->hlist,
> > > &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> > >
> > > arch_arm_kprobe(p);
> > > @@ -523,10 +498,16 @@ void __kprobes unregister_kprobe(struct
> > > spin_lock_irqsave(&kprobe_lock, flags);
> > > old_p = get_kprobe(p->addr);
> > > if (old_p) {
> > > + /* cleanup_*_kprobe() does the spin_unlock_irqrestore */
> > > if (old_p->pre_handler == aggr_pre_handler)
> > > cleanup_aggr_kprobe(old_p, p, flags);
> > > else
> > > cleanup_kprobe(p, flags);
> > > +
> > > + synchronize_sched();
> > > + if (old_p->pre_handler == aggr_pre_handler &&
> > > + list_empty(&old_p->list))
> > > + kfree(old_p);
> > > } else
> > > spin_unlock_irqrestore(&kprobe_lock, flags);
> > > }
> > > @@ -603,13 +584,13 @@ void __kprobes unregister_kretprobe(stru
> > >
> > > unregister_kprobe(&rp->kp);
> > > /* No race here */
> > > - spin_lock_irqsave(&kprobe_lock, flags);
> > > + spin_lock_irqsave(&kretprobe_lock, flags);
> > > free_rp_inst(rp);
> > > while ((ri = get_used_rp_inst(rp)) != NULL) {
> > > ri->rp = NULL;
> > > hlist_del(&ri->uflist);
> > > }
> > > - spin_unlock_irqrestore(&kprobe_lock, flags);
> > > + spin_unlock_irqrestore(&kretprobe_lock, flags);
> > > }
> > >
> > > static int __init init_kprobes(void)