RE: [PATCH -rt] Preemption problem in kernel RT Patch

From: Beauchemin, Mark
Date: Wed Aug 01 2007 - 10:14:51 EST


Ingo,

I tried just removing the CONFIG_PREEMPT_RT code, but that drops
packets if another task has the lock. Here's the debug printouts:

<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/

Our quality department has been testing the patch below for a
few days and has not seen any problems. It pretty much
preserves the original -rt patch pieces, but adds recursive checking.

I changed xmit_lock_owner to a void * as it is now a pointer
to the task which owns the lock. What do you think?

Thanks,
Mark

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h 2007-08-01 09:01:32.000000000 -0400
@@ -468,7 +468,11 @@
/* cpu id of processor entered to hard_start_xmit or -1,
if nobody entered there.
*/
- int xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+ void * xmit_lock_owner;
+#else
+ int xmit_lock_owner;
+#endif
void *priv; /* pointer to private data */
int (*hard_start_xmit) (struct sk_buff *skb,
struct net_device *dev);
@@ -1041,32 +1045,54 @@
static inline void netif_tx_lock(struct net_device *dev)
{
spin_lock(&dev->_xmit_lock);
- dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+ dev->xmit_lock_owner = (void *)current;
+#else
+ dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
}

static inline void netif_tx_lock_bh(struct net_device *dev)
{
spin_lock_bh(&dev->_xmit_lock);
- dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+ dev->xmit_lock_owner = (void *)current;
+#else
+ dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
}

static inline int netif_tx_trylock(struct net_device *dev)
{
int ok = spin_trylock(&dev->_xmit_lock);
if (likely(ok))
- dev->xmit_lock_owner = raw_smp_processor_id();
+ {
+#ifdef CONFIG_PREEMPT_RT
+ dev->xmit_lock_owner = (void *)current;
+#else
+ dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+ }
return ok;
}

static inline void netif_tx_unlock(struct net_device *dev)
{
+#ifdef CONFIG_PREEMPT_RT
dev->xmit_lock_owner = -1;
+#else
+ dev->xmit_lock_owner = (void *)-1;
+#endif
spin_unlock(&dev->_xmit_lock);
}

static inline void netif_tx_unlock_bh(struct net_device *dev)
{
+#ifdef CONFIG_PREEMPT_RT
dev->xmit_lock_owner = -1;
+#else
+ dev->xmit_lock_owner = (void *)-1;
+#endif
spin_unlock_bh(&dev->_xmit_lock);
}

diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c 2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
* No need to check for recursion with threaded interrupts:
*/
#ifdef CONFIG_PREEMPT_RT
- if (1) {
+ if (dev->xmit_lock_owner != (void *)current) {
#else
int cpu = raw_smp_processor_id(); /* ok because BHs are off */

diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c 2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@

if (!lockless) {
#ifdef CONFIG_PREEMPT_RT
- netif_tx_lock(dev);
+ if (dev->xmit_lock_owner == (void *)current) {
+ kfree_skb(skb);
+ if (net_ratelimit())
+ printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+ return -1;
+ }
+ netif_tx_lock(dev);
#else
if (netif_tx_trylock(dev))
/* Another CPU grabbed the driver tx lock */



-----Original Message-----
From: Ingo Molnar [mailto:mingo@xxxxxxx]
Sent: Tuesday, July 24, 2007 3:15 PM
To: Beauchemin, Mark
Cc: Thomas Gleixner; linux-kernel@xxxxxxxxxxxxxxx; David Miller
Subject: Re: [PATCH -rt] Preemption problem in kernel RT Patch



* Beauchemin, Mark <Mark.Beauchemin@xxxxxxxxxxxxxxx> wrote:

> I'm not sure why the check for recursion has been removed. In the
> backtrace below, I think it would be caught by this check and not
> recursively call the spinlock code.

ah ... i think i did it like that because i didnt realize that there
would be a recursive call sequence, i was concentrating on recursive
locking.

incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks
quite similar to your suggested fix. Could you double-check that it
solves your problem?

Ingo


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/