[patch] revert: [NET]: Fix races in net_rx_action vs netpoll

From: Ingo Molnar
Date: Mon Jul 16 2007 - 05:12:59 EST



current -git broke my main testbox. No TCP/IP networking to/from the box
and e1000 would time out in xmit:

NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <95>
TDT <95>
next_to_use <95>
next_to_clean <ea>
buffer_info[next_to_clean]
time_stamp <ffff8f43>
next_to_watch <ea>
jiffies <ffffb5cc>
next_to_watch.status <1>

After a bisection session the bad commit turned out to be:

29578624e354f56143d92510fff33a8b2aaa2c03 is first bad commit
commit 29578624e354f56143d92510fff33a8b2aaa2c03
Author: Olaf Kirch <olaf.kirch@xxxxxxxxxx>
Date: Wed Jul 11 19:32:02 2007 -0700

[NET]: Fix races in net_rx_action vs netpoll.

Keep netpoll/poll_napi from messing with the poll_list.
Only net_rx_action is allowed to manipulate the list.

Signed-off-by: Olaf Kirch <olaf.kirch@xxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

and indeed the testbox uses netconsole (it's a laptop so this is the
only viable remote debugging option). Applying the revert patch below
makes it work again.

Ingo

------------------>
Subject: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll
From: Ingo Molnar <mingo@xxxxxxx>

commit 29578624 causes netconsole failures:

NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <95>
TDT <95>
next_to_use <95>
next_to_clean <ea>
buffer_info[next_to_clean]
time_stamp <ffff8f43>
next_to_watch <ea>
jiffies <ffffb5cc>
next_to_watch.status <1>

revert it for now, to make my testsystem work.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
include/linux/netdevice.h | 10 ----------
net/core/netpoll.c | 8 --------
2 files changed, 18 deletions(-)

Index: linux/include/linux/netdevice.h
===================================================================
--- linux.orig/include/linux/netdevice.h
+++ linux/include/linux/netdevice.h
@@ -261,8 +261,6 @@ enum netdev_state_t
__LINK_STATE_LINKWATCH_PENDING,
__LINK_STATE_DORMANT,
__LINK_STATE_QDISC_RUNNING,
- /* Set by the netpoll NAPI code */
- __LINK_STATE_POLL_LIST_FROZEN,
};


@@ -1016,14 +1014,6 @@ static inline void netif_rx_complete(str
{
unsigned long flags;

-#ifdef CONFIG_NETPOLL
- /* Prevent race with netpoll - yes, this is a kludge.
- * But at least it doesn't penalize the non-netpoll
- * code path. */
- if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, &dev->state))
- return;
-#endif
-
local_irq_save(flags);
__netif_rx_complete(dev);
local_irq_restore(flags);
Index: linux/net/core/netpoll.c
===================================================================
--- linux.orig/net/core/netpoll.c
+++ linux/net/core/netpoll.c
@@ -124,13 +124,6 @@ static void poll_napi(struct netpoll *np
if (test_bit(__LINK_STATE_RX_SCHED, &np->dev->state) &&
npinfo->poll_owner != smp_processor_id() &&
spin_trylock(&npinfo->poll_lock)) {
- /* When calling dev->poll from poll_napi, we may end up in
- * netif_rx_complete. However, only the CPU to which the
- * device was queued is allowed to remove it from poll_list.
- * Setting POLL_LIST_FROZEN tells netif_rx_complete
- * to leave the NAPI state alone.
- */
- set_bit(__LINK_STATE_POLL_LIST_FROZEN, &np->dev->state);
npinfo->rx_flags |= NETPOLL_RX_DROP;
atomic_inc(&trapped);

@@ -138,7 +131,6 @@ static void poll_napi(struct netpoll *np

atomic_dec(&trapped);
npinfo->rx_flags &= ~NETPOLL_RX_DROP;
- clear_bit(__LINK_STATE_POLL_LIST_FROZEN, &np->dev->state);
spin_unlock(&npinfo->poll_lock);
}
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/