RE: net_device->queue_lock contention on 32-way box

From: Chen, Kenneth W
Date: Wed May 26 2004 - 16:50:14 EST


>>>> David S. Miller wrote on Wednesday, May 26, 2004 1:55 PM
> The net_tx_action() --> qdisc_run() --> qdisc_restart() code path
> can hold the lock for a long time especially if lots of packets
> have been enqueued before net_tx_action() had a chance to run.
>
> For each enqueued packet, we go all the way into the device driver
> to give the packet to the device. Given that PCI PIO accesses are
> likely in these paths, along with some memory accesses (to setup
> packet descriptors and the like) this could take quite a bit of
> time.
>
> We do temporarily release the dev->queue_lock in between each
> packet while we go into the driver. It could be what you're
> seeing is the latency to get the device's dev->xmit_lock because
> we have to acquire that before we can release the dev->queue_lock.
>
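
For reference, my mental model of that path (paraphrasing the 2.6
sources from memory, so treat the details as approximate rather than
a verbatim quote of the tree):

	/* net_tx_action() inner loop, simplified: for each device on
	 * the output queue, take queue_lock and drain its qdisc. */
	if (spin_trylock(&dev->queue_lock)) {
		qdisc_run(dev);
		spin_unlock(&dev->queue_lock);
	} else {
		netif_schedule(dev);
	}

	/* qdisc_run(): keep feeding packets to the driver until the
	 * qdisc is empty or the device asks us to stop. */
	static inline void qdisc_run(struct net_device *dev)
	{
		while (!netif_queue_stopped(dev) &&
		       qdisc_restart(dev) < 0)
			/* NOTHING */;
	}

So queue_lock is nominally held across the whole drain loop, but as
you say it is dropped and re-taken around each hard_start_xmit() call.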

That's where I'm having trouble interpreting the lockmeter data.
qdisc_restart() does a trylock on dev->xmit_lock; if it gets the
lock, it unlocks queue_lock right away. If it doesn't get it, it
requeues the packet and returns, qdisc_run() terminates, and the
caller of qdisc_run() unlocks queue_lock (rough sketch below). I
don't see that as a long operation at all. Could netif_schedule()
take a long time to run?
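
A rough sketch of the qdisc_restart() logic I'm describing (again
from memory and simplified, so the details are approximate):

	static int qdisc_restart(struct net_device *dev)
	{
		struct Qdisc *q = dev->qdisc;
		struct sk_buff *skb;

		/* Called with dev->queue_lock held. */
		if ((skb = q->dequeue(q)) != NULL) {
			if (spin_trylock(&dev->xmit_lock)) {
				dev->xmit_lock_owner = smp_processor_id();

				/* Drop queue_lock while the driver
				 * programs the hardware. */
				spin_unlock(&dev->queue_lock);
				if (dev->hard_start_xmit(skb, dev) == 0) {
					dev->xmit_lock_owner = -1;
					spin_unlock(&dev->xmit_lock);
					spin_lock(&dev->queue_lock);
					return -1;	/* keep draining */
				}
				/* Driver was busy: release xmit_lock,
				 * retake queue_lock, requeue below. */
				dev->xmit_lock_owner = -1;
				spin_unlock(&dev->xmit_lock);
				spin_lock(&dev->queue_lock);
			}

			/* xmit_lock contended or driver busy: put the
			 * packet back and reschedule the device. */
			q->ops->requeue(skb, q);
			netif_schedule(dev);
			return 0;
		}
		return q->q.qlen;
	}

If I'm reading that right, the only work done under queue_lock on the
trylock-failure path is the requeue plus netif_schedule().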


> If you bind the device interrupts to one cpu, do things change?

We always bind the network interrupt to one CPU; we haven't tried a
non-bound configuration.

