Re: Oops: 17 SMP ARM (v3.16-rc2)

From: Russell King - ARM Linux
Date: Wed Aug 06 2014 - 05:50:32 EST


On Tue, Aug 05, 2014 at 01:31:29PM +0000, Mattis Lorentzon wrote:
> We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
> currently running some stability tests.
>
> During our first test round we triggered a timeout which caused the fec driver
> to become unresponsive for several minutes. The attached backtrace was
> shown when the hardware was rebooted.

What is on the other end of the link?

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
...
> fec 2188000.ethernet eth0: TX ring dump
> Nr SC addr len SKB
> 0 0x1c00 0x00000000 66 (null)
...
> 83 0x1c00 0x00000000 66 (null)
> 84 H 0x1c00 0x00000000 66 (null)
> 85 0x9c00 0x2e205000 66 9e384f00
> 86 0x1c00 0x2e204800 66 9e384d80
> 87 0x1c00 0x2e204000 66 9e384180
...
> 376 0x1c00 0x2e252800 66 81cf6180
> 377 0x1c00 0x2e253000 66 81cf6240
> 378 S 0x1c00 0x00000000 66 (null)

So, the software would insert the next packet into slot 378. However,
the slots from 85 to 377 have not been reaped, despite those in 86 to
377 allegedly having been sent. This is because the entry in slot 85
shows that it has yet to be sent (its status is 0x9c00, i.e. the ready
bit 0x8000 is still set), and entries are reaped strictly in ring order.

I've no idea what causes this; it looks like there's something wrong
with the hardware which causes the transmitter to skip an entry in the
ring under certain circumstances. As I've never been able to reproduce
it here, I've not been able to investigate it.

What I would like to do is to stamp each packet in some way with an
identifier marking its ring position, and then monitor the network to
find out whether the packet at slot 85 was actually transmitted. That's
made slightly harder because packets may be dropped at the receiver
when operating in promisc mode. This would then allow us to work out
some likely causes.

Note that after the transmit watchdog fires, the interface should recover
and start operating normally again, and that should not take "several
minutes."

--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.