Re: [GIT]: Networking

From: Ingo Molnar
Date: Sat Jun 14 2008 - 04:18:16 EST



* David Miller <davem@xxxxxxxxxxxxx> wrote:

> > just to clarify the bug pattern: the box was still accessible after
> > the warning. So this is a far less serious problem and i'd suggest
> > we open up a separate regression entry for it and consider the
> > hung-TCP problem closed. (i haven't seen the hang in the last week,
> > with either version of the tcp-accept reverts)
>
> It is a warning that just means the transmitter on the network device
> stalled for an unusually long period of time. Is your subnet flooded
> when these warnings occur? Is the remote side system wedged or at a
> very high load when the message triggers?

yes, both the network and the testbox are at relatively high load, it's a
distcc kernel build over the network. Thousands of such iterations were
done successfully without this warning ever triggering - it triggered
for the first time in about 10,000 bootups the moment i applied your
version of the reverts. When i applied the small diff the warning did
not come back.

> All of these would be useful points of information to determine if
> this might be normal or not.
>
> In theory, if the remote port the device is connected to gets
> extremely congested, emits a pause frame to your machine, but never
> releases that pause, this (new) warning could trigger.
>
> This warning was added by Arjan in 2.6.25 FYI in order to diagnose the
> not-normal cases better.
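
For reference, a minimal userspace sketch of the kind of check that sits
behind such a transmit-timeout warning. This is not the code in
net/sched/sch_generic.c: struct fake_netdev, its fields and tx_timed_out()
are hypothetical names for illustration only, and the tick counter is
assumed to be a free-running unsigned long that may wrap. It shows why a
pause that is never released could trip the warning: the queue stays
stopped while the time of the last transmit stops advancing.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a device's transmit state. */
struct fake_netdev {
	const char *name;
	bool queue_stopped;           /* TX queue is currently stopped */
	unsigned long trans_start;    /* tick at which the last transmit began */
	unsigned long watchdog_timeo; /* tolerated stall, in ticks */
};

/*
 * Returns true when the queue is stopped and more than watchdog_timeo
 * ticks have passed since the last transmit. The signed difference keeps
 * the comparison correct across a wrap of the tick counter.
 */
static bool tx_timed_out(const struct fake_netdev *dev, unsigned long now)
{
	unsigned long deadline = dev->trans_start + dev->watchdog_timeo;

	return dev->queue_stopped && (long)(now - deadline) > 0;
}
```

With trans_start = 1000 and watchdog_timeo = 500, the check stays quiet up
to and including tick 1500 and fires from tick 1501 onward, but only while
the queue is stopped. A busy link that still makes progress keeps moving
trans_start forward and never reaches the deadline.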

ok, should we then remove that warning, if it's spurious? kerneloops.org
has picked up a few other instances of this warning as well:

http://www.kerneloops.org/searchfile.php?search=net%2Fsched%2Fsch_generic.c&btnG=Filename+Search

Ingo