Re: [PATCH-2.2] Bonding Driver Enhancements - final

From: willy tarreau (wtarreau@yahoo.fr)
Date: Fri Oct 06 2000 - 03:32:24 EST


> Donald's email server has been down for a few days, my
> machine was not able to send him e-mail.
ok.

> Regarding your last patch -- it does not include the
> documentation update (ifenslave.c compile problem is
> solved).

yes, that's just what I discovered yesterday evening.
I'll attach the complete bonding.txt here, and rebuild
the patch later.

> BAD RESULTS.
> It did not work with SMP kernel on Compaq (worked

ok, very interesting. IMHO, the only way to get into
this trouble is a problem when setting a link up. I'll
try to investigate, perhaps I'll have access to a
dual-ppro today.

> (the link detection worked fine and was reported even
> when the network was dead).

so the MII in the interface drivers works correctly.

> As soon as packets were queued to be sent via the
> second NIC, the network would effectively die.

I believe the eepro driver can reset itself under
certain conditions (it includes its own timer for
this). I don't know exactly why, but if this happens,
I can imagine the trunk seeing the interface go down.

> saw a lot of messages like "eth1: transmit timeout,
> status..."

this tends to validate the last possibility.

> With both the old and the new patch, trying to bring
> the second ethernet down (either manually with
> ifconfig or at reboot) would freeze the
> machine.

ok, the problem with checking for no active interface
has come back.

> It was not a lock-up; machine would respond to
> keyboard, but all commands would get stuck.

you should have had only one CPU completely stuck in
the xmit loop, and the other one processing your
keyboard inputs.
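
To illustrate the kind of guard I mean (a sketch only, not code
from the patch; the struct and function names are hypothetical),
a round-robin scan that visits each slave at most once cannot
spin forever when no interface is active:

```c
#include <stddef.h>

/* Hypothetical, simplified slave record; not the driver's real structure. */
struct slave {
    int link_up;    /* non-zero when the link is usable */
};

/*
 * Pick the next usable slave in round-robin order, starting after `last`.
 * The scan visits each slave at most once, so it terminates even when
 * every link is down. Returns the chosen index, or -1 if none is usable.
 */
int pick_next_slave(const struct slave *slaves, size_t count, size_t last)
{
    size_t step;

    if (count == 0)
        return -1;

    for (step = 1; step <= count; step++) {
        size_t idx = (last + step) % count;

        if (slaves[idx].link_up)
            return (int)idx;
    }
    return -1;  /* no active interface: the caller should give up */
}
```

With a return value of -1 the transmit path can drop the packet
instead of looping, which is the case that seems to bite on SMP.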

> This could be a case of a Compaq that is not
> properly configured for Linux

nope, the problem should come from my code (very little
experience with SMP code, only multi-threading, you
see :-)

> GOOD RESULTS.
> Both the old and the optimized patch work fine on UP

ok, that's all I could verify on my home PC.

> The new patch does indeed optimize the load.

So Thomas was right.

> Caveats.
.../...
> used, then I restore
> the link, IT IS NOT REPORTED but I see with ifconfig
> that both links are used again.

very interesting bug. This may explain the activation
problem at start time on SMP.

> change. No network performance loss (not even short
> time) was observed during a switch to a backup
> interface.

your switches are very interesting. My alteon AD3 takes
between 1 and 40 seconds to understand that the MAC
address changed its port.

> A worse case is when a link is restored

I've written a little about this problem in the doc.
It was in the TODO to add two timers: one for link down,
one for link up. My alteon takes about 10 seconds to
reboot.

> If we are talking about features, we should remove
> the limitation of one bonding interface!!!

when compiled as a module, you're not limited anymore.

Thanks very much for all your tests. I'll try to
investigate the bugs (SMP and link loss). I'll think
about adding an "up delay" and a "down delay" per slave,
to avoid hasty detection.
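
A rough sketch of how such per-slave delays could behave,
assuming the link is polled at a fixed interval and the delays
are counted in polls (nothing here is in the patch yet; all
names are hypothetical):

```c
/* Hypothetical per-slave debounce state; all names are illustrative. */
struct link_filter {
    int reported_up;  /* state currently reported to the bond */
    int pending;      /* consecutive polls disagreeing with reported_up */
    int up_delay;     /* extra polls a link must stay up before use */
    int down_delay;   /* extra polls a link must stay down before removal */
};

/*
 * Feed one raw link reading (for example from an MII poll) into the
 * filter and return the debounced state. A change is reported only
 * once the raw reading has disagreed for more than the matching
 * delay, so a delay of 0 reports the change on the first poll.
 */
int link_filter_poll(struct link_filter *f, int raw_up)
{
    int needed;

    raw_up = raw_up ? 1 : 0;
    if (raw_up == f->reported_up) {
        f->pending = 0;  /* reading agrees: cancel any pending change */
        return f->reported_up;
    }

    needed = raw_up ? f->up_delay : f->down_delay;
    if (++f->pending > needed) {
        f->reported_up = raw_up;
        f->pending = 0;
    }
    return f->reported_up;
}
```

A long up delay would cover a switch that needs several seconds
after a reboot before it forwards traffic, while a short down
delay keeps failover quick.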

I will try SMP today and the other bugs this weekend.

Cheers,

Willy



