Re: [PATCH RFC] bonding: rate-limit bonding driver inspect messages

From: Hangbin Liu
Date: Sat Feb 17 2024 - 22:09:40 EST


On Sat, Feb 17, 2024 at 12:39:44PM +0000, Praveen Kannoju wrote:
> > -----Original Message-----
> > From: Hangbin Liu <liuhangbin@xxxxxxxxx>
> > Sent: 16 February 2024 02:33 PM
> > To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> > Cc: j.vosburgh@xxxxxxxxx; andy@xxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> > pabeni@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rajesh Sivaramasubramaniom
> > <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Manjunath Patil
> > <manjunath.b.patil@xxxxxxxxxx>
> > Subject: Re: [PATCH RFC] bonding: rate-limit bonding driver inspect messages
> >
> > On Thu, Feb 15, 2024 at 10:55:54PM +0530, Praveen Kumar Kannoju wrote:
> > > Rate limit bond driver log messages, to prevent a log flood in a
> > > run-away situation, e.g couldn't get rtnl lock. Message flood leads to
> > > instability of system and loss of other crucial messages.
> >
> > Hi Praveen,
> >
> > The patch looks good to me. But would you please help explain why these
> > slave_info() are chosen under net_ratelimit?
> >
> > Thanks
> > Hangbin
>
> Thank you, Hangbin.
>
> The routine bond_mii_monitor() periodically inspects the slave carrier state in order to detect for state changes, on a state change internally records it and does the state change action.
>
> Parked-to-Parked state changes goes through transient state. As an example for Up to Down, BOND_LINK_UP to BOND_LINK_DOWN, is thru BOND_LINK_FAIL. In order to attain next parked state or transient state bond needs rtnl mutex. If in a situation it cannot get it, a state change wouldn't happen. In order to achieve a state change as quickly as possible bond_mii_monitor() reschedules itself to come around after 1 msec.

I think a large miimon downdelay/updelay setting could reduce this.

> And every single come around reinspects the link and sees a state change compared to its internally recorded, which in reality internal state could be not changed earlier as failed to get rtnl lock, and throws again log indicating it sees a state change. If attaining rtnl mutex take long say hypothetical 5 secs, then bond logs 5000 state change message. 1 message at every 1 msec.

Anyway, setting the rate limit do reduce the message flood. Would you please
summarise the paragraph and add it in commit description when post the formal
patch?

thanks
Hangbin