RE: [PATCH RFC] bonding: rate-limit bonding driver inspect messages

From: Praveen Kannoju
Date: Mon Feb 19 2024 - 06:36:09 EST




> -----Original Message-----
> From: Hangbin Liu <liuhangbin@xxxxxxxxx>
> Sent: 18 February 2024 08:39 AM
> To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> Cc: j.vosburgh@xxxxxxxxx; andy@xxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rajesh Sivaramasubramaniom
> <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Rama Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Manjunath Patil
> <manjunath.b.patil@xxxxxxxxxx>
> Subject: Re: [PATCH RFC] bonding: rate-limit bonding driver inspect messages
>
> On Sat, Feb 17, 2024 at 12:39:44PM +0000, Praveen Kannoju wrote:
> > > -----Original Message-----
> > > From: Hangbin Liu <liuhangbin@xxxxxxxxx>
> > > Sent: 16 February 2024 02:33 PM
> > > To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> > > Cc: j.vosburgh@xxxxxxxxx; andy@xxxxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> > > edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx;
> > > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Rajesh
> > > Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Rama
> > > Nichanamatlu <rama.nichanamatlu@xxxxxxxxxx>; Manjunath Patil
> > > <manjunath.b.patil@xxxxxxxxxx>
> > > Subject: Re: [PATCH RFC] bonding: rate-limit bonding driver inspect
> > > messages
> > >
> > > On Thu, Feb 15, 2024 at 10:55:54PM +0530, Praveen Kumar Kannoju wrote:
> > > > Rate limit bond driver log messages, to prevent a log flood in a
> > > > run-away situation, e.g couldn't get rtnl lock. Message flood
> > > > leads to instability of system and loss of other crucial messages.
> > >
> > > Hi Praveen,
> > >
> > > The patch looks good to me. But would you please help explain why
> > > these
> > > slave_info() are chosen under net_ratelimit?
> > >
> > > Thanks
> > > Hangbin
> >
> > Thank you, Hangbin.
> >
> > The routine bond_mii_monitor() periodically inspects the slave carrier state in order to detect for state changes, on a state change
> internally records it and does the state change action.
> >
> > Parked-to-Parked state changes goes through transient state. As an example for Up to Down, BOND_LINK_UP to
> BOND_LINK_DOWN, is thru BOND_LINK_FAIL. In order to attain next parked state or transient state bond needs rtnl mutex. If in a
> situation it cannot get it, a state change wouldn't happen. In order to achieve a state change as quickly as possible
> bond_mii_monitor() reschedules itself to come around after 1 msec.
>
> I think a large miimon downdelay/updelay setting could reduce this.
>
> > And every single come around reinspects the link and sees a state change compared to its internally recorded, which in reality
> internal state could be not changed earlier as failed to get rtnl lock, and throws again log indicating it sees a state change. If attaining
> rtnl mutex take long say hypothetical 5 secs, then bond logs 5000 state change message. 1 message at every 1 msec.
>
> Anyway, setting the rate limit do reduce the message flood. Would you please summarise the paragraph and add it in commit
> description when post the formal patch?
>
> thanks
> Hangbin

Thank you very much, Hangbin.

I've added the summary on why we intend to rate-limit the messages in the commit, and re-sent the formal patch.

-
Praveen.