Re: [PATCH] bonding:avoid repeated display of same link status change

From: Michal Kubecek
Date: Thu Oct 25 2018 - 05:29:35 EST


On Thu, Oct 25, 2018 at 02:21:05AM -0700, Manish Kumar Singh wrote:
> > From: Michal Kubecek [mailto:mkubecek@xxxxxxx]
> > IMHO it does not. AFAICS multiple instances of bond_mii_monitor() cannot
> > run simultaneously for the same bond so that there doesn't seem to be
> > anything to collide with. (And if they could, we would need to test and
> > set the flag atomically in bond_miimon_inspect().)
> >
> Yes, Michal, we are inline with your understanding.
> when the -original- patch was posted to upstream there was no
> -synchronization- nor -racing- addressing code was in read/write of this
> added filed, as we -never- saw need for either.
>
> -only- writer of the added field is bond_mii_monitor.
> -only- reader of the added field is bond_miimon_inspect.
> -this writer & reader -never- can run concurrently.
> -writer invokes the reader.
>
> hence, imo uint_8 rtnl_needed is all what is needed; with bond_mii_monitor doing rtnl_needed = 1; and bond_miimon_inspect doing if rtnl_needed.
>
> here is the gravity of the situation with multiple customers whose names including machine names redacted:
>
> 4353 May 31 02:38:57 hostname kernel: ixgbe 0000:03:00.0: removed PHC on p2p1
> 4354 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
> 4355 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
> 4356 May 31 02:38:57 hostname kernel: public: link status definitely down for interface p2p1, disabling it
> 4357 May 31 02:38:57 hostname kernel: public: making interface p2p2 the new active one
> 4358 May 31 02:38:59 hostname kernel: ixgbe 0000:03:00.0: registered PHC device on p2p1
> 4359 May 31 02:39:00 hostname kernel: ixgbe 0000:03:00.0 p2p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> 4360 May 31 02:39:00 hostname kernel: public: link status up for interface p2p1, enabling it in 200 ms
> 4361 May 31 02:39:00 hostname kernel: public: link status definitely up for interface p2p1, 10000 Mbps full duplex
> 4362 May 31 02:45:37 hostname journal: Missed 217723 kernel messages
> 4363 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
> ---------------------
> 11000+ APPROX SAME REPEATED MESSAGES in second
> ---------------------
> 15877 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
> 15878 May 31 02:45:37 hostname kernel: public: link status definitely down for interface p2p2, disabling it
> 15879 May 31 02:45:37 hostname kernel: public: making interface p2p1 the new active one

When I was replying, I didn't know this was a v2 and I haven't seen the
v1 discussion. I have read it since and I think I understand Eric's
point now. The thing is that just adding e.g. u8 is OK as it is now.
However, someone could later add another u8 next to it which would also
be perfectly OK on its own but reads/writes to these two could collide
between each other.

And as pointed out by a colleague, even having atomic_t and u8 flag in
one 64-bit word could be a problem on architectures which cannot do an
atomic read/write from/to a 32-bit word (sparc seems to be one).

Michal Kubecek