Re: [PATCH net-next v1 1/1] net: dsa: microchip: ksz9477: export HW stats over stats64 interface

From: Vladimir Oltean
Date: Fri Feb 18 2022 - 05:43:42 EST


On Fri, Feb 18, 2022 at 09:39:56AM +0100, Oleksij Rempel wrote:
> On Thu, Feb 17, 2022 at 05:55:54PM +0200, Vladimir Oltean wrote:
> > On Thu, Feb 17, 2022 at 03:07:26PM +0100, Oleksij Rempel wrote:
> > > +static void ksz9477_get_stats64(struct dsa_switch *ds, int port,
> > > + struct rtnl_link_stats64 *stats)
> > > +{
> > > + struct ksz_device *dev = ds->priv;
> > > + struct ksz9477_stats_raw *raw;
> > > + struct ksz_port_mib *mib;
> > > + int ret;
> > > +
> > > + mib = &dev->ports[port].mib;
> > > + raw = (struct ksz9477_stats_raw *)mib->counters;
> > > +
> > > + mutex_lock(&mib->cnt_mutex);
> >
> > The eternal problem, ndo_get_stats64 runs in atomic context,
> > mutex_lock() sleeps. Please test your patches with
> > CONFIG_DEBUG_ATOMIC_SLEEP=y.
>
> Good point, thx! I reworked the code.
>
> Beside, I get this warning with differnt locking validators:
> [ 153.140000] br0: port 1(lan2) entered blocking state
> [ 153.190000] br0: port 1(lan2) entered disabled state
> [ 153.320000] device lan2 entered promiscuous mode
> [ 153.350000] ------------[ cut here ]------------
> [ 153.350000] WARNING: CPU: 0 PID: 71 at net/core/dev.c:7913 __dev_set_promiscuity+0x10c/0x138
> [ 153.360000] RTNL: assertion failed at net/core/dev.c (7913)
> [ 153.370000] Modules linked in: atmel_aes atmel_tdes atmel_sha
> [ 153.370000] CPU: 0 PID: 71 Comm: kworker/u2:5 Not tainted 5.17.0-rc2-00714-g845f6fa17e48-dirty #33
> [ 153.370000] Hardware name: Atmel SAMA5
> [ 153.370000] Workqueue: dsa_ordered dsa_slave_switchdev_event_work
> [ 153.370000] unwind_backtrace from show_stack+0x18/0x1c
> [ 153.370000] show_stack from dump_stack_lvl+0x58/0x70
> [ 153.370000] dump_stack_lvl from __warn+0xd8/0x228
> [ 153.370000] __warn from warn_slowpath_fmt+0x98/0xc8
> [ 153.370000] warn_slowpath_fmt from __dev_set_promiscuity+0x10c/0x138
> [ 153.370000] __dev_set_promiscuity from __dev_set_rx_mode+0x8c/0x98
> [ 153.370000] __dev_set_rx_mode from dev_uc_add+0x84/0x8c
> [ 153.370000] dev_uc_add from dsa_port_host_fdb_add+0x48/0x80
> [ 153.370000] dsa_port_host_fdb_add from dsa_slave_switchdev_event_work+0x1dc/0x254
> [ 153.370000] dsa_slave_switchdev_event_work from process_one_work+0x2b0/0x7d4
> [ 153.370000] process_one_work from worker_thread+0x4c/0x53c
> [ 153.370000] worker_thread from kthread+0xf8/0x12c
> [ 153.370000] kthread from ret_from_fork+0x14/0x2c
> [ 153.370000] Exception stack(0xc1f13fb0 to 0xc1f13ff8)
> [ 153.370000] 3fa0: 00000000 00000000 00000000 00000000
> [ 153.370000] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 153.370000] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 153.380000] irq event stamp: 0
> [ 153.390000] hardirqs last enabled at (0): [<00000000>] 0x0
> [ 153.390000] hardirqs last disabled at (0): [<c0124b38>] copy_process+0x7d8/0x194c
> [ 153.400000] softirqs last enabled at (0): [<c0124b38>] copy_process+0x7d8/0x194c
> [ 153.410000] softirqs last disabled at (0): [<00000000>] 0x0
> [ 153.410000] ---[ end trace 0000000000000000 ]---
> [ 153.420000] device eth0 entered promiscuous mode
> [ 153.770000] ksz9477-switch spi1.0 lan2: configuring for phy/gmii link mode
> [ 155.040000] ksz9477-switch spi1.0 lan4: Link is Down
> [ 156.960000] ksz9477-switch spi1.0 lan2: Link is Up - 1Gbps/Full - flow control rx/tx
>
>
> Is it something known?

Thanks for reporting. It isn't something known (at least to me - who
knows whether somebody may have been chuckling while I was gloating that
I removed rtnl_lock() from the DSA switchdev FDB notifiers).

It turns out that dsa_port_host_fdb_add() needs rtnl_lock() around
dev_uc_add(), at least if the master doesn't support IFF_UNICAST_FLT.

It's pretty nasty. If we add an rtnl_lock() there, we'll deadlock from
the code paths that call dsa_flush_workqueue(), which have rtnl_lock()
already held.

The only way I think we can get out of this is if we call dev_uc_add()
only if the master has IFF_UNICAST_FLT. If it doesn't, we should force a
call to dsa_master_set_promiscuity() during dsa_master_setup().
Not exactly the cleanest solution, but at least it shouldn't deadlock
and it should work.

Andrew, Florian, Vivien?