Re: [PATCH 0/2] net: mvpp2: Survive CPU hotplug events

From: Marcin Wojtas
Date: Wed Feb 16 2022 - 08:33:03 EST


On Wed, 16 Feb 2022 at 14:29, Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Wed, 16 Feb 2022 13:19:30 +0000,
> Marcin Wojtas <mw@xxxxxxxxxxxx> wrote:
> >
> > Hi Marc,
> >
> > On Wed, 16 Feb 2022 at 10:08, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > I recently realised that playing with CPU hotplug on a system equipped
> > > with a set of MVPP2 devices (Marvell 8040) was fraught with danger and
> > > would result in a rapid lockup or panic.
> > >
> > > As it turns out, the per-CPU nature of the MVPP2 interrupts is
> > > getting in the way. A good solution for this seems to rely on the
> > > kernel's managed interrupt approach, where the core kernel will not
> > > move interrupts around as the CPUs go down, but will simply disable
> > > the corresponding interrupt.
> > >
> > > Converting the driver to this requires a bit of refactoring in the IRQ
> > > subsystem to expose the required primitive, as well as a bit of
> > > surgery in the driver itself.
> > >
> > > Note that although the system now survives such an event, the driver
> > > seems to assume that all queues are always active and doesn't inform
> > > the device that a CPU has gone away. Someone who actually understands
> > > this driver should have a look at it.
> > >
> > > Patches on top of 5.17-rc3, lightly tested on a McBin.
> > >
> >
> > Thank you for the patches. Can you, please, share the commands you
> > used? I'd like to test it more.
>
> Offline CPU3:
> # echo 0 > /sys/devices/system/cpu/cpu3/online
>
> Online CPU3:
> # echo 1 > /sys/devices/system/cpu/cpu3/online
>
> Put that in a loop, using different CPUs.
>
> On my HW, turning off CPU0 leads to odd behaviours (I wouldn't be
> surprised if the firmware was broken in that respect, and also the
> fact that the device keeps trying to send stuff to that CPU...).
>

Thanks, I think stressing the DUT with traffic during CPU hotplug will
be a good scenario - I'll try that.
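
For reference, the loop described above could look roughly like the
sketch below. It is only an illustration, not something taken from the
patches: the function names, the SYSFS_CPU and DELAY variables and the
one-second default delay are all assumptions made for this example. Only
the sysfs "online" files come from the commands quoted above.

```shell
#!/bin/sh
# Sketch only: cycle a set of CPUs offline and back online.
# SYSFS_CPU and DELAY are hypothetical knobs introduced for this example.
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

hotplug_cycle() {
    # $1 = CPU number: take it offline, wait, then bring it back.
    echo 0 > "$SYSFS_CPU/cpu$1/online" || return 1
    sleep "${DELAY:-1}"
    echo 1 > "$SYSFS_CPU/cpu$1/online" || return 1
}

hotplug_loop() {
    # $1 = number of rounds, remaining arguments = CPU numbers to cycle.
    rounds=$1
    shift
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for cpu in "$@"; do
            hotplug_cycle "$cpu" || return 1
        done
        i=$((i + 1))
    done
}
```

Run as root, something like "hotplug_loop 100 1 2 3" would cycle CPUs
1-3 a hundred times. CPU0 is left out of that example on purpose, given
the odd behaviour reported above when it goes offline.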

Marcin