Re: Part of devices not initialized with mlx4

From: Leon Romanovsky
Date: Sun Dec 18 2022 - 04:53:54 EST


On Thu, Dec 15, 2022 at 10:51:15AM +0100, Petr Pavlu wrote:
> Hello,
>
> We have seen an issue where some ConnectX-3 devices are not initialized
> when the mlx4 drivers are part of the initrd.

<...>

> * Systemd stops running services and then sends SIGTERM to "unmanaged" tasks
> on the system to terminate them too. This includes the modprobe task.
> * Initialization of mlx4_en is interrupted in the middle of its init function.

And why do you think that this systemd behaviour is the correct one?

> The module remains inserted but only some eth devices are initialized and
> operational.

<...>

> One idea how to address this issue is to model the mlx4 drivers using an
> auxiliary bus, similar to how the same conversion was already done in mlx5.
> This leaves all module loads to udevd which better integrates with the systemd
> processing and a load of mlx4_en doesn't get interrupted.
>
> My incomplete patches implementing this idea are available at:
> https://github.com/petrpavlu/linux/commits/bsc1187236-wip-v1
>
> The rework turned out to be not exactly straightforward and would need more
> effort.

Right, I didn't see any ROI in converting mlx4 to the aux bus.

>
> I realize mlx4 is only used for ConnectX-3 and older hardware. I wonder then
> if this kind of rework would be suitable and something to proceed with, or if
> some simpler idea how to address the described issue would be better and
> preferred.

Will it help if you move mlx4_en to rootfs?
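
As a rough sketch of what that could look like, assuming the initrd is
built with dracut (the report does not say which tool is in use), mlx4_en
can be kept out of the image with a drop-in config. The file name below is
hypothetical, and whether this actually avoids the interrupted init is the
open question here, not something this snippet establishes.

```shell
# /etc/dracut.conf.d/90-omit-mlx4-en.conf  (hypothetical file name)
# Assumption: the initrd is generated by dracut.
# Keep mlx4_en out of the initrd so that it is only loaded later,
# from the root filesystem.
omit_drivers+=" mlx4_en "
```

After adding the file, the initrd has to be regenerated (e.g. "dracut -f")
and the result can be checked with "lsinitrd | grep mlx4".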

Thanks

>
> Thank you,
> Petr
>