Re: Part of devices not initialized with mlx4

From: Petr Pavlu
Date: Mon Jan 02 2023 - 05:34:30 EST


On 12/18/22 10:53, Leon Romanovsky wrote:
> On Thu, Dec 15, 2022 at 10:51:15AM +0100, Petr Pavlu wrote:
>> Hello,
>>
>> We have seen an issue when some of ConnectX-3 devices are not initialized
>> when mlx4 drivers are a part of initrd.
>
> <...>
>
>> * Systemd stops running services and then sends SIGTERM to "unmanaged" tasks
>> on the system to terminate them too. This includes the modprobe task.
>> * Initialization of mlx4_en is interrupted in the middle of its init function.
>
> And why do you think that this systemd behaviour is correct one?

My view is that this is an issue between the kernel and initrd/systemd.
Switching the root is a delicate operation and both parts need to carefully
cooperate for it to work correctly.

I think it is generally sensible that systemd tries to terminate any remaining
processes started from the initrd. They would have trouble when the root is
switched from under them anyway, unless they are specifically prepared for
it. Systemd only skips terminating kthreads and allows excluding root storage
daemons. A modprobe helper could be excluded from termination too, but the
problem with the root switch remains.

It looks to me that a good approach is to complete all running module loads
before switching the root and to continue with any further loads after the
operation is done. Leaving module loads to udevd ensures this, hence the idea
to use an auxiliary bus.

>> The module remains inserted but only some eth devices are initialized and
>> operational.
>
> <...>
>
>> One idea how to address this issue is to model the mlx4 drivers using an
>> auxiliary bus, similar to how the same conversion was already done in mlx5.
>> This leaves all module loads to udevd which better integrates with the systemd
>> processing and a load of mlx4_en doesn't get interrupted.
>>
>> My incomplete patches implementing this idea are available at:
>> https://github.com/petrpavlu/linux/commits/bsc1187236-wip-v1
>>
>> The rework turned out to be not exactly straightforward and would need more
>> effort.
>
> Right, I didn't see any ROI of converting mlx4 to aux bus.

I see, but in case you and the other maintainers are not immediately opposed to
this conversion idea, I could try to resolve the remaining problems in my port
and see how it turns out?

>> I realize mlx4 is only used for ConnectX-3 and older hardware. I wonder then
>> if this kind of rework would be suitable and something to proceed with, or if
>> some simpler idea how to address the described issue would be better and
>> preferred.
>
> Will it help if you move mlx4_en to rootfs?

Yes, if the mlx4 drivers are not in the initrd but only on the rootfs, then
this issue is not present. The problem is that VM image templates typically
have their initrd generated as no-hostonly and so include all drivers. Some
images might also require networking to be available already in the initrd for
instance initialization and so must include these drivers.
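
For illustration only, and assuming the initrd is built with dracut (the
"no-hostonly" wording suggests it, but that is an assumption here), keeping the
mlx4 modules out of the initrd could look roughly like the drop-in below. The
file name is hypothetical, and the module list assumes that all three mlx4
modules should be loaded from the rootfs instead.

```sh
# /etc/dracut.conf.d/90-omit-mlx4.conf  (hypothetical file name)
# Sketch: keep the mlx4 modules out of the generated initrd so that they are
# loaded only after the switch to the real root.
# The surrounding spaces inside the quotes are needed because the value is
# appended to a space-separated list.
omit_drivers+=" mlx4_core mlx4_en mlx4_ib "
```

The initrd would need to be regenerated afterwards. This does not help images
that need networking in the initrd, as noted above.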

Thanks,
Petr