Re: [PATCH V3 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

From: Saeed Mahameed
Date: Mon Nov 27 2023 - 14:26:15 EST


On 27 Nov 18:27, Greg Kroah-Hartman wrote:
On Mon, Nov 27, 2023 at 12:17:32PM -0400, Jason Gunthorpe wrote:
On Mon, Nov 27, 2023 at 03:51:10PM +0000, Greg Kroah-Hartman wrote:

> Ok, best of luck with this mess, I'll stop harping on it now and just
> point out all of the other issues here. First off, you all need to get
> the network maintainers to agree that this driver is ok to do this way,
> and I don't think that has happened yet, so I'll wait on reviewing the
> series until that is resolved.

As I said already, I strongly disagree with the idea that the netdev
maintainers get a global veto on what happens with mlx5 devices just
because they sometimes have an ethernet port on the back of the card.

I understand you might disagree, however I hold their opinion in high
regard and want to ensure that they agree that exposing device-specific
debugging information for a device that deals with networking is ok to
do so in a device-specific misc device node and not through some other
way that other networking devices normally do (i.e. netlink or
some-other-such-thing.)

Note, device-specific character devices have almost always proven to be
a bad idea in the long run, I understand your immediate need to do
something like this, but remember that keeping it alive for the next 20+
years is going to be tough.


This driver is different, as it doesn't replace the existing mlx5 drivers.
The mlx5 functionality drivers are still there to expose the device features
through the standard stacks; this is just a companion driver to access
debug information. By driver and FW design, mlx5ctl is not meant to
manage or pilot the device like other device-specific char drivers.

To be clear, this debug driver (or at least an older version of it) has
already been in use for more than 15 years, since the beginning of mlx5.
We used to provide it only as an external package called mft debug
tools [1], which has the kernel parts as well. Now it's time to
upstream it.

mlx5ctl will keep serving existing and future HW for the next few decades,
I am pretty sure of that. As the cover letter explains, the mlx5 architecture
is set in stone and written in ink: the same mlx5 drivers work on any
ConnectX chip since 2012, and they will keep working on the next generations
of chips. mlx5ctl will be no different.

[1] https://network.nvidia.com/products/adapter-software/firmware-tools/

This module is primarily (but not exclusively) for rdma related
functionality, not netdev, and the RDMA maintainers Ack it.


For InfiniBand/virtio/vfio/vdpa/nvme/fpga ConnectX devices, the mlx5 netdev
doesn't even exist, so it is not reasonable to ask that the debug
interface go via the netdev stack. mlx5ctl is needed to serve
all users of mlx5 devices, not only netdev (networking).

So I really find this odd, that one stack maintainer gets a veto over all
others.

In my mind, RDMA implies networking, as it's over a network connection,
but hey, I might be wrong :)

thanks,

greg k-h