RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

From: Parav Pandit
Date: Thu Mar 07 2019 - 16:02:42 EST




> -----Original Message-----
> From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> Sent: Thursday, March 7, 2019 2:54 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
> <jakub.kicinski@xxxxxxxxxxxxx>
> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>; Alex
> Williamson <alex.williamson@xxxxxxxxxx>
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>
>
>
> <snip>
>
> >>>
> >>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
> >>> Will wait for a day to receive more comments/views from Greg and
> others.
> >>>
> >>> As I explained in this cover-letter and discussion, First use case
> >>> is to create and use mdevs in the host (and not in VM).
> >>> Later on, I am sure once we have mdevs available, VM users will
> >>> likely use
> >> it.
> >>>
> >>> So, mlx5_core driver will have two components as starting point.
> >>>
> >>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>> This is mdev device life cycle driver which will do,
> >>> mdev_register_device()
> >> and implements mlx5_mdev_ops.
> >>>
> >> Ok. I would suggest not use mdev.c file name, may be add device name,
> >> something like mlx_mdev.c or vfio_mlx.c
> >>
> > mlx5/core is coding convention is not following to prefix mlx to its 40+
> files.
> >
> > it uses actual subsystem or functionality name, such as, sriov.c
> > eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns to
> > rest of the 40+ files.
> >
> >
> >>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>> This is mdev device driver which does mdev_register_driver() and
> >>> probe() creates netdev by heavily reusing existing code of the PF device.
> >>> These drivers will not be placed under drivers/vfio/mdev, because
> >>> this is
> >> not a vfio driver.
> >>> This is fine, right?
> >>>
> >>
> >> I'm not too familiar with netdev, but can you create netdev on open()
> >> call on mlx mdev device? Then you don't have to write mdev device
> driver.
> >>
> > Who invokes open() and release()?
> > I believe it is the qemu would do open(), release, read/write/mmap?
> >
> > Assuming that is the case,
> > I think its incorrect to create netdev in open.
> > Because when we want to map the mdev to VM using above mdev calls, we
> actually wont be creating netdev in host.
> > Instead, some queues etc will be setup as part of these calls.
> >
> > By default this created mdev is bound to vfio_mdev.
> > And once we unbind the device from this driver, we need to bind to mlx5
> driver so that driver can create the netdev etc.
> >
> > Or did I get open() and friends call wrong?
> >
>
> In 'struct mdev_parent_ops' there are create() and remove(). When user
> creates mdev device by writing UUID to create sysfs, vendor driver's
> create() callback gets called. This should be used to allocate/commit
Yes. I am already past that stage.

> resources from parent device and on remove() callback free those resources.
> So there is no need to bind mlx5 driver to that mdev device.
>
If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such driver won't create netdev.
Again, we do not want to map this mdev to a VM.
We want to consume it in the host where mdev is created.
So I am able to detach this mdev from vfio_mdev driver as usaual using
$ echo mdev_name > ../drivers/vfio_mdev/unbind

Followed by binding it to mlx5_core driver.

Below is sample output before binding it to mlx5_core driver.
When we bind with mlx5_core driver, that driver creates the netdev in host.
If user wants to map this mdev to VM, user won't bind to mlx5_core driver. instead he will bind to vfio driver and that does usual open/release/...


lrwxrwxrwx 1 root root 0 Mar 7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
[root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
total 0
lrwxrwxrwx 1 root root 0 Mar 7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
lrwxrwxrwx 1 root root 0 Mar 7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
lrwxrwxrwx 1 root root 0 Mar 7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
drwxr-xr-x 2 root root 0 Mar 7 14:24 power
--w------- 1 root root 4096 Mar 7 14:24 remove
lrwxrwxrwx 1 root root 0 Mar 7 14:24 subsystem -> ../../../../../bus/mdev
-rw-r--r-- 1 root root 4096 Mar 7 14:24 uevent

> open/release/read/write/mmap/ioctl are regular file operations for that
> mdev device.
>

> Thanks,
> Kirti