RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

From: Parav Pandit
Date: Tue Mar 05 2019 - 19:44:52 EST


Hi Greg, Kirti,

> -----Original Message-----
> From: Parav Pandit
> Sent: Tuesday, March 5, 2019 5:45 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Kirti Wankhede
> <kwankhede@xxxxxxxxxx>; Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>
>
>
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> > owner@xxxxxxxxxxxxxxx> On Behalf Of Parav Pandit
> > Sent: Tuesday, March 5, 2019 5:17 PM
> > To: Kirti Wankhede <kwankhede@xxxxxxxxxx>; Jakub Kicinski
> > <jakub.kicinski@xxxxxxxxxxxxx>
> > Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> > kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> > gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
> > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> > extension
> >
> > Hi Kirti,
> >
> > > -----Original Message-----
> > > From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> > > Sent: Tuesday, March 5, 2019 4:40 PM
> > > To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
> > > <jakub.kicinski@xxxxxxxxxxxxx>
> > > Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
> > > linux- kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx;
> > > davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko
> > > <jiri@xxxxxxxxxxxx>
> > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > > extension
> > >
> > >
> > >
> > > > I am novice at mdev level too. mdev or vfio mdev.
> > > > Currently by default we bind to same vendor driver, but when it
> > > > was created as passthrough device, vendor driver won't create
> > > > netdevice or rdma device for it.
> > > > And vfio/mdev or whatever mature available driver would bind at
> > > > that point.
> > > >
> > >
> > > Using mdev framework, if you want to partition a physical device
> > > into multiple logical devices, you can bind those devices to same
> > > vendor driver through vfio-mdev, whereas if you want to passthrough
> > > the device bind it to vfio-pci. If I understand correctly, that is
> > > what you are looking for.
> > >
> > >
> > We cannot bind a whole PCI device to vfio-pci, because a given PCI
> > device already has protocol devices on it, such as netdevs and an
> > rdma device. The device is partitioned while those protocol devices
> > exist and the mlx5_core and mlx5_ib drivers are loaded on it.
> > We also need to connect these objects correctly to the eswitch exposed
> > by the devlink interface (net/core/devlink.c), which supports eswitch
> > binding, health, registers, parameters and ports.
> > It also supports existing PCI VFs.
> >
> > I don't think we want to replicate all of this again in mdev subsystem [1].
> >
> > [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >
> > So a devlink interface to migrate users from managing VFs to non-VF
> > sub devices is a natural progression.
> >
> > However, in future, I believe we would be creating mediated devices on
> > user request, to use mdev modules and map them to VM.
> >
> > Also 'mdev_bus' is created as a class and not as a bus. This prevents
> > use of the devlink interface, whose handle is bus + device name.
> >
> > So one option is to change mdev from class to bus.
> > devlink will create mdevs on the bus, mdev driver can probe these
> > devices on host system by default.
> > And if told to do passthrough, a different driver exposes them to VM.
> > How feasible is this?
> >
> Wait, I do see a mdev bus and mdevs are created on this bus using
> mdev_device_create().
> So how about we create mdevs on this bus using devlink, instead of sysfs?
> And driver side on host gets the mdev_register_driver()->probe()?
>

After thinking about this more and reviewing more of the mdev code, I believe
mdev fits this need a lot better than a new subdev bus, mfd, platform device,
or devlink subport.
In the future, mapping this sub device (mdev) to a VM will also be easier
using the mdev bus.

I also believe we can use the sysfs interface for the mdev life cycle
(instead of handling the life cycle via devlink).
When an mdev is created, it will register as a devlink instance, so its
parameters can be queried and configured before a driver probes the device.
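
To illustrate the addressing this implies: a devlink handle is written as
bus name + "/" + device name (for example pci/0000:05:00.0). The rough
userspace sketch below builds such a handle for an mdev. It assumes the bus
is named "mdev" and that the device name is the mdev UUID.
build_devlink_handle() is a hypothetical helper for illustration only; it is
not a kernel or iproute2 API.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only, not an existing API):
 * format a devlink-style handle "<bus_name>/<dev_name>" into buf.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int build_devlink_handle(char *buf, size_t len,
                                const char *bus_name, const char *dev_name)
{
    int n = snprintf(buf, len, "%s/%s", bus_name, dev_name);

    /* snprintf reports the length it wanted; reject truncated output */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

With bus_name "mdev" and an mdev UUID as dev_name, this yields a handle of
the form mdev/<uuid>, which is how such an instance could be named if mdevs
register with devlink as proposed above.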

A few enhancements would be needed on the mdev side:
1. Making the iommu optional.
2. Configuring mdev device parameters at creation time.

More once I get my hands dirty with mdev in RFCv2.

What do you think?