Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE mailbox

From: Dan Williams
Date: Fri Apr 29 2022 - 13:01:36 EST


On Fri, Apr 29, 2022 at 9:39 AM Jonathan Cameron
<Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> On Thu, 28 Apr 2022 14:09:38 -0700
> ira.weiny@xxxxxxxxx wrote:
>
> > On Wed, Apr 27, 2022 at 06:19:42PM +0100, Jonathan Cameron wrote:
> > > On Thu, 14 Apr 2022 13:32:31 -0700
> > > ira.weiny@xxxxxxxxx wrote:
> > >
> > > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> > > >
> > > > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > > > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > > > the kernel while other access is designed towards user space usage. An
> > > > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > > > Compliance Mode DOE) which offers a mechanism to set different test
> > > > modes for a device.
> > > >
> > > > There is no anticipated need for the kernel to share an individual
> > > > mailbox with user space. Thus developing an interface to marshal access
> > > > between the kernel and user space for a single mailbox is unnecessary
> > > > overhead. However, having the kernel relinquish some mailboxes to be
> > > > controlled by user space is a reasonable compromise to share access to
> > > > the device.
> > > >
> > > > The auxiliary bus provides an elegant solution for this. Each DOE
> > > > capability is given its own auxiliary device. This device is controlled
> > > > by a kernel driver by default which restricts access to the mailbox.
> > > > Unbinding the driver from a single auxiliary device (DOE mailbox
> > > > capability) frees the mailbox for user space access. This architecture
> > > > also allows a clear picture on which mailboxes are kernel controlled vs
> > > > not.
> > > >
> > > > Iterate each DOE mailbox capability and create auxiliary bus devices.
> > > > Follow on patches will define a driver for the newly created devices.
> > > >
> > > > sysfs shows the devices.
> > > >
> > > > $ ls -l /sys/bus/auxiliary/devices/
> > > > total 0
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > >
> > > I'm not 100% happy with effectively having one solution for CXL
> > > and probably a different one for DOEs on switch ports
> > > (which I just hacked into a port service driver to convince
> > > myself there was at least one plausible way of doing that) but if
> > > this effectively separates the two discussions then I guess I can
> > > live with it for now ;)
> >
> > I took some time this morning to mull this over and talk to Dan...
> >
> > :-(
> >
> > Truthfully the aux driver does very little except provide a way for admins to
> > trigger the driver to stop/start accessing the Mailbox.
> >
> > I suppose a simple sysfs interface could be done to do the same?
> >
> > I'll let Dan weigh in here.
>
> I wonder if best short term option is to not provide a means of
> removing it at all (separate from the PCI driver that is).
> Then we can take our time to decide on the interface if we ever
> get much demand for one.
>
> >
> > >
> > > Once this is merged we can start the discussion about how to
> > > handle switch ports with DOEs both for CDAT and SPDM.
> >
> > I'm ok with that too. However, I was thinking that this was not a user ABI.
> > But it really is. If user space starts writing script to unbind drivers and
> > then we drop the aux driver support it will break them...
> >
> > >
> > > I'll send out an RFC that is so hideous it will get people to
> > > suggestion how to do it better!
> >
> > I think I'd like to see that.
>
> Fair enough. It may muddy the waters a bit :( I'll send an RFC
> next week. I've not looked at how the CXL region code etc would
> actually get to the latency / bandwidth info from the driver yet
> it just goes as far as reading a CDAT length. I also want to actually
> hook up some decent switch CDAT emulation in the QEMU code
> (right now it's giving the same default table as for a type 3 device).
>
> I just hope we don't bikeshed around the RFC in a fashion that slows
> this series moving forwards.

I think we have time in the sense that the worst that happens is that
tooling picks the wrong CFMWS to dynamically create a region and the
performance ends up being sub-optimal. That's tolerable to work around
in userspace in the near term. I want to get some wider confidence in
the DOE ABI with respect to the known protocols and what to do about
the vendor-specific protocols that may conflict and will be driven
from userspace issued config-cycles. That likely means that no DOE ABI
is the best ABI to start which means not moving forward with
aux-devices so scripts do not become attached to something that is not
fully committed to being carried forward.

I still want to refresh the request_config_region() support for at
least having the kernel warn on userspace conflicting configuration
writes to config areas claimed by a driver.