Re: [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to issue admin commands for VF driver.

From: Jason Gunthorpe
Date: Tue Dec 06 2022 - 10:51:39 EST


On Tue, Dec 06, 2022 at 04:38:11PM +0100, Christoph Hellwig wrote:

> > We have locking issues in Linux SW connecting different SW drivers for
> > things that are not a PF/VF relationship, but perhaps that can be
> > solved.
>
> And I think the only reasonable answer is that the entire workflow
> must be 100% managed from the controlling function, and the controlled
> function is just around for a ride, with the controlling function
> enabling/disabling it as needed without ever interacting with software
> that directly deals with the controlled function.

That is a big deviation from where VFIO is right now. The controlled
function is the one with the VFIO driver, so it should be the one that
drives the migration uAPI components.

Adding another uAPI that can manipulate the same VFIO device from some
unrelated chardev feels wrong.

There are certain things that need to be co-ordinated for everything to
work. For example, you can't suspend the VFIO device unless you promise
to also stop MMIO operations. Stuff like FLR interferes with the
migration operation and has to be co-ordinated. Some migration operation
failures, like load failure, have to be healed through FLR.
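
To make the coupling concrete, here is a toy model in C. It is only a
sketch: every name in it (toy_dev, toy_suspend, toy_load, toy_flr and the
MIG_* states) is made up for illustration and is not the VFIO uAPI or any
driver's code. It only shows that suspend, load and FLR all act on the
same piece of per-device state, which is why they are hard to split
across two independent interfaces.

```c
#include <stdbool.h>

/* Hypothetical toy model; these names are NOT part of the VFIO uAPI. */
enum mig_state { MIG_RUNNING, MIG_STOPPED, MIG_ERROR };

struct toy_dev {
	enum mig_state state;
	bool mmio_quiesced;	/* the user has promised to stop MMIO */
};

/* Suspending is only legal once MMIO has been stopped. */
static int toy_suspend(struct toy_dev *d)
{
	if (d->state != MIG_RUNNING || !d->mmio_quiesced)
		return -1;
	d->state = MIG_STOPPED;
	return 0;
}

/* A failed load leaves the device in an error state. */
static int toy_load(struct toy_dev *d, bool data_ok)
{
	if (d->state != MIG_STOPPED)
		return -1;
	d->state = data_ok ? MIG_RUNNING : MIG_ERROR;
	return data_ok ? 0 : -1;
}

/* FLR discards any migration step in flight and heals the error state. */
static void toy_flr(struct toy_dev *d)
{
	d->state = MIG_RUNNING;
	d->mmio_quiesced = false;
}
```

In this model the only way out of MIG_ERROR is toy_flr(), and toy_flr()
also throws away a suspend that was in progress, so whoever issues the
reset has to know what the migration side is doing.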

It really does not want to be two different uAPIs even if that is
convenient for the kernel.

I'd much rather try to fix the problems PASID brings than try to make
this work :\

Jason