Re: [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to issue admin commands for VF driver.

From: Jason Gunthorpe
Date: Tue Dec 06 2022 - 10:22:48 EST


On Tue, Dec 06, 2022 at 02:58:10PM +0100, Christoph Hellwig wrote:

> Most importantly NVMe is very quiet on the relationship between
> VFs and PFs, and there is no way to guarantee that a PF is, at the
> NVMe level, much in control of a VF at all. In other words this
> concept really badly breaks NVMe abstractions.

Yeah, I think the spec effort is going to be interesting for sure.

From a pure Linux and implementation perspective a decision must be
made early on how to label the DMAs for kernel/qemu vs VM controlled
items at the PCI TLP level.

> controlled functions (which could very well be, and in some designs
> are, additional PFs and not VFs) by controlling function.

In principle PF vs VF doesn't matter much - the question is really TLP
labeling. If the spec says RID A is the controlling RID and RID B is
the guest RID, then it doesn't matter if they have a PF/VF
relationship or PF/PF relationship.
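
To make the labeling point concrete, here is a rough sketch, not taken
from any patch in this series. It assumes the classic non-ARI requester
ID layout (bus[15:8], device[7:3], function[2:0]); with ARI the low 8
bits are a single function number. The names rid_pair, dma_owner and
classify_dma are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit PCIe requester ID (non-ARI layout assumed). */
static inline uint16_t make_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Hypothetical: whose memory a DMA is allowed to touch. */
enum dma_owner {
	DMA_OWNER_HOST,		/* kernel/qemu controlled */
	DMA_OWNER_GUEST,	/* VM controlled */
	DMA_OWNER_UNKNOWN,
};

/*
 * Hypothetical pairing of a controlling function and a controlled
 * function. Nothing here records whether the two are PF/VF or PF/PF.
 */
struct rid_pair {
	uint16_t controlling_rid;
	uint16_t guest_rid;
};

/* Classify a DMA purely by the RID carried in the TLP. */
static inline enum dma_owner classify_dma(const struct rid_pair *p,
					  uint16_t tlp_rid)
{
	if (tlp_rid == p->controlling_rid)
		return DMA_OWNER_HOST;
	if (tlp_rid == p->guest_rid)
		return DMA_OWNER_GUEST;
	return DMA_OWNER_UNKNOWN;
}
```

The classification only ever looks at the two RIDs, which is the sense
in which the PF/VF vs PF/PF distinction drops out.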

We have locking issues in Linux SW connecting different SW drivers for
things that are not a PF/VF relationship, but perhaps that can be
solved.

Using VF RID / VF PASID is appealing at first glance, but there is a
list of PCI emulation details that have to be worked out for that to
be good. E.g. what do you do with a guest-triggered FLR? Or a
guest-triggered memory disable? How do you handle PCIe AER? Also, the
lack of PASID support in CPUs is problematic.

Lots of trade offs..

Jason