Re: [PATCH v3 14/30] vfio/pci: re-introduce CONFIG_VFIO_PCI_ZDEV

From: Matthew Rosato
Date: Mon Feb 07 2022 - 15:12:46 EST


On 2/7/22 12:59 PM, Cornelia Huck wrote:
On Mon, Feb 07 2022, Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:

On 2/7/22 3:35 AM, Cornelia Huck wrote:
On Fri, Feb 04 2022, Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:

This was previously removed as unnecessary; while that was true, subsequent
changes will make KVM an additional required component for vfio-pci-zdev.
Let's re-introduce CONFIG_VFIO_PCI_ZDEV as now there is actually a reason
to say 'n' for it (when not planning to CONFIG_KVM).

Hm... can the file be split into parts that depend on KVM and parts that
don't? Does anybody ever use vfio-pci on a non-kvm s390 system?


It is possible to split out most of the prior CLP/ vfio capability work
(but it would not be a totally clean split, zpci_group_cap for example
would need to have an inline ifdef since it references a KVM structure)
-- I suspect we'll see more of that in the future.
I'm not totally sure if there's value in the information being provided
today -- this CLP information was all added specifically with
userspace->guest delivery in mind. And to answer your other question,
I'm not directly aware of non-kvm vfio-pci usage on s390 today; but that
doesn't mean there isn't any or won't be in the future of course. With
this series, you could CONFIG_KVM=n + CONFIG_VFIO_PCI=y|m and you'll get
the standard vfio-pci support but never any vfio-pci-zdev extension.

Yes. Remind me again: if you do standard vfio-pci without the extensions
grabbing some card-specific information and making them available to the
guest, you get a working setup, it just always looks like a specific
card, right?


That's how QEMU treats it anyway, yes. Standard PCI aspects (e.g. config space) are fine, but for the s390-specific bits we end up making generalizations / using hard-coded values that are subsequently shared with the guest when it issues a CLP -- these bits are used to identify various s390-specific capabilities of the device (an example: based upon the function type, the guest could derive what format of the function measurement block can be used. The hard-coded value is otherwise effectively 'generic device' so use the basic format for this block).

Basically, we are using vfio to transmit information owned by the host s390 PCI layer to, ultimately, the guest s390 PCI layer (modified to reflect what kvm+QEMU supports), so that the guest can treat the device the same way that the host does. Anything else in between isn't going to be interested in that information unless it wants to do something very s390-specific.


If we wanted to provide everything we can where KVM isn't strictly
required, then let's look at what a split would look like:

With or without KVM:
zcpi_base_cap
zpci_group_cap (with an inline ifdef for KVM [1])
zpci_util_cap
zpci_pfip_cap
vfio_pci_info_zdev_add_caps
vfio_pci_zdev_open (ifdef, just return when !KVM [1])
vfio_pci_zdev_release (ifdef, just return when !KVM [1])

KVM only:
vfio_pci_zdev_feat_interp
vfio_pci_zdev_feat_aif
vfio_pci_zdev_feat_ioat
vfio_pci_zdev_group_notifier

I suppose such a split has the benefit of flexibility /
future-proofing... should a non-kvm use case arrive in the future for
s390 and we find we need some s390-specific handling, we're still
building vfio-pci-zdev into vfio-pci by default and can just extend that.

[1] In this case I would propose renaming CONFIG_VFIO_PCI_ZDEV as we
would once again always be building some part of vfio-pci-zdev with
vfio-pci on s390 -- maybe something like CONFIG_VFIO_PCI_ZDEV_KVM (wow
that's a mouthful) and then use this setting to check "KVM" in my above
split. Since this setting will imply PCI, VFIO_PCI and KVM, we can then
s/CONFIG_VFIO_PCI_ZDEV/CONFIG_VFIO_PCI_ZDEV_KVM/ for the rest of the
series (to continue covering cases like we build KVM but not pci, or not
vfio-pci)

How does that sound?

Complex :)

I'm not really sure whether it's worth the hassle on an odd chance that
we may want it for a !KVM usecase in the future (that goes beyond the
"base" vfio-pci support.) OTOH, it would be cleaner. I'm a bit torn on
this one.


Well, another option would be to move ahead with this patch as-is, except to rename s/CONFIG_VFIO_PCI_ZDEV/CONFIG_VFIO_PCI_ZDEV_KVM/ or something like that (and naturally tweak the title and commit message a bit). Basically, don't have the name imply a 1:1 relationship with all of vfio-pci-zdev, even if it will have that effect in practice for now.

Net result with this series would be we stop building vfio-pci-zdev without KVM, which means we remove the zdev CLP capabilities when !KVM. And then if we have a !KVM usecase in the future that needs something non-standard for s390 (either this CLP info or more likely some other s390-specific tweak) we can then perform the split, perhaps just as I describe above. In this way we punt the need for complexity until a point when (if) we need it, without backing ourselves into a weird case where we must rename the config parameter (or live with the fact that we always build some part of vfio-pci-zdev even when CONFIG_VFIO_PCI_ZDEV=n)