Re: [PATCH 2/2] xen/virtio: Avoid use of the dom0 backend in dom0

From: Juergen Gross
Date: Wed Jul 05 2023 - 00:47:04 EST


On 04.07.23 19:14, Oleksandr Tyshchenko wrote:


On Tue, Jul 4, 2023 at 5:49 PM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

Hello all.

[sorry for the possible format issues]


On Tue, Jul 04, 2023 at 01:43:46PM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
>
> FWIW, I have run into this issue some time ago too. I run Xen on top of
> KVM and then pass through some of the virtio devices (the network one
> specifically) into a (PV) guest. So, I hit both cases, the dom0 one and
> domU one. As a temporary workaround I needed to disable
> CONFIG_XEN_VIRTIO completely (just disabling
> CONFIG_XEN_VIRTIO_FORCE_GRANT was not enough to fix it).
> With that context in place, the actual response below.
>
> On Tue, Jul 04, 2023 at 12:39:40PM +0200, Juergen Gross wrote:
> > On 04.07.23 09:48, Roger Pau Monné wrote:
> > > On Thu, Jun 29, 2023 at 03:44:04PM -0700, Stefano Stabellini wrote:
> > > > On Thu, 29 Jun 2023, Oleksandr Tyshchenko wrote:
> > > > > On 29.06.23 04:00, Stefano Stabellini wrote:
> > > > > > I think we need to add a second way? It could be anything that can help
> > > > > > us distinguish between a non-grants-capable virtio backend and a
> > > > > > grants-capable virtio backend, such as:
> > > > > > - a string on xenstore
> > > > > > - a xen param
> > > > > > - a special PCI configuration register value
> > > > > > - something in the ACPI tables
> > > > > > - the QEMU machine type
> > > > >
> > > > >
> > > > > Yes, I remember there was a discussion regarding that. The point is to
> > > > > choose a solution to be functional for both PV and HVM *and* to be able
> > > > > to support hotplug. IIRC, the xenstore could be a possible candidate.
> > > >
> > > > xenstore would be among the easiest to make work. The only downside is
> > > > the dependency on xenstore which otherwise virtio+grants doesn't have.
> > >
> > > I would avoid introducing a dependency on xenstore, if nothing else we
> > > know it's a performance bottleneck.
> > >
> > > We would also need to map the virtio device topology into xenstore, so
> > > that we can pass different options for each device.
> >
> > This aspect (different options) is important. How do you want to pass virtio
> > device configuration parameters from dom0 to the virtio backend domain? You
> > probably need something like Xenstore (a virtio-based alternative like virtiofs
> > would work, too) for that purpose.
> >
> > Mapping the topology should be rather easy via the PCI-Id, e.g.:
> >
> > /local/domain/42/device/virtio/0000:00:1c.0/backend
>
> While I agree this would probably be the simplest to implement, I don't
> like introducing xenstore dependency into virtio frontend either.
> Toolstack -> backend communication is probably easier to solve, as it's
> much more flexible (could use qemu cmdline, QMP, other similar
> mechanisms for non-qemu backends etc).

I also think features should be exposed uniformly for devices; it's at
least weird to have certain features exposed in the PCI config space
while others are exposed in xenstore.

For virtio-mmio this might get a bit confusing, are we going to add
xenstore entries based on the position of the device config mmio
region?

I think on Arm PCI enumeration is not (usually?) done by the firmware,
at which point the SBDF expected by the tools/backend might be
different from the value assigned by the guest OS.

I think there are two slightly different issues, one is how to pass
information to virtio backends, I think doing this initially based on
xenstore is not that bad, because it's an internal detail of the
backend implementation. However passing information to virtio
frontends using xenstore is IMO a bad idea, there's already a way to
negotiate features between virtio frontends and backends, and Xen
should just expand and use that.



On Arm with device-tree we have special bindings whose purpose is to tell us whether we need to use grants for virtio, and the backend domid, for a particular device. Here on x86 we don't have a device tree, so we cannot (easily?) reuse this logic.

I have just recalled one idea suggested by Stefano some time ago [1]. The context of the discussion was what to do when device-tree and ACPI cannot be reused (or something like that). The idea won't cover virtio-mmio, but I have heard that virtio-mmio usage with x86 Xen is a rather unusual case.

I will paste the text below for convenience.

**********

Part 1 (intro):

We could reuse a PCI config space register to expose the backend id.
However this solution requires a backend change (QEMU) to expose the
backend id via an emulated register for each emulated device.

To avoid having to introduce a special config space register in all
emulated PCI devices (virtio-net, virtio-block, etc) I wonder if we
could add a special PCI config space register at the emulated PCI Root
Complex level.

Basically the workflow would be as follows:

- Linux recognizes the PCI Root Complex as a Xen PCI Root Complex
- Linux writes the PCI device id (basically the BDF) to a special PCI
  config space register of the Xen PCI Root Complex
- The Xen PCI Root Complex emulated by Xen answers by writing back to
  the same location the backend id (domid of the backend)
- Linux reads back the same PCI config space register of the Xen PCI
  Root Complex and learns the relevant domid

Part 2 (clarification):

I think using a special config space register in the root complex would
not be terrible in terms of guest changes because it is easy to
introduce a new root complex driver in Linux and other OSes. The root
complex would still be ECAM compatible so the regular ECAM driver would
still work. A new driver would only be necessary if you want to be able
to access the special config space register.


**********
What do you think about it? Are there any pitfalls, etc.? This also requires system changes, but at least no virtio spec changes.

[1] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2210061747590.3690179@ubuntu-linux-20-04-desktop/

Sounds like a good idea. One PCI root complex per backend domain would be needed,
but that should be possible.


Juergen
