Re: [PATCH v3 1/1] vfio/nvgpu: Add vfio pci variant module for grace hopper

From: Jason Gunthorpe
Date: Tue Jun 06 2023 - 13:16:53 EST


On Tue, Jun 06, 2023 at 11:05:10AM -0600, Alex Williamson wrote:

> It actually seems more complicated this way. We're masquerading this
> region as a BAR, but then QEMU needs to know based on device IDs that
> it's really not a BAR, it has special size properties, mapping
> attributes, error handling, etc.

This seems like something has gone wrong then, i.e. the SIGBUS error
handling stuff should be totally generic on the qemu side. Mapping
attributes are set by the kernel; qemu shouldn't know and doesn't need
to know.

The size issue is going to be a problem in the future anyhow. I expect
some new standards will come to support non-power-of-two sizes, and
they will still want to map to PCI devices in VMs.

It seems OK to me if qemu can do this generically for any "BAR"
region. At the least, creating an entire "nvidia only" code path just
for non-power-of-two BAR sizing seems like a bad ABI choice.
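
To illustrate the sizing point only (a rough sketch, not code from the
patch or from qemu; the helper name bar_size_roundup is made up): a PCI
BAR is sized as a power of two, so a generic path would have to round
the real region length up for the BAR it presents and track the usable
length separately.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: return the smallest
 * power of two that is >= size, i.e. the BAR size a generic path
 * could present for a region whose real length is 'size'.
 * Returns 0 for size == 0, and 0 if the result would not fit in
 * 64 bits.
 */
static uint64_t bar_size_roundup(uint64_t size)
{
	uint64_t p = 1;

	if (size == 0)
		return 0;

	/* Double p until it covers size; p wraps to 0 on overflow. */
	while (p != 0 && p < size)
		p <<= 1;

	return p;
}
```

With made-up numbers, a 96 GiB region would be presented as a 128 GiB
BAR with only the first 96 GiB usable, and nothing in that calculation
depends on the device ID.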

> I'm not privy to a v1, the earliest I see is this (v3):
>
> https://lore.kernel.org/all/20230405180134.16932-1-ankita@xxxxxxxxxx/
>
> That outlines that we have a proprietary interconnect exposing cache
> coherent memory which requires use of special mapping attributes vs a
> standard PCI BAR and participates in ECC. All of which seems like it
> would be easier to set up in QEMU if the vfio-pci representation of the
> device didn't masquerade this region as a standard BAR. In fact it
> also reminds me of NVlink2 coherent RAM on POWER machines that was
> similarly handled as device specific regions.

It wasn't so good on POWER, and if some of that stuff had been done
more generally we would be further ahead here.

Jason