Re: [PATCH 0/2] Improve vfio-pci primary GPU assignment behavior

From: Javier Martinez Canillas
Date: Wed Jun 08 2022 - 05:26:38 EST


Hello Gerd and Alex,

On 6/8/22 09:43, Gerd Hoffmann wrote:
> Hi,
>
>> But also, this issue isn't something that only affects graphic devices,
>> right? AFAIU from [1] and [2], the same issue happens if a PCI device
>> has to be bound to vfio-pci but already was bound to a host driver.
>
> Nope. There is a standard procedure to bind and unbind pci drivers via
> sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
>

Yes, but the cover letter says:

"Users often employ kernel command line arguments to disable conflicting
drivers or perform unbinding in userspace to avoid this"

So I misunderstood the goal as being to avoid the need to do this via sysfs
in user-space. I understand now that the problem is different: for real PCI
devices bound to a driver, you know the PCI device ID and bus, so you can use
them to unbind. But for platform devices bound to drivers that just use a
firmware-provided framebuffer, you don't have that information.
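
For the real PCI device case, the sysfs procedure is easy to script. Here is
a minimal sketch in C, assuming the caller already knows the driver name and
the PCI address. The helper names and the sysfs_root parameter are made up
for illustration and are not an existing API:

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<sysfs_root>/bus/pci/drivers/<driver>/<op>", where op is "bind"
 * or "unbind". Returns 0 on success, -1 if the path does not fit in buf.
 */
static int pci_driver_attr_path(char *buf, size_t len, const char *sysfs_root,
				const char *driver, const char *op)
{
	int n = snprintf(buf, len, "%s/bus/pci/drivers/%s/%s",
			 sysfs_root, driver, op);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a PCI address such as "0000:01:00.0" to the driver's bind or
 * unbind attribute. Returns 0 on success, -1 on any error.
 */
static int pci_driver_write(const char *sysfs_root, const char *driver,
			    const char *op, const char *pci_addr)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (pci_driver_attr_path(path, sizeof(path), sysfs_root, driver, op))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(pci_addr, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* sysfs reports write errors on flush */
		ret = -1;
	return ret;
}
```

A call like pci_driver_write("/sys", "amdgpu", "unbind", "0000:01:00.0")
would detach that device from its host driver when run as root. The driver
name and address here are only example values.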

You could use the standard sysfs bind/unbind interface for these too, but you
have no way to know whether the "simple-framebuffer" or "efi-framebuffer"
device is associated with the PCI device that you want to pass through or
with another one.

The only information that could tell you that is the I/O memory resource
associated with the registered platform device, and that's why you want to
use the drm_aperture_remove_conflicting_pci_framebuffers() helper.
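
To illustrate the idea only (this is not the DRM helper's actual code),
matching a firmware framebuffer to a PCI device boils down to checking
whether the framebuffer's memory range overlaps any of the device's BARs.
The struct and function names below are invented for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory resource: size bytes starting at base. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* True if the two ranges share at least one byte (overflow-safe). */
static bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
	if (a->size == 0 || b->size == 0)
		return false;
	if (a->base <= b->base)
		return b->base - a->base < a->size;
	return a->base - b->base < b->size;
}

/*
 * True if the firmware framebuffer range overlaps any BAR of the PCI
 * device, i.e. the framebuffer is most likely backed by that device.
 */
static bool fb_conflicts_with_pci_dev(const struct mem_range *fb,
				      const struct mem_range *bars, int nr_bars)
{
	for (int i = 0; i < nr_bars; i++) {
		if (ranges_overlap(fb, &bars[i]))
			return true;
	}
	return false;
}
```

With this check, a framebuffer located inside a BAR of the GPU to be passed
through is a conflict, while one backed by a second GPU is left alone.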

>> The fact that DRM happens to have some infrastructure to remove devices
>> that conflict with an aperture is just a coincidence.
>
> No. It's a consequence of firmware framebuffers not being linked to the
> pci device actually backing them, so some other way is needed to find
> and solve conflicts.
>

Right, it's clear to me now. As mentioned, I misunderstood your problem.

>> The series [0] mentioned above, adds a sysfb_disable() that disables the
>> Generic System Framebuffer logic that is what registers the framebuffer
>> devices that are bound to these generic video drivers. On disable, the
>> devices registered by sysfb are also unregistered.
>
> As Alex already mentioned this might not have the desired effect on
> systems with multiple GPUs (I think even without considering vfio-pci).
>

That's correct, although the firmware framebuffer drivers are just a best
effort to allow having some display output even if there's no real video
driver (or if the user prevented them from loading with "nomodeset").

We have talked about improving this by unifying the fbdev and DRM apertures
in a single list that could track all the registered devices and their
requested apertures, so that all subsystems could use it. The reason why
I was pushing back on using the DRM aperture helper is that it would
make this refactoring more complicated later, as more subsystems
would be using the current API.

But as Alex said, it wouldn't make the problem worse so I'm OK with this
if others agree that's the correct thing to do.

>> That is, do you want to remove the {vesa,efi,simple}fb and simpledrm
>> drivers or is there a need to also remove real fbdev and DRM drivers?
>
> Boot framebuffers are the problem because they are neither visible nor
> manageable in /sys/bus/pci. For real fbdev/drm drivers the standard pci
> unbind can be used.
>

Yes. Honestly I believe all this should be handled by the Linux device model.

That is, drivers could just do pci_request_region() / request_mem_region()
and drivers that want to unbind another bound device could do something like
pci_request_region_force() / request_mem_region_force() to kick them out.
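
To make the idea a bit more concrete, here is a user-space toy model of what
I mean. Everything in it is hypothetical: the *_force() variants don't exist
in the kernel, and the region table and evict callback are just stand-ins
for the resource tree and the driver core unbinding a device:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Toy model of a claimed memory region; not a kernel structure. */
struct region {
	uint64_t start;
	uint64_t end;				/* inclusive */
	const char *owner;
	void (*evict)(const char *owner);	/* called when kicked out */
	bool used;
};

static struct region regions[MAX_REGIONS];
static int evicted;	/* number of owners kicked out so far */

/* Example evict callback that only counts how often it is called. */
static void count_evict(const char *owner)
{
	(void)owner;
	evicted++;
}

static bool overlaps(const struct region *r, uint64_t start, uint64_t end)
{
	return r->used && r->start <= end && start <= r->end;
}

/* Plain request: fail if anything already claims part of the range. */
static bool toy_request_region(uint64_t start, uint64_t end,
			       const char *owner,
			       void (*evict)(const char *owner))
{
	struct region *slot = NULL;

	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (overlaps(&regions[i], start, end))
			return false;
		if (!regions[i].used && !slot)
			slot = &regions[i];
	}
	if (!slot)
		return false;
	*slot = (struct region){ start, end, owner, evict, true };
	return true;
}

/* Hypothetical "force" variant: kick out every conflicting owner first. */
static bool toy_request_region_force(uint64_t start, uint64_t end,
				     const char *owner,
				     void (*evict)(const char *owner))
{
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (!overlaps(&regions[i], start, end))
			continue;
		if (regions[i].evict)
			regions[i].evict(regions[i].owner);
		regions[i].used = false;
	}
	return toy_request_region(start, end, owner, evict);
}
```

In this model a firmware framebuffer driver would claim its range with the
plain request, and a driver like vfio-pci would use the force variant on the
device's BARs, which evicts the framebuffer owner without needing to know
which platform device it was.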

--
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat