Re: [PATCH 2/2] vfio/pci: Remove console drivers

From: mb@xxxxxxx
Date: Mon Dec 05 2022 - 16:52:52 EST


Hi Thomas,

On Mon, Dec 5, 2022 at 3:11 AM Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>
> Hi
>
> Am 05.12.22 um 10:32 schrieb mb@xxxxxxx:
> > I have an RTX 3070 and a 3090, and I am absolutely sure I am binding
> > vfio-pci to the 3090 and not the 3070.
> >
> > I have bound the driver in two different ways, first by passing the IDs
> > to the module and alternatively by manipulating the sysfs interface and
> > using the driver override (this is what I originally had to do when I
> > used two 1080s, so I know it works).
> >
> > While the 3090 doesn't show a console, a remnant from rEFInd (and
> > previously from grub) is still displayed there.
> >
> > The assessment Alex made previously, where
> > aperture_remove_conflicting_pci_devices() is removing the driver (EFIFB)
> > instead of the device seems correct, but it could also be a quirk
> > of how EFIFB is implemented. I recall reading a long time ago that EFIFB
> > is a special device and once it detects changes it would simply give up.
> > There was also no way to attach a device to it again as it depends on
> > being preloaded outside the kernel; once something takes over the buffer
> > reinitializing is "impossible". I never went deeper to try and
> > understand it.
>
> We recently reworked fbdev's interaction with the aperture helpers. [1]
> All devices should now be removed iff the driver has been bound to it
> (which should be the case here). The patches went into a v6.1-rc.
>
> Could you try the most recent v6.1-rc and report if this fixes the problem?

I just tried the latest one, v6.1-rc8, and I can see all the commits
for the series you mentioned there.

The same freeze behavior happens when I load vfio-pci:

[ 6.525463] VFIO - User Level meta-driver version: 0.3
[ 6.528231] Console: switching to colour dummy device 320x90
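
For anyone trying to reproduce this, the state after loading vfio-pci can be inspected through sysfs. The following is only a minimal sketch, assuming the usual sysfs layout (/sys/bus/pci/devices/<address>/driver and /sys/class/graphics/fb*/name). The helper names and the PCI addresses are hypothetical and not taken from this thread; substitute the addresses that lspci reports.

```python
import os
from pathlib import Path


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "driver"
    if not link.is_symlink():
        return None
    # The 'driver' entry is a symlink whose last component is the driver name.
    return os.path.basename(os.readlink(link))


def framebuffers(sysfs_root="/sys"):
    """Map each registered fbdev node (fb0, fb1, ...) to its reported name."""
    base = Path(sysfs_root) / "class" / "graphics"
    result = {}
    if not base.is_dir():
        return result
    for node in sorted(base.glob("fb[0-9]*")):
        name_file = node / "name"
        result[node.name] = name_file.read_text().strip() if name_file.is_file() else ""
    return result


if __name__ == "__main__":
    # Hypothetical addresses for the two GPUs; replace with the real ones.
    for addr in ("0000:01:00.0", "0000:02:00.0"):
        print(addr, "->", bound_driver(addr))
    print("framebuffers:", framebuffers())
```

Running it before and after loading vfio-pci would show whether the efifb framebuffer entry disappears while the primary GPU stays unbound from vfio-pci.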

--
Carlos

>
> Best regards
> Thomas
>
> [1] https://patchwork.freedesktop.org/series/106040/
>
> >
> >
> > On Mon, Dec 5, 2022, 2:00 AM Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
> >
> > Hi
> >
> > Am 05.12.22 um 01:51 schrieb Alex Williamson:
> > > On Sat, 3 Dec 2022 17:12:38 -0700
> > > "mb@xxxxxxx" <mb@xxxxxxx> wrote:
> > >
> > >> Hi,
> > >>
> > >> I hope it is ok to reply to this old thread.
> > >
> > > It is, but the only relic of the thread is the subject. For reference,
> > > the latest posted version of this patch is here:
> > >
> > > https://lore.kernel.org/all/20220622140134.12763-4-tzimmermann@xxxxxxx/
> > >
> > > Which is committed as:
> > >
> > > d17378062079 ("vfio/pci: Remove console drivers")
> > >
> > >> Unfortunately, I found a
> > >> problem only now after upgrading to 6.0.
> > >>
> > >> My setup has multiple GPUs (2), and I depend on EFIFB to have a
> > >> working console.
> >
> > Which GPUs do you have?
> >
> > >> Pre-patch behavior: when I bind vfio-pci to my secondary GPU, both
> > >> the passthrough and the EFIFB keep working fine.
> > >> Post-patch behavior: when I bind vfio-pci to the secondary GPU,
> > >> the EFIFB disappears from the system, binding the console to the
> > >> "dummy console".
> >
> > The efifb would likely use the first GPU. And vfio-pci should only
> > remove the generic driver from the second device. Are you sure that
> > you're not somehow using the first GPU with vfio-pci?
> >
> > >> Whenever you try to access the terminal, the screen is stuck on
> > >> whatever the last buffer content was, which gives the impression of
> > >> "freezing," but I can still type.
> > >> Everything else works, including the passthrough.
> > >
> > > This sounds like the call to aperture_remove_conflicting_pci_devices()
> > > is removing the conflicting driver itself rather than removing the
> > > device from the driver. Is it not possible to unbind the GPU from
> > > efifb before binding the GPU to vfio-pci to effectively nullify the
> > > added call?
> > >
> > >> I can only think about a few options:
> > >>
> > >> - Is there a way to have EFIFB show up again? After all it looks like
> > >> the kernel has just abandoned it, but the buffer is still there. I
> > >> can't find a single message about the secondary card and EFIFB in
> > >> dmesg, but there's a message for the primary card and EFIFB.
> > >> - Can we have a boolean controlling the behavior of vfio-pci
> > >> altogether or at least controlling the behavior of vfio-pci for that
> > >> specific ID? I know there's already some option for vfio-pci and VGA
> > >> cards, would it be appropriate to attach this behavior to that
> > >> option?
> > >
> > > I suppose we could have an opt-out module option on vfio-pci to skip
> > > the above call, but clearly it would be better if things worked by
> > > default. We cannot make full use of GPUs with vfio-pci if they're
> > > still in use by host console drivers. The intention was certainly to
> > > unbind the device from any low level drivers rather than disable use of
> > > a console driver entirely. DRM/GPU folks, is that possibly an
> > > interface we could implement? Thanks,
> >
> > When vfio-pci gives the GPU device to the guest, which driver is
> > bound to it?
> >
> > Best regards
> > Thomas
> >
> > >
> > > Alex
> > >
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > (HRB 36809, AG Nürnberg)
> > Geschäftsführer: Ivo Totev
> >