Re: [PATCH RFC 1/1] x86: fix bad memory access in fb_is_primary_device()

From: Peter Jones
Date: Tue Feb 16 2016 - 10:19:10 EST


On Tue, Feb 16, 2016 at 01:49:18PM +0000, Matt Fleming wrote:
> [ Including Peter, the efifb maintainer. Original email is here,
>
> http://marc.info/?l=linux-kernel&m=145552936131335&w=2
>
> I've snipped some of the quoted text ]
>
> On Tue, 16 Feb, at 08:55:22AM, Ingo Molnar wrote:
> >
> > (I've Cc:-ed the EFI-FB and FB gents. Mail quoted below.)
> >
> > * Alexander Popov <alpopov@xxxxxxxxxxxxxx> wrote:
> >
> > > Currently the code in fb_is_primary_device() contains to_pci_dev() macro
> > > which is applied to dev from struct fb_info. In some cases this causes
> > > bad memory access when fb_is_primary_device() handles fb_info of efifb.
> > > The reason is that fb dev of efifb is embedded into struct platform_device
> > > but not into struct pci_dev.
> > >
> > > We can fix this by checking fb dev bus name in fb_is_primary_device().
> > >
> > > It seems that this bug reveals some bigger problem with to_pci_dev(),
> > > to_platform_device() and others, which just do container_of() and
> > > don't check whether struct device is a part of the appropriate structure.
> > > Should we do something more about it?
> > >
> > > KASan report:
>
> [...]
>
> > >
> > > Signed-off-by: Alexander Popov <alpopov@xxxxxxxxxxxxxx>
> > > ---
> > > arch/x86/video/fbdev.c | 9 +++++----
> > > 1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
> > > index d5644bb..4999f78 100644
> > > --- a/arch/x86/video/fbdev.c
> > > +++ b/arch/x86/video/fbdev.c
> > > @@ -18,11 +18,12 @@ int fb_is_primary_device(struct fb_info *info)
> > > struct pci_dev *default_device = vga_default_device();
> > > struct resource *res = NULL;
> > >
> > > - if (device)
> > > - pci_dev = to_pci_dev(device);
> > > -
> > > - if (!pci_dev)
> > > + if (!device || !device->bus ||
> > > + !device->bus->name || strcmp(device->bus->name, "pci")) {
> > > return 0;
> > > + }
> > > +
> > > + pci_dev = to_pci_dev(device);
> > >
> > > if (default_device) {
> > > if (pci_dev == default_device)
> > > --
> > > 1.9.1
> > >
>
> I wonder if this issue could explain some of the efifb issues we've
> seen reported on bugzilla.kernel.org in the past where switching from
> efifb to some other framebuffer device caused hangs during boot. I'm
> struggling to find the relevant bugzilla entries now, though.

It's possible, but I don't have them handy either. I've also wondered
if some of them were due to bad data from the firmware - at plugfests
we've seen some cases where the actual video mode as measured with a
ruler is clearly not what the firmware claims it to be, so it's
entirely possible we're occasionally told about a memory region that
is not what's actually mapped, or that's mapped but is only partially
backed by the actual frame buffer memory.

But aside from that diversion, I think Alexander has a legitimate
question about use of to_pci_dev(). If I ask whether we can fix this
in efifb by making it live on a pci_dev, I run into a few fundamental
problems:

1) Technically it doesn't have to be a pci_dev at all (but, practically,
   so far it always is on PCI...)
2) From EFI, we can't necessarily pin it down to a single PCI device
   even if it is PCI. Before we do EFI's ExitBootServices() call, we
   can try to find the PCI_IO handle our GOP instance is connected to,
   but not all firmware GOP drivers use that, so it doesn't always work.
   And even if it did, there can be more than one instance pointing to
   the same memory with different PCI devices - lots of laptops have
   this sort of thing.
3) Ignoring the EFI side and just focusing on PCI, if there are two
   devices configured that could do scanout, the framebuffer can be
   mapped to one device's BAR while the other device is the one
   actually using it. In this case either choice is probably wrong
   for something, and the things that have the information to resolve
   which one is right don't include efifb - they're the drivers we'll
   likely hand off to later.

So it's most likely right for efifb to be embedded in a platform_device
instead of a pci_dev. Which leads back to Alexander's question - if it
isn't in a pci_dev, then fb_is_primary_device() must not assume it is.
So the patch appears correct, and the question is a fair one: should
to_pci_dev() be checking this and returning NULL here?
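
To make that question concrete, here is a rough userspace sketch of what
a checked conversion might look like. The struct definitions are
simplified stand-ins for the kernel types, and to_pci_dev_checked() is
a hypothetical name, not an existing kernel helper. It compares the bus
pointer rather than the bus name string, which is the same test the
kernel's existing dev_is_pci() macro performs.

```c
#include <stddef.h>

/* Simplified mock types for illustration only; the real kernel
 * structures carry many more fields. */
struct bus_type {
	const char *name;
};

struct device {
	const struct bus_type *bus;
};

struct pci_dev {
	unsigned int devfn;
	struct device dev;	/* embedded, as in the kernel */
};

struct platform_device {
	int id;
	struct device dev;	/* embedded, as in the kernel */
};

static const struct bus_type pci_bus_type = { .name = "pci" };
static const struct bus_type platform_bus_type = { .name = "platform" };

/* Plain pointer arithmetic: recovers the enclosing structure without
 * checking that ptr really is embedded in one. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical checked variant: returns NULL unless dev is on the PCI
 * bus, instead of handing back a bogus pointer. */
static struct pci_dev *to_pci_dev_checked(struct device *dev)
{
	if (!dev || dev->bus != &pci_bus_type)
		return NULL;

	return container_of(dev, struct pci_dev, dev);
}
```

With something like this, a struct device embedded in a platform_device
(the efifb case) yields NULL, and the existing "if (!pci_dev) return 0;"
test in fb_is_primary_device() would do the right thing. The cost is an
extra comparison in every caller, including the many that already know
they hold a PCI device.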

--
Peter