Re: [PATCH] PCI: avoid NULL deref in alloc_pcie_link_state

From: Bjorn Helgaas
Date: Tue Jun 25 2013 - 13:18:11 EST


On Tue, Jun 25, 2013 at 5:23 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> On Mon, Jun 24, 2013 at 07:38:45PM -0600, Bjorn Helgaas wrote:
>> [+cc Michael, Alex, Isaku]
>>
>> On Wed, Jun 19, 2013 at 12:56 PM, Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> wrote:
>> > PCIe switch upstream port can be connected directly to the PCIe root bus
>> > in QEMU; ASPM does not expect this topology and dereferences NULL pointer
>> > when initializing.
>> >
>> > I have not confirmed this can happen on real hardware, but it is presented
>> > as a feature in QEMU, so there is no reason to panic if we can recover.
>>
>> This doesn't seem like a valid hardware topology to me. If this *can*
>> occur on real hardware, we should fix it in Linux. If not, maybe QEMU
>> should be changed to disallow it.
>
> I don't think it's a spec compliant topology either.
>
> Anything connected to an RC is either an integrated endpoint
> or a root port.
> There's no way to have an upstream port that is also
> an integrated endpoint or a root port - these are distinct
> device types.
>
> So I don't think Linux needs to support it.
>
> Having said that, there's all kind of broken hardware
> out there - crashing is not a friendly way to tell users
> that their hardware is not spec compliant.
> Maybe linux can print a friendly warning and ignore
> this port?

Indeed, that would be nicer. I booted Win7 on the same config, and it
came up fine. It did complain that it couldn't start the PCI-to-PCI
bridge driver on 00:03.0, the upstream port.

I'd like it if Linux could similarly tolerate that. But since this is
a relatively low-priority issue, I'm going to hold out for a more
straightforward solution. Checking for a null
"pdev->bus->parent->self" pointer is not very obvious. I think we can
probably come up with a more direct check that reads better and
possibly even detects the issue at the upstream port, not the
downstream port.

I opened a bugzilla for this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=60111

Radim, can you please attach a complete dmesg log showing the oops,
i.e., console output with "ignore_loglevel" and lspci -vv output (have
to use your patch so it boots, I guess)? I tried to reproduce it and
I know the problem is there, but I haven't quite found the right
recipe yet.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/