Re: [regression] PCI early boot hang on certain AMD systems

From: Christian König
Date: Wed Dec 06 2017 - 12:59:04 EST


Hi Ingo,

this is a known issue with multi-socket systems and the patch in question.

The attached set of patches should fix the issue and has already been sent to Bjorn for inclusion in the next rc.

Sorry for the noise,
Christian.

Am 06.12.2017 um 17:16 schrieb Ingo Molnar:
Hi,

* Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

PCI changes:
Christian König (4):
x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)

In v4.15 one of my test systems broke. It hangs in early bootup, during early PCI
setup:

[ 2.262005] pci 0000:00:18.1: adding root bus resource [mem 0x1027000000-0xfcffffffff 64bit pref window] <--- new resource
[ 2.270081] pci 0000:00:18.2: [1022:1602] type 00 class 0x060000
[ 2.271081] pci 0000:00:18.3: [1022:1603] type 00 class 0x060000
[ 2.272083] pci 0000:00:18.4: [1022:1604] type 00 class 0x060000
[ 2.273079] pci 0000:00:18.5: [1022:1605] type 00 class 0x060000
[ 2.274083] pci 0000:00:19.0: [1022:1600] type 00 class 0x060000
[ 2.275089] pci 0000:00:19.1: [1022:1601] type 00 class 0x060000
[ hard hang ]

I have bisected the hang to:

fa564ad96366: x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)

Reverting the commit makes the system boot again. The 'new resource' line above is,
I believe, the new BAR added by the commit.

I've attached the earlyprintk boot log of the hang, with a few printks added to
pci_amd_enable_64bit_bar() of the relevant fields:

+ printk("res->start: %016llx\n", res->start);
+ printk("res->end: %016llx\n", res->end);
+ printk("base: %08x\n", base);
+ printk("high: %08x\n", high);
+ printk("limit: %08x\n", limit);
+ printk("slot: %d\n", i);

[ 2.261090] pci 0000:00:18.1: [1022:1601] type 00 class 0x060000
[ 2.262005] pci 0000:00:18.1: adding root bus resource [mem 0x1027000000-0xfcffffffff 64bit pref window]
[ 2.264001] res->start: 0000001027000000
[ 2.265001] res->end: 000000fcffffffff
[ 2.266001] base: 10270003
[ 2.267001] high: 00000000
[ 2.268001] limit: fd000000
[ 2.269001] slot: 1
[ 2.270081] pci 0000:00:18.2: [1022:1602] type 00 class 0x060000
[ 2.271081] pci 0000:00:18.3: [1022:1603] type 00 class 0x060000
[ 2.272083] pci 0000:00:18.4: [1022:1604] type 00 class 0x060000
[ 2.273079] pci 0000:00:18.5: [1022:1605] type 00 class 0x060000
[ 2.274083] pci 0000:00:19.0: [1022:1600] type 00 class 0x060000
[ 2.275089] pci 0000:00:19.1: [1022:1601] type 00 class 0x060000
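
For reference, the printed base/high/limit values are consistent with the
res->start and res->end values under the encoding sketched below. This is a
minimal sketch, not the code from the commit: the mask values, the struct and
the name encode_window() are assumptions chosen to match the numbers printed
above (address bits 39:8 in base/limit, bits 47:40 in high, read/write enable
in the two low bits of base).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, inferred from the printed values; not taken from the commit. */
#define MMIO_BASE_RE   0x00000001u  /* read enable (assumption) */
#define MMIO_BASE_WE   0x00000002u  /* write enable (assumption) */
#define MMIO_ADDR_MASK 0xffffff00u  /* holds address bits 39:8 (assumption) */

/* Hypothetical container for the three 32-bit register values. */
struct window_regs {
	uint32_t base;
	uint32_t high;
	uint32_t limit;
};

/* Encode an inclusive [start, end] MMIO window into base/high/limit values. */
static struct window_regs encode_window(uint64_t start, uint64_t end)
{
	struct window_regs r;
	uint64_t limit_addr = end + 1;  /* the limit is derived from end + 1 */

	assert(end >= start);

	/* Address bits 39:8 go into bits 31:8 of base, plus the enable bits. */
	r.base = ((uint32_t)(start >> 8) & MMIO_ADDR_MASK) |
		 MMIO_BASE_RE | MMIO_BASE_WE;

	/* Same layout for the limit, without enable bits. */
	r.limit = (uint32_t)(limit_addr >> 8) & MMIO_ADDR_MASK;

	/* Address bits 47:40: base part in bits 7:0, limit part in bits 23:16. */
	r.high = (uint32_t)((start >> 40) & 0xff) |
		 ((uint32_t)((limit_addr >> 40) & 0xff) << 16);

	return r;
}
```

Calling encode_window(0x1027000000, 0xfcffffffff) yields base 10270003,
high 00000000 and limit fd000000, the same values as in the log above, so the
registers themselves appear to describe the window that was reported.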

On a successful bootup the system would continue with:

[ 0.583060] pci 0000:00:19.2: [1022:1602] type 00 class 0x060000
[ 0.584079] pci 0000:00:19.3: [1022:1603] type 00 class 0x060000
[ 0.585084] pci 0000:00:19.4: [1022:1604] type 00 class 0x060000
[ 0.586079] pci 0000:00:19.5: [1022:1605] type 00 class 0x060000
[ 0.588039] pci 0000:00:1a.0: [1022:1600] type 00 class 0x060000
[ 0.589090] pci 0000:00:1a.1: [1022:1601] type 00 class 0x060000
[ 0.590079] pci 0000:00:1a.2: [1022:1602] type 00 class 0x060000
[ 0.591080] pci 0000:00:1a.3: [1022:1603] type 00 class 0x060000
[ 0.593006] pci 0000:00:1a.4: [1022:1604] type 00 class 0x060000
[ 0.594079] pci 0000:00:1a.5: [1022:1605] type 00 class 0x060000
[ 0.595082] pci 0000:00:1b.0: [1022:1600] type 00 class 0x060000
[ 0.596087] pci 0000:00:1b.1: [1022:1601] type 00 class 0x060000
[ 0.597083] pci 0000:00:1b.2: [1022:1602] type 00 class 0x060000
[ 0.598080] pci 0000:00:1b.3: [1022:1603] type 00 class 0x060000
[ 0.599085] pci 0000:00:1b.4: [1022:1604] type 00 class 0x060000
[ 0.600079] pci 0000:00:1b.5: [1022:1605] type 00 class 0x060000
[ 0.601124] pci 0000:03:00.0: [1000:0072] type 00 class 0x010700
[ 0.602037] pci 0000:03:00.0: reg 0x10: [io 0xe000-0xe0ff]
[ 0.603010] pci 0000:03:00.0: reg 0x14: [mem 0xdff3c000-0xdff3ffff 64bit]
[ 0.604009] pci 0000:03:00.0: reg 0x1c: [mem 0xdff40000-0xdff7ffff 64bit]
[ 0.605011] pci 0000:03:00.0: reg 0x30: [mem 0xdff80000-0xdfffffff pref]
...

cpuinfo:

processor : 31
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD Opteron(tm) Processor 6278
stepping : 2
microcode : 0x6000626
cpu MHz : 1427.124
cache size : 2048 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8

board:

Manufacturer: Supermicro
Product Name: H8DG6/H8DGi

BIOS:

Vendor: American Megatrends Inc.
Version: 2.0b
Release Date: 03/01/2012

I've attached the lspci -v output and a successful full bootlog as well, with
various debugging options enabled. Let me know if you need any other info.

Thanks,

Ingo