Re: perf/x86/intel/uncore

From: Liang, Kan
Date: Fri Jan 25 2019 - 16:35:05 EST




On 1/25/2019 3:16 PM, Song Liu wrote:
Thanks Kan!

On Jan 25, 2019, at 12:08 PM, Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:



On 1/25/2019 1:54 PM, Song Liu wrote:
Hi,
We are debugging an issue where skx_pci_uncores cannot be registered on an
8-socket system with Xeon Platinum 8176 CPUs. After poking around for a
while, I found it is caused by snbep_pci2phy_map_init() failing to find
a ubox_dev:
ubox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, devid, ubox_dev);
ubox_dev == NULL
...
The same kernel (Linus' master) works fine on some single socket SKX
systems.
I am not sure what to check next. And I am not sure whether this is
specific to this system (HPE Superdome Flex).

Could you please share offsets 0xC0 and 0xD4 of the PCI configuration space for each device whose PCI ID is 0x2014?

snbep_pci2phy_map_init() tries to build a mapping from BUS# to Socket ID.
CPUNODEID (0xc0) discloses the Node ID of current BUS.
GIDNIDMAP (0xd4) discloses the mapping between Socket ID and Node ID.
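
One way to collect the two registers without parsing lspci output is to read them from the sysfs config file of each Ubox device. This is only a minimal userspace sketch: read_cfg_dword() and the two offset macros are hypothetical names introduced here, and the caller is assumed to pass a path such as /sys/bus/pci/devices/0000:00:08.0/config. Offsets beyond the first 64 bytes of config space normally require root to read.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Register offsets named in the message above. */
#define CPUNODEID_OFF 0xC0
#define GIDNIDMAP_OFF 0xD4

/*
 * Hypothetical helper: read one dword at offset 'off' from a PCI
 * config space file. Config space is little-endian, so the value is
 * assembled byte by byte to stay correct on any host.
 * Returns 0 on success, -1 on failure.
 */
static int read_cfg_dword(const char *path, off_t off, uint32_t *val)
{
	uint8_t b[4];
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, b, sizeof(b), off);
	close(fd);
	if (n != (ssize_t)sizeof(b))
		return -1;

	*val = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
	return 0;
}
```

Calling read_cfg_dword(path, CPUNODEID_OFF, &v) and read_cfg_dword(path, GIDNIDMAP_OFF, &v) for every device with PCI ID 0x2014 should yield the same two values that the c0: and d0: rows of an lspci -xxx dump contain.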

Here is an example from a 4 socket SKX.
BUS     CPUNODEID(bit2:0)   GIDNIDMAP
0x0     0x0                 0x688
0x40    0x1                 0x688
0x80    0x2                 0x688
0xC0    0x3                 0x688
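
The lookup behind the table above can be sketched as follows. This assumes GIDNIDMAP packs eight 3-bit Node IDs, one per socket index, which is consistent with 0x688 decoding to 0, 1, 2, 3 in its lowest four fields. gidnidmap_to_socket() is a hypothetical name used for illustration, not a kernel function.

```c
#include <stdint.h>

#define GIDNIDMAP_FIELDS 8	/* assumed: eight 3-bit fields */

/*
 * Return the index of the first 3-bit GIDNIDMAP field that equals
 * 'nodeid', or -1 if no field matches. The index is treated as the
 * socket ID of the bus that reported this Node ID.
 */
static int gidnidmap_to_socket(uint32_t gidnidmap, uint32_t nodeid)
{
	int i;

	for (i = 0; i < GIDNIDMAP_FIELDS; i++) {
		if (nodeid == ((gidnidmap >> (3 * i)) & 0x7))
			return i;
	}
	return -1;
}
```

With the 4-socket values, gidnidmap_to_socket(0x688, 0x2) yields socket 2 for bus 0x80. A Node ID wider than 3 bits can never equal a 3-bit field, so the lookup fails for it.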


Here is the data I get:

# lspci -xxx | grep "86 80 14 20" -A 15 -B 1 | grep -e "86 80 14 20" -e c0: -e d0: -e Intel
0000:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: 00 a0 00 00 2f 00 00 80 01 00 02 00 2f 2f 2f 20
d0: 02 00 00 00 88 d6 b6 00 01 00 00 00 00 00 00 00

0001:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: 01 80 00 00 1f 00 00 80 01 00 02 00 1f 1f 1f 10
d0: 02 00 00 00 88 46 92 00 01 00 00 00 00 00 00 00

0002:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: 02 e0 00 00 8f 00 00 80 01 00 02 00 8f 8f 8f 80
d0: 02 00 00 00 88 f6 ff 00 01 00 00 00 00 00 00 00

0003:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: 03 c0 00 00 4f 00 00 80 01 00 02 00 4f 4f 4f 40
d0: 02 00 00 00 88 66 db 00 01 00 00 00 00 00 00 00

0004:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: a0 b4 00 00 2f 00 00 80 01 00 02 00 2f 2f 2f 20

The local node ID should be bits 2:0. We didn't mask it in our code.
Does the patch as below work?

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index c07bee3..15a8e3c 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1222,6 +1222,8 @@ static struct pci_driver snbep_uncore_pci_driver = {
.id_table = snbep_uncore_pci_ids,
};

+#define NODE_ID_MASK 0x7
+
/*
* build pci bus to socket mapping
*/
@@ -1243,7 +1245,7 @@ static int snbep_pci2phy_map_init(int devid, int nodeid_loc, int idmap_loc, bool
err = pci_read_config_dword(ubox_dev, nodeid_loc, &config);
if (err)
break;
- nodeid = config;
+ nodeid = config & NODE_ID_MASK;
/* get the Node ID mapping */
err = pci_read_config_dword(ubox_dev, idmap_loc, &config);
if (err)
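
To illustrate what the mask does to the values in this dump, here is a small userspace sketch. le_dword() and local_node_id() are hypothetical helpers. NODE_ID_MASK has the same value as in the patch, and the bytes are taken from the c0: rows quoted in this thread.

```c
#include <stdint.h>

#define NODE_ID_MASK 0x7	/* same value as in the proposed patch */

/* Assemble a little-endian dword from four bytes as printed by lspci -xxx. */
static uint32_t le_dword(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
	return (uint32_t)b0 | ((uint32_t)b1 << 8) |
	       ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Keep only bits 2:0 of CPUNODEID, i.e. the local Node ID. */
static uint32_t local_node_id(uint32_t cpunodeid)
{
	return cpunodeid & NODE_ID_MASK;
}
```

For segment 0004 the row starts with "a0 b4 00 00", so the raw dword is 0xb4a0 and the masked Node ID is 0. The rows for 0005, 0006 and 0007 start with 81, e2 and c3, which mask to 1, 2 and 3. Without the mask, the raw dword is compared against 3-bit fields and cannot match.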


Thanks,
Kan

d0: 02 00 00 00 6d 8b 68 00 01 00 00 00 00 00 00 00

0005:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: 81 90 00 00 1f 00 00 80 01 00 02 00 1f 1f 1f 10
d0: 02 00 00 00 24 89 68 00 01 00 00 00 00 00 00 00

0006:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: e2 fc 00 00 8f 00 00 80 01 00 02 00 8f 8f 8f 80
d0: 02 00 00 00 ff 8f 68 00 01 00 00 00 00 00 00 00

0007:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
c0: c3 d8 00 00 4f 00 00 80 01 00 02 00 4f 4f 4f 40
d0: 02 00 00 00 b6 8d 68 00 01 00 00 00 00 00 00 00

Song

One thing I noticed is that the PCI configuration space shows
subsystem vendor ID of 0x1590 instead of 0x8086:
0000:00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00: 86 80 14 20 00 00 10 00 04 00 80 08 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 90 15 14 20 << subsystem vendor
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
But I don't think that is the problem, as the code searches with PCI_ANY_ID.


It looks for the device with PCI ID 0x2014.


Thanks,
Kan