Re: [PATCH 2/2] perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info

From: Peter Zijlstra
Date: Mon Jan 11 2021 - 08:02:15 EST


On Fri, Jan 08, 2021 at 09:35:49AM -0600, Steve Wahl wrote:


> + /*
> + * The nodeid and idmap registers only contain enough
> + * information to handle 8 nodes. On systems with more
> + * than 8 nodes, we need to rely on NUMA information,
> + * filled in from BIOS supplied information, to determine
> + * the topology.
> + */

Egads.. do we realy have to trust BIOS data? BIOS crud tends to be
bonghits qualitee :/

> + if (nr_node_ids <= 8) {
> + /* get the Node ID of the local register */
> + err = pci_read_config_dword(ubox_dev, nodeid_loc, &config);
> + if (err)
> + break;
> + nodeid = config & NODE_ID_MASK;
> + /* get the Node ID mapping */
> + err = pci_read_config_dword(ubox_dev, idmap_loc, &config);
> + if (err)
> + break;
>
> + segment = pci_domain_nr(ubox_dev->bus);
> + raw_spin_lock(&pci2phy_map_lock);
> + map = __find_pci2phy_map(segment);
> + if (!map) {
> + raw_spin_unlock(&pci2phy_map_lock);
> + err = -ENOMEM;
> + break;
> + }
> +
> + /*
> + * every three bits in the Node ID mapping register maps
> + * to a particular node.
> + */
> + for (i = 0; i < 8; i++) {
> + if (nodeid == ((config >> (3 * i)) & 0x7)) {
> + if (topology_max_die_per_package() > 1)
> + die_id = i;
> + else
> + die_id = topology_phys_to_logical_pkg(i);
> + map->pbus_to_dieid[bus] = die_id;
> + break;
> + }
> + }
> raw_spin_unlock(&pci2phy_map_lock);
> + } else {
> + int node = pcibus_to_node(ubox_dev->bus);
> + int cpu;
> +
> + segment = pci_domain_nr(ubox_dev->bus);
> + raw_spin_lock(&pci2phy_map_lock);
> + map = __find_pci2phy_map(segment);
> + if (!map) {
> + raw_spin_unlock(&pci2phy_map_lock);
> + err = -ENOMEM;
> + break;
> + }
> + die_id = -1;
> + for_each_cpu(cpu, cpumask_of_pcibus(ubox_dev->bus)) {
> + struct cpuinfo_x86 *c = &cpu_data(cpu);
> +
> + if (c->initialized && cpu_to_node(cpu) == node) {
> + map->pbus_to_dieid[bus] = die_id = c->logical_die_id;
> + break;
> + }
> + }
> + raw_spin_unlock(&pci2phy_map_lock);
> +
> + if (WARN_ON_ONCE(die_id == -1)) {
> + err = -EINVAL;
> break;
> }

This seems to assume a single die per node; is that fundemantally
correct?

Did you consider malicious BIOS data? I think we're good, but I didn't
look too hard.

> }
> }
>
> if (!err) {
> --
> 2.26.2
>