Re: [PATCH] perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()

From: Alexander Antonov
Date: Tue Nov 21 2023 - 10:12:46 EST



On 11/20/2023 10:21 PM, Liang, Kan wrote:

On 2023-11-20 2:49 p.m., Alexander Antonov wrote:
On 11/15/2023 8:00 PM, Liang, Kan wrote:
On 2023-11-15 10:13 a.m., alexander.antonov@xxxxxxxxxxxxxxx wrote:
From: Alexander Antonov <alexander.antonov@xxxxxxxxxxxxxxx>

The NULL pointer dereference happens inside upi_fill_topology() when one
of the sockets on the system is disabled.

For example, if you disable the 2nd socket on a 4-socket system then
uncore_max_dies() returns 3 and inside pmu_alloc_topology() memory will
be allocated only for 3 sockets and stored in type->topology.
In discover_upi_topology() memory is accessed by socket id from CPUNODEID
registers which contain physical ids (from 0 to 3) and on the line:

     upi = &type->topology[nid][idx];

an out-of-bounds access will happen and the 'upi' pointer will be passed
to upi_fill_topology() where it will be dereferenced.

To avoid this issue update the code to convert physical socket id to
logical socket id in discover_upi_topology() before accessing memory.
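For illustration only (not part of the patch): a minimal user-space sketch
of the indexing problem described above. The names demo_phys_to_logical_pkg()
and demo_topology_row() and the lookup table are hypothetical stand-ins, and
the assumption that logical ids are assigned densely in enumeration order
(physical 0, 2, 3 become logical 0, 1, 2) is mine, not taken from the kernel
sources.

```c
#include <stddef.h>

#define DEMO_MAX_PHYS_PKGS 8	/* the patched loop scans 8 GIDNIDMAP slots */

struct demo_topo_entry {
	int pkg_id;
};

/*
 * Hypothetical mapping for a 4-socket system with the 2nd socket
 * (physical id 1) disabled. -1 marks a package that is not present.
 */
static const int demo_phys_to_logical[DEMO_MAX_PHYS_PKGS] = {
	0, -1, 1, 2, -1, -1, -1, -1
};

/* Stand-in for topology_phys_to_logical_pkg(); returns -1 if unknown. */
static int demo_phys_to_logical_pkg(int phys_id)
{
	if (phys_id < 0 || phys_id >= DEMO_MAX_PHYS_PKGS)
		return -1;
	return demo_phys_to_logical[phys_id];
}

/*
 * Return the row for a physical package id, or NULL if the id has no
 * logical package or the logical id does not fit in the allocation.
 * Indexing rows[] by phys_id directly would run past the end for
 * phys_id == 3 when only 3 rows were allocated.
 */
static struct demo_topo_entry *demo_topology_row(struct demo_topo_entry *rows,
						 int nr_rows, int phys_id)
{
	int lgc_pkg = demo_phys_to_logical_pkg(phys_id);

	if (lgc_pkg < 0 || lgc_pkg >= nr_rows)
		return NULL;
	return &rows[lgc_pkg];
}
```

With three allocated rows, physical id 3 resolves to the last valid row
instead of one past the end, and the disabled physical id 1 is rejected.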

Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
Reported-by: Kyle Meyer <kyle.meyer@xxxxxxx>
Tested-by: Kyle Meyer <kyle.meyer@xxxxxxx>
Signed-off-by: Alexander Antonov <alexander.antonov@xxxxxxxxxxxxxxx>
---
  arch/x86/events/intel/uncore_snbep.c | 10 ++++++++--
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 8250f0f59c2b..49bc27ab26ad 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5596,7 +5596,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
      struct pci_dev *ubox = NULL;
      struct pci_dev *dev = NULL;
      u32 nid, gid;
-    int i, idx, ret = -EPERM;
+    int i, idx, lgc_pkg, ret = -EPERM;
      struct intel_uncore_topology *upi;
      unsigned int devfn;
@@ -5614,8 +5614,13 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
          for (i = 0; i < 8; i++) {
              if (nid != GIDNIDMAP(gid, i))
                  continue;
+            lgc_pkg = topology_phys_to_logical_pkg(i);
+            if (lgc_pkg < 0) {
+                ret = -EPERM;
+                goto err;
+            }
In snbep_pci2phy_map_init(), there is similar code to find the
logical die id. Can we factor out a common function for both of them?

Thanks,
Kan
Hi Kan,

Thank you for your comment.
Yes, I think we can factor out the common loop where GIDNIDMAP is being
checked.
But inside snbep_pci2phy_map_init() we have a slightly different procedure
which also does the following:

if (topology_max_die_per_package() > 1)
    die_id = i;

I think that having this code, at least in our case, could lead to the
same issue we are trying to fix. But of course we could parametrize
this check.
topology_max_die_per_package() > 1 means there is more than 1 die
in a socket. AFAIK, it only happens on Cascade Lake AP.

Did you observe it on ICX?

Thanks,
Kan
No, I didn't observe it on ICX. It seems that for now we have it only on CLX-AP.

Thanks,
Alexander

What do you think?

Thanks,
Alexander
              for (idx = 0; idx < type->num_boxes; idx++) {
-                upi = &type->topology[nid][idx];
+                upi = &type->topology[lgc_pkg][idx];
                  devfn = PCI_DEVFN(dev_link0 + idx, ICX_UPI_REGS_ADDR_FUNCTION);
                  dev = pci_get_domain_bus_and_slot(pci_domain_nr(ubox->bus),
                                    ubox->bus->number,
@@ -5626,6 +5631,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
                          goto err;
                  }
              }
+            break;
          }
      }
  err:

base-commit: 9bacdd8996c77c42ca004440be610692275ff9d0
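
For illustration only (not part of the patch): one possible shape for the
common GIDNIDMAP scan discussed in this thread, as a user-space sketch. The
helper name demo_find_gid_index() is hypothetical. The GIDNIDMAP() definition
below (3-bit fields packed into the register value) is reproduced from memory
of uncore_snbep.c and should be treated as an assumption. The die-id handling
in snbep_pci2phy_map_init() is deliberately left to the caller.

```c
#include <stdint.h>

/* Assumed layout: eight 3-bit node-id fields packed into one register. */
#define GIDNIDMAP(config, id)	(((config) >> (3 * (id))) & 0x7)

/*
 * Hypothetical shared helper: return the first slot i in [0, 8) whose
 * GIDNIDMAP field equals nid, or -1 if no slot matches. Callers would
 * then translate i as needed (for example to a logical package id).
 */
static int demo_find_gid_index(uint32_t gid, uint32_t nid)
{
	int i;

	for (i = 0; i < 8; i++) {
		if (GIDNIDMAP(gid, i) == nid)
			return i;
	}
	return -1;
}
```

Returning the raw slot index keeps the physical-to-logical conversion, and
the topology_max_die_per_package() special case, outside the shared loop.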