Re: [PATCH] perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()

From: Liang, Kan
Date: Mon Nov 20 2023 - 16:22:15 EST




On 2023-11-20 2:49 p.m., Alexander Antonov wrote:
>
> On 11/15/2023 8:00 PM, Liang, Kan wrote:
>>
>> On 2023-11-15 10:13 a.m., alexander.antonov@xxxxxxxxxxxxxxx wrote:
>>> From: Alexander Antonov <alexander.antonov@xxxxxxxxxxxxxxx>
>>>
>>> The NULL dereference happens inside upi_fill_topology() procedure in
>>> case of disabling one of the sockets on the system.
>>>
>>> For example, if you disable the 2nd socket on a 4-socket system then
>>> uncore_max_dies() returns 3 and inside pmu_alloc_topology() memory will
>>> be allocated only for 3 sockets and stored in type->topology.
>>> In discover_upi_topology() memory is accessed by socket id from CPUNODEID
>>> registers, which contain physical ids (from 0 to 3), and on the line:
>>>
>>>      upi = &type->topology[nid][idx];
>>>
>>> out-of-bound access will happen and the 'upi' pointer will be passed to
>>> upi_fill_topology() where it will be dereferenced.
>>>
>>> To avoid this issue update the code to convert physical socket id to
>>> logical socket id in discover_upi_topology() before accessing memory.
>>>
>>> Fixes: f680b6e6062e ("perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server")
>>> Reported-by: Kyle Meyer <kyle.meyer@xxxxxxx>
>>> Tested-by: Kyle Meyer <kyle.meyer@xxxxxxx>
>>> Signed-off-by: Alexander Antonov <alexander.antonov@xxxxxxxxxxxxxxx>
>>> ---
>>>   arch/x86/events/intel/uncore_snbep.c | 10 ++++++++--
>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>>> index 8250f0f59c2b..49bc27ab26ad 100644
>>> --- a/arch/x86/events/intel/uncore_snbep.c
>>> +++ b/arch/x86/events/intel/uncore_snbep.c
>>> @@ -5596,7 +5596,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>>>       struct pci_dev *ubox = NULL;
>>>       struct pci_dev *dev = NULL;
>>>       u32 nid, gid;
>>> -    int i, idx, ret = -EPERM;
>>> +    int i, idx, lgc_pkg, ret = -EPERM;
>>>       struct intel_uncore_topology *upi;
>>>       unsigned int devfn;
>>>
>>> @@ -5614,8 +5614,13 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>>>           for (i = 0; i < 8; i++) {
>>>               if (nid != GIDNIDMAP(gid, i))
>>>                   continue;
>>> +            lgc_pkg = topology_phys_to_logical_pkg(i);
>>> +            if (lgc_pkg < 0) {
>>> +                ret = -EPERM;
>>> +                goto err;
>>> +            }
>> In snbep_pci2phy_map_init(), there is similar code to find the
>> logical die id. Can we factor out a common function for both of them?
>>
>> Thanks,
>> Kan
> Hi Kan,
>
> Thank you for your comment.
> Yes, I think we can factor out the common loop where GIDNIDMAP is being
> checked.
> But inside snbep_pci2phy_map_init() we have a slightly different procedure,
> which also does the following:
>
> if (topology_max_die_per_package() > 1)
>     die_id = i;
>
> I think that having this code, at least, in our case could bring us to the
> same issue which we are trying to fix. But of course we could
> parametrize this checking.

topology_max_die_per_package() > 1 means there is more than one die
in a socket. AFAIK, that only happens on Cascade Lake AP.

Did you observe it on ICX?

Thanks,
Kan

>
> What do you think?
>
> Thanks,
> Alexander
>>
>>>               for (idx = 0; idx < type->num_boxes; idx++) {
>>> -                upi = &type->topology[nid][idx];
>>> +                upi = &type->topology[lgc_pkg][idx];
>>>                   devfn = PCI_DEVFN(dev_link0 + idx, ICX_UPI_REGS_ADDR_FUNCTION);
>>>                   dev = pci_get_domain_bus_and_slot(pci_domain_nr(ubox->bus),
>>>                                     ubox->bus->number,
>>> @@ -5626,6 +5631,7 @@ static int discover_upi_topology(struct intel_uncore_type *type, int ubox_did, i
>>>                           goto err;
>>>                   }
>>>               }
>>> +            break;
>>>           }
>>>       }
>>>   err:
>>>
>>> base-commit: 9bacdd8996c77c42ca004440be610692275ff9d0
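
For illustration only, here is a minimal user-space sketch of the indexing problem the patch description talks about: the topology array is sized by the number of present packages, so it has to be indexed by logical id, not by the physical id read from hardware. Everything in it (struct pkg_map, pkg_map_init(), phys_to_logical_pkg(), topology_slot()) is a hypothetical stand-in and not kernel API, and it assumes logical ids are handed out in ascending physical order, which the kernel does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PHYS_PKGS 8	/* same bound as the 8 GIDNIDMAP slots in the patch */

/* Hypothetical stand-in for the kernel's physical-to-logical package map. */
struct pkg_map {
	int logical[MAX_PHYS_PKGS];	/* -1: physical package not present */
	int nr_logical;			/* number of present packages */
};

/*
 * Build the map from a bitmask of present physical packages.
 * Assumption: logical ids are assigned in ascending physical order.
 */
static void pkg_map_init(struct pkg_map *map, unsigned int present_mask)
{
	int phys;

	map->nr_logical = 0;
	for (phys = 0; phys < MAX_PHYS_PKGS; phys++) {
		if (present_mask & (1u << phys))
			map->logical[phys] = map->nr_logical++;
		else
			map->logical[phys] = -1;
	}
}

/* Returns the logical id, or -1 if the physical package is absent. */
static int phys_to_logical_pkg(const struct pkg_map *map, int phys)
{
	if (phys < 0 || phys >= MAX_PHYS_PKGS)
		return -1;
	return map->logical[phys];
}

/*
 * 'topology' has map->nr_logical entries. Convert the physical id first
 * and refuse the access if the package is not present, instead of
 * indexing with the physical id directly.
 */
static int *topology_slot(int *topology, const struct pkg_map *map, int phys)
{
	int lgc_pkg = phys_to_logical_pkg(map, phys);

	if (lgc_pkg < 0)
		return NULL;
	assert(lgc_pkg < map->nr_logical);
	return &topology[lgc_pkg];
}
```

With the 2nd socket of a 4-socket system disabled (present mask 0xD), physical id 3 maps to logical id 2, which fits in a 3-entry array, while using the physical id 3 directly would index one past the end.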