Re: [PATCH v7 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

From: Reinette Chatre
Date: Wed Dec 13 2023 - 18:28:10 EST


Hi James,

On 12/13/2023 10:03 AM, James Morse wrote:
> On 09/11/2023 17:39, Reinette Chatre wrote:
>> On 10/25/2023 11:03 AM, James Morse wrote:

...

>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 19e0681f0435..0056c9962a44 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init);
>>>
>>> static void __exit resctrl_exit(void)
>>> {
>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> cpuhp_remove_state(rdt_online);
>>> +
>>> + if (r->mon_capable)
>>> + rdt_put_mon_l3_config(r);
>>> +
>>> rdtgroup_exit();
>>> }
>>
>> I expect cleanup to do the inverse of init. I do not know what was the
>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
>
> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
> notifiers. (if you could run this code...)
>

hmmm ... if there is a risk of such a race would the init code not also be
vulnerable to that with the notifiers up before rdtgroup_init()? The races you mention
are not obvious to me. I see the filesystem and hotplug code protected against races via
the mutex and static keys. Could you please elaborate on the flows of concern?

I am not advocating for cpuhp_remove_state() to be called later. I understand that
it simplifies the flows to consider.

>> but I
>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse
>> of init. This cleanup is inserted in middle of two existing cleanup - could
>> you please elaborate how this location was chosen?
>
> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs
> entries, and unregisters the filesystem.
>
> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as
> all the CPUs are offline and the overflow/limbo threads should have been cancelled.
> Once cpuhp_remove_state() has been called, this really doesn't matter.

Sounds like nothing prevents this code from following the custom of cleanup to be
inverse of init (yet keep cpuhp_remove_state() first).

Reinette