Re: [PATCH v2] x86/resctrl: Clear the stale staged config after the configuration is completed

From: Reinette Chatre
Date: Thu Oct 20 2022 - 12:36:19 EST


Hi Shawn,

On 10/19/2022 10:55 PM, Shawn Wang wrote:
> Hi Reinette,
>
> Sorry for replying now due to other things.

No problem.

>
> On 10/12/2022 7:48 AM, Reinette Chatre wrote:
>> Hi Shawn,
>>
>> Thank you very much for working on getting this fixed.
>>
>> On 10/9/2022 1:36 AM, Shawn Wang wrote:

...

>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> index 1dafbdc5ac31..2c719da5544f 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> @@ -338,6 +338,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
>>>                   msr_param.high = max(msr_param.high, idx + 1);
>>>               }
>>>           }
>>> +        /* Clear the stale staged config */
>>> +        memset(d->staged_config, 0, sizeof(d->staged_config));
>>>       }
>>>         if (cpumask_empty(cpu_mask))
>>
>> Please also ensure that the temporary storage is cleared if there is an
>> early exist because of failure. Please do not duplicate the memset() code
>> but instead move it to a common exit location.
>>
>
> There are two different resctrl_arch_update_domains() function call paths:
>
> 1.rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
> 2.rdtgroup_schemata_write()->resctrl_arch_update_domains()
>
> Perhaps there is no common exit location if we want to clear staged_config[] after every call of resctrl_arch_update_domains().

I was referring to a common exit out of resctrl_arch_update_domains().

Look at how resctrl_arch_update_domains() behaves with this change:

resctrl_arch_update_domains()
{
...

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

...
list_for_each_entry(d, &r->domains, list) {
...
memset(d->staged_config, 0, sizeof(d->staged_config));
}


...
done:
free_cpumask_var(cpu_mask);

return 0;
}


The goal of this fix is to ensure that staged_config[] is cleared on
return from resctrl_arch_update_domains() so that there is no stale
data in staged_config[] when resctrl_arch_update_domains() is called
again.

Considering this, I can see two scenarios in the above solution where
staged_config[] is not cleared on exit from resctrl_arch_update_domains():
1) If the zalloc_cpumask_var() fails then it returns -ENOMEM right away
without clearing staged_config[].
2) If there are no domains associated with the resource (although, this
seems not possible because that would imply no CPUs are online ...
but let's make this code robust against strange scenarios).

What I referred to with a "common exit location" was something like this:

resctrl_arch_update_domains()
{
...
int ret = -EINVAL;
...

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
ret = -ENOMEM;
goto done;
}

...
list_for_each_entry(d, &r->domains, list) {
...
}


...
ret = 0;
done:
free_cpumask_var(cpu_mask);
memset(d->staged_config, 0, sizeof(d->staged_config));
return ret;
}

Something like the above will ensure that staged_config[] is
always cleared on exit from resctrl_arch_update_domains().

Reinette