Re: [PATCH] x86/resctrl: Clear the staged configs when destroying schemata list

From: James Morse
Date: Wed Sep 28 2022 - 10:09:27 EST


Hi Shawn,

On 28/09/2022 14:37, Shawn Wang wrote:
> On 9/28/22 5:21 AM, Reinette Chatre wrote:
>> On 9/27/2022 6:06 AM, James Morse wrote:
>>> On 27/09/2022 03:54, Shawn Wang wrote:
>>>> Array staged_config in struct rdt_domain still maintains the original value when
>>>> resctrl is unmounted. If resctrl is mounted with cdp option and then remounted
>>>> without cdp option, field have_new_ctrl in staged_config[CDP_CODE] and
>>>> staged_config[CDP_DATA] will still be true.
>>>
>>> staged_config[CDP_DATA] is an array - its always 'true'. I think you mean
>>> staged_config[CDP_DATA].have_new_ctrl, which will still be true because it is only
>>> memset() when the schemata file is written to.
>>>
>>>
>>>> Since resctrl_arch_update_domains()
>>>> traverses all resctrl_conf_type, it will continue to update CDP_CODE and
>>>> CDP_DATA configurations, which can cause overflow problem.
>>>
>>> Only if its called with a stale staged config, and it should only be called when the
>>> schemata file is written to, which would memset() the staged config first.
>>>
>>>
>>>> The problem can be reproduced by the following commands:
>>>>      # A system with 16 usable closids and mba disabled
>>>>      mount -t resctrl resctrl -o cdp /sys/fs/resctrl
>>>>      mkdir /sys/fs/resctrl/p{1..7}
>>>>      umount /sys/fs/resctrl/
>>>>      mount -t resctrl resctrl /sys/fs/resctrl
>>>>      mkdir /sys/fs/resctrl/p{1..8}
>>>
>>> Thanks for the reproducer - but I don't see what could set have_new_ctrl in this sequence.
>>> You can't call apply_config() to set CPUs in the mask without that being set.
>>>
>>> Creating a new control group, (your mkdir step) shouldn't touch the hardware at all, as it
>>> should be left in its reset state from the last umount(), or setup.
>>
>> There is an attempt to configure the hardware in the mkdir path:
>> rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()
>>
>>
>> This is required in support of the different resource group modes. When a new
>> resource group is created via mkdir the configuration should respect any
>> exclusive resource groups that exist at that point.
>>
>>>
>>> I can't reproduce this on v6.0-rc7.
>>> Even if I dirty the configuration by writing to the schemata file, I can't reproduce this.
>>>
>>
>>  From what I can tell the reproducer is dependent on (a) whether hardware
>> supports CDP, and (b) the number of CLOSIDs supported by the system. The reproducer
>> works for hardware that has 16 CLOSIDs (see later).
>>
>>> (I have mba enabled, but all this should affect is the number of closid available)

> I reproduce this on v6.0-rc6. The key to reproduction is to ensure that the number of
> usable groups is different between CDP enabled and CDP disabled.
>
> The system I use has 16 CLOSIDs for L3 CAT and 8 CLOSIDs for MBA. MBA limits the max
> number of groups to 8, even if CDP is disabled. This is the reason why I disable MBA.

bingo - could that detail be included in the commit message?

[..]

>> What do you think about clearing the staged config within resctrl_arch_update_domains()
>> after the configuration is complete and there is no more need for it? That may reduce
>> complexity where each caller no longer need to remember to do so.
>> I see "staged_config" as a temporary storage and it my help to understand the code better
>> if it is treated as such.
>>
>
> I think this is better. I have tested it and will give a new version.

Great, thanks!

Could you mention have_new_ctrl in the commit message, and this path via
rdtgroup_init_alloc().

I think you also need:
Fixes: 75408e43509ed ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Cc: <stable@xxxxxxxxxxxxxxx>


Thanks,

James