Re: [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list

From: James Morse
Date: Fri Oct 01 2021 - 12:02:49 EST


Hi Reinette,

On 17/09/2021 19:20, Reinette Chatre wrote:
> On 9/17/2021 9:57 AM, James Morse wrote:
>> On 01/09/2021 22:25, Reinette Chatre wrote:
>>> On 7/29/2021 3:35 PM, James Morse wrote:
>>>> Updates to resctrl's software controller follow the same path as
>>>> other configuration updates, but they don't modify the hardware state.
>>>> rdtgroup_schemata_write() uses parse_line() and the resource's
>>>> ctrlval_parse function to stage the configuration.
>>>> resctrl_arch_update_domains() then updates the mbps_val[] array
>>>> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
>>>> call that would update hardware.
>>>>
>>>> This complicates the interface between resctrl's filesystem parts
>>>> and architecture specific code. It should be possible for mba_sc
>>>> to be completely implemented by the filesystem parts of resctrl. This
>>>> would allow it to work on a second architecture with no additional code.
>>>>
>>>> Change parse_bw() to write the configuration value directly to the
>>>> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
>>>> to skip the call to resctrl_arch_update_domains(), meaning all the
>>>> mba_sc specific code in resctrl_arch_update_domains() can be removed.
>>>> On the read-side, show_doms() and update_mba_bw() are changed to read
>>>> the mba_sc[] array from the domain structure. With this,
>>>> resctrl_arch_get_config() no longer needs to consider mba_sc resources.
>>>>
>>>> Change parse_bw() to write these values directly, meaning
>>>> rdtgroup_schemata_write() never needs to call update_domains()
>>>> for mba_sc resources.
>>
>>> The above paragraph seems to contain duplicate information from the paragraph that
>>> precedes it.
>>
>> Looks like two commit messages got combined. I've removed this, and the below paragraphs
>> as it's already covered.
>>
>>
>>>> Get show_doms() to test is_mba_sc() and retrieve the value
>>>> directly, instead of using get_config() for the hardware value.
>>>>
>>>> This means the arch code's resctrl_arch_get_config() and
>>>> resctrl_arch_update_domains() no longer need to be aware of
>>>> mba_sc, and we can get rid of the update_mba_bw() code that
>>>> reaches into the hw_dom to get the msr value.
>>
>>>> @@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>>>>          list_for_each_entry(s, &resctrl_schema_all, list) {
>>>>            r = s->res;
>>>> +
>>>> +        /*
>>>> +         * Writes to mba_sc resources update the software controller,
>>>> +         * not the control msr.
>>>> +         */
>>>> +        if (is_mba_sc(r))
>>>> +            continue;
>>>> +
>>>
>>> A few resources can be updated in a single write to the schemata file. It is thus possible
>>> to update the cache allocation resource as well as memory bandwidth allocation in a single
>>> write.
>>
>> i.e. echo "L3:0=7ff;1=7ff\nMB:0=100;1=50" > schemata
>
> I do not think something like the above would show the issue. If you want to test this via
> the shell you need to use ANSI-C quoting. Adjusting what you show to something like:
>
> echo -n $'L3:0=7ff;1=7ff\nMB:0=100;1=50\n'
>
>>> As I understand this change in this scenario all configuration updates will be
>>> skipped, not just the memory bandwidth allocation ones.
>>
>> The loop is per-schema, so it's not a problem for L2/L3. This would only be a problem if
>> the is_mba_sc() resource had multiple schema. Only CDP does this, which the MBA controls
>> don't support.


> The loop iterates through the entire buffer provided to the schemata file and the buffer
> could contain multiple schema. This is more typical when interacting with the schemata
> file with a SDK perhaps.

I think we are talking about different loops. The diff didn't include much context.
With more context:

| ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
|				char *buf, size_t nbytes, loff_t off)
| {
[...]
|
|	while ((tok = strsep(&buf, "\n")) != NULL) {
|		resname = strim(strsep(&tok, ":"));
|		if (!tok) {
|			rdt_last_cmd_puts("Missing ':'\n");
|			ret = -EINVAL;
|			goto out;
|		}
|		if (tok[0] == '\0') {
|			rdt_last_cmd_printf("Missing '%s' value\n", resname);
|			ret = -EINVAL;
|			goto out;
|		}
|		ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
|		if (ret)
|			goto out;
|	}

This is the loop that iterates over the buffer. A break in here would cause the problem
you describe.
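
To illustrate how this first loop walks a multi-line buffer, here is a rough
user-space sketch. It is not the kernel code: parse_schemata() is a
hypothetical stand-in, it prints instead of calling rdtgroup_parse_resource(),
and it assumes the single trailing newline is stripped before the loop so the
last strsep() token isn't empty.

```c
#define _DEFAULT_SOURCE	/* strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the first loop: split the buffer on '\n',
 * then split each line on ':'. Returns the number of schema lines seen,
 * or -1 on the first malformed line.
 */
static int parse_schemata(char *buf)
{
	char *tok, *resname;
	size_t len;
	int parsed = 0;

	assert(buf != NULL);

	/* Assumption: drop one trailing newline before tokenising */
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';

	while ((tok = strsep(&buf, "\n")) != NULL) {
		resname = strsep(&tok, ":");
		if (!tok || tok[0] == '\0')
			return -1;	/* missing ':' or missing value */
		printf("resource '%s' -> '%s'\n", resname, tok);
		parsed++;
	}
	return parsed;
}
```

Fed the two-line L3/MB buffer from earlier in the thread, every line is
visited; only an error return (or a break) would cut the walk short.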

|
|	list_for_each_entry(s, &resctrl_schema_all, list) {
|		r = s->res;
|
|		/*
|		 * Writes to mba_sc resources update the software controller,
|		 * not the control msr.
|		 */
|		if (is_mba_sc(r))
|			continue;
|
|		ret = resctrl_arch_update_domains(r, rdtgrp->closid);
|		if (ret)
|			goto out;
|	}

Whereas this one is per-schema. The continue skips the call to update the hardware for
mba_sc, because this will be done by update_mba_bw() when it is next called.

Updating multiple resources with one schemata write is dealt with by the first loop.
The whole buffer is parsed (unless there is an error). This patch doesn't affect that.
The second loop is about updating the hardware to match the freshly parsed config.
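
A minimal user-space sketch of the second loop's behaviour, to show that the
continue only skips the mba_sc entry. struct fake_schema and
apply_staged_config() are made-up names for illustration; the real code walks
resctrl_schema_all and calls resctrl_arch_update_domains().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the schema list */
struct fake_schema {
	const char *name;
	bool is_mba_sc;		/* software controller in use for this resource */
	bool hw_updated;	/* set when the "hardware" update is issued */
};

/* Returns the number of schemas whose hardware update was issued. */
static int apply_staged_config(struct fake_schema *schemas, size_t n)
{
	int updated = 0;

	assert(schemas != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Skips this entry only; later schemas are still visited */
		if (schemas[i].is_mba_sc)
			continue;

		schemas[i].hw_updated = true;
		updated++;
	}
	return updated;
}
```

With an L3, an mba_sc MB and an L2 entry, the L3 and L2 entries are still
updated; only the MB entry is left for the software controller.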


Thanks,

James