Re: [PATCH v4 19/21] x86/resctrl: Rename and change the units of resctrl_cqm_threshold

From: James Morse
Date: Wed Jun 22 2022 - 11:16:19 EST


Hi Fenghua,

On 07/06/2022 23:08, Fenghua Yu wrote:
> On Tue, Apr 12, 2022 at 12:44:17PM +0000, James Morse wrote:
>> resctrl_cqm_threshold is stored in a hardware specific chunk size,
>> but exposed to user-space as bytes.
>>
>> This means the filesystem parts of resctrl need to know how the hardware
>> counts, to convert the user provided byte value to chunks. The interface
>> between the architecture's resctrl code and the filesystem ought to
>> treat everything as bytes.
>>
>> Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
>> still returns its value in chunks, so this needs converting to bytes.
>> As all the users have been touched, rename the variable to
>> resctrl_rmid_realloc_threshold, which describes what the value is for.
>>
>> Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power
>> of 2, so the existing code introduces a rounding error from resctrl's
>> theoretical fraction of the cache usage. This behaviour is kept as it
>> ensures the user visible value matches the value read from hardware
>> when the rmid will be reallocated.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 88988de0c96c..00f6e27e4e0d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -762,10 +764,15 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>> *
>> * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
>> */
>> - resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
>> + threshold = cl_size * 1024 / r->num_rmid;
>>
>> - /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */


> Could you please keep this comment? This comment is still helpful and
> meaningful in the context.

Not in this context anymore:

>> - resctrl_cqm_threshold /= hw_res->mon_scale;

But if you think its important I'll move it to resctrl_arch_round_mon_val(), which got
added after Reinette's comment about the change in behaviour visible via the
max_threshold_occupancy file.


>> + /*
>> + * Because num_rmid may not be a power of two, round the value
>> + * to the nearest multiple of hw_res->mon_scale so it matches a
>> + * value the hardware will measure. mon_scale may not be a power of 2.
>> + */
>> + threshold /= hw_res->mon_scale;
>> + resctrl_rmid_realloc_threshold = threshold * hw_res->mon_scale;
>>
>> ret = dom_data_init(r);
>> if (ret)

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index f494ca6b8bdd..7c35561e5216 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c

>> @@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
>> if (bytes > (boot_cpu_data.x86_cache_size * 1024))
>> return -EINVAL;
>>
>> - hw_res = resctrl_to_arch_res(of->kn->parent->priv);
>> - resctrl_cqm_threshold = bytes / hw_res->mon_scale;
>> + resctrl_rmid_realloc_threshold = bytes;
>
> Shouldn't bytes be multiples of hw_res->mon_scale? If user inputs non-multiples
> value, resctrl_rmid_realloc_threshold will keep the value in the kernel. Is that
> right?

I'd argue its the value user-space supplied, and its weird if you don't read back the
value you wrote.

But Reinette argued this was a change in behaviour, so v5 has a helper that does this:
| static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
| {
| unsigned int scale = boot_cpu_data.x86_cache_occ_scale;
|
| /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
| val /= scale;
| return val * scale;
| }


> But if you convert the input into multiples, user may see a different value when
> read it.

Weird huh! But that is what the max_threshold_occupancy file does today.


> Does this argument override the reason why this patch is needed?

No, this is about making more of resctrl handle the values in a platform agnostic unit,
like bytes, so it can be moved to live in /fs/ instead of arch/x86.


Thanks,

James