Re: [PATCH v7 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI

From: James Morse
Date: Thu Dec 14 2023 - 06:37:44 EST


Hi Babu,

On 09/11/2023 20:40, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> Intel is blessed with an abundance of monitors, one per RMID, that can be
>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>> the number implemented is up to the manufacturer. This means when there are
>> fewer monitors than needed, they need to be allocated and freed.
>>
>> MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
>> CSU counter is allowed to return 'not ready' for a small number of
>> micro-seconds after programming. To allow one CSU hardware monitor to be
>> used for multiple control or monitor groups, the CPU accessing the
>> monitor needs to be able to block when configuring and reading the
>> counter.
>>
>> Worse, the domain may be broken up into slices, and the MMIO accesses
>> for each slice may need performing from different CPUs.
>>
>> These two details mean MPAMs monitor code needs to be able to sleep, and
>> IPI another CPU in the domain to read from a resource that has been sliced.
>>
>> mon_event_read() already invokes mon_event_count() via IPI, which means
>> this isn't possible. On systems using nohz-full, some CPUs need to be
>> interrupted to run kernel work as they otherwise stay in user-space
>> running realtime workloads. Interrupting these CPUs should be avoided,
>> and scheduling work on them may never complete.
>>
>> Change mon_event_read() to pick a housekeeping CPU, (one that is not using
>> nohz_full) and schedule mon_event_count() and wait. If all the CPUs
>> in a domain are using nohz-full, then an IPI is used as the fallback.
>>
>> This function is only used in response to a user-space filesystem request
>> (not the timing sensitive overflow code).
>>
>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>> the monitor-allocation in monitor.c. When the IPI fallback is used on
>> machines where MPAM needs to make an access on multiple CPUs, the counter
>> read will always fail.

>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index beccb0e87ba7..d07f99245851 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c

>> @@ -522,12 +524,21 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
>> return ret;
>> }
>>
>> +static int smp_mon_event_count(void *arg)
>> +{
>> + mon_event_count(arg);
>> +
>> + return 0;
>> +}
>
> Shouldn't this function defined as "void" similar to mon_event_count?
> Return code is not used anywhere.

smp_call_on_cpu() requires it to return an int, even if the value is not used.

This wrapper only exists because smp_call_on_cpu() takes a different prototype to
smp_call_function_any().


Thanks,

James

>> @@ -536,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>> rr->val = 0;
>> rr->first = first;
>>
>> - smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> +
>> + /*
>> + * cpumask_any_housekeeping() prefers housekeeping CPUs, but
>> + * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
>> + * MPAM's resctrl_arch_rmid_read() is unable to read the
>> + * counters on some platforms if its called in irq context.
>> + */
>> + if (tick_nohz_full_cpu(cpu))
>> + smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> + else
>> + smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
>> }
>>