Re: [PATCH v3 09/19] x86/resctrl: Queue mon_event_read() instead of sending an IPI

From: James Morse
Date: Thu Apr 27 2023 - 10:12:32 EST


Hi Reinette,

On 01/04/2023 00:25, Reinette Chatre wrote:
> On 3/20/2023 10:26 AM, James Morse wrote:
>> x86 is blessed with an abundance of monitors, one per RMID, that can be
>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>> the number implemented is up to the manufacturer. This means when there are
>> fewer monitors than needed, they need to be allocated and freed.
>>
>> Worse, the domain may be broken up into slices, and the MMIO accesses
>> for each slice may need performing from different CPUs.
>>
>> These two details mean MPAMs monitor code needs to be able to sleep, and
>> IPI another CPU in the domain to read from a resource that has been sliced.
>>
>> mon_event_read() already invokes mon_event_count() via IPI, which means
>> this isn't possible. On systems using nohz-full, some CPUs need to be
>> interrupted to run kernel work as they otherwise stay in user-space
>> running realtime workloads. Interrupting these CPUs should be avoided,
>> and scheduling work on them may never complete.
>>
>> Change mon_event_read() to pick a housekeeping CPU, (one that is not using
>> nohz_full) and schedule mon_event_count() and wait. If all the CPUs
>> in a domain are using nohz-full, then an IPI is used as the fallback.
>
> It is not clear to me where in this solution an IPI is used as fallback ...
> (see below)

>> @@ -537,7 +543,16 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>> rr->val = 0;
>> rr->first = first;
>>
>> - smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> + cpu = get_cpu();
>> + if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
>> + mon_event_count(rr);
>> + put_cpu();
>> + } else {
>> + put_cpu();
>> +
>> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> + smp_call_on_cpu(cpu, mon_event_count, rr, false);
>> + }
>> }
>>
>
> ... from what I can tell there is no IPI fallback here. As per previous
> patch I understand cpumask_any_housekeeping() could still return
> a nohz_full CPU and calling smp_call_on_cpu() on it would not send
> an IPI but instead queue the work to it. What did I miss?

Huh, looks like its still in my git-stash. Sorry about that. The combined hunk looks like
this:
----------------------%<----------------------
@@ -537,7 +550,26 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
rr->val = 0;
rr->first = first;

- smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+ cpu = get_cpu();
+ if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
+ mon_event_count(rr);
+ put_cpu();
+ } else {
+ put_cpu();
+
+ cpu = cpumask_any_housekeeping(&d->cpu_mask);
+
+ /*
+ * cpumask_any_housekeeping() prefers housekeeping CPUs, but
+ * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
+ * MPAM's resctrl_arch_rmid_read() is unable to read the
+ * counters on some platforms if its called in irq context.
+ */
+ if (tick_nohz_full_cpu(cpu))
+ smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+ else
+ smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
+ }
}

----------------------%<----------------------

Where smp_mon_event_count() is a static wrapper to make the types work.


Thanks,

James