Re: [PATCH v3 09/19] x86/resctrl: Queue mon_event_read() instead of sending an IPI

From: James Morse
Date: Thu Apr 27 2023 - 10:11:47 EST


Hi Peter,

On 22/03/2023 14:07, Peter Newman wrote:
> On Mon, Mar 20, 2023 at 6:27 PM James Morse <james.morse@xxxxxxx> wrote:
>>
>> x86 is blessed with an abundance of monitors, one per RMID, that can be
>
> As I explained earlier, this is not the case on AMD.

I'll change it to say Intel.


>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>> the number implemented is up to the manufacturer. This means when there are
>> fewer monitors than needed, they need to be allocated and freed.
>>
>> Worse, the domain may be broken up into slices, and the MMIO accesses
>> for each slice may need performing from different CPUs.
>>
>> These two details mean MPAMs monitor code needs to be able to sleep, and
>> IPI another CPU in the domain to read from a resource that has been sliced.
>
> This doesn't sound very convincing. Could mon_event_read() IPI all the
> CPUs in the domain? (after waiting to allocate and install monitors
> when necessary?)

On the majority of platforms this would be a waste of time, as the IPI only needs sending
to one CPU. I'd like to keep the cost of being strange limited to the strange platforms.

I don't think exposing a 'sub domain' cpumask to resctrl is helpful: this needs to be
hidden in the architecture specific code.

The IPI is needed because some SoC components are implemented as slices, and the MMIO
accesses for each slice may need performing from a CPU associated with that slice.


The sleeping is because the CSU counters are allowed to be 'not ready' immediately after
programming. The not-ready time is short. To allow platforms that have too few CSU monitors
to support the same user-interface as x86^W Intel, the MPAM driver needs to be able to
multiplex a single CSU monitor between multiple control/monitor groups. Allowing it to
sleep for the advertised not-ready period is the simplest way of doing this.
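
As a rough illustration of the 'sleep for the not-ready period' idea, here is a
user-space sketch. It is not the MPAM driver code: struct fake_csu_mon,
fake_csu_read() and csu_read_sleeping() are hypothetical names made up for this
example, and nanosleep() stands in for whatever the kernel side would use
(something like usleep_range()).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model of a CSU monitor; not the real driver structure. */
struct fake_csu_mon {
	int polls_until_ready;     /* reads that will still report not-ready */
	unsigned int not_ready_us; /* advertised not-ready period, in microseconds */
	uint64_t value;            /* value returned once the monitor is ready */
};

/* Returns -EBUSY while the freshly programmed monitor is still not ready. */
static int fake_csu_read(struct fake_csu_mon *mon, uint64_t *val)
{
	if (mon->polls_until_ready > 0) {
		mon->polls_until_ready--;
		return -EBUSY;
	}
	*val = mon->value;
	return 0;
}

/*
 * Read the monitor, sleeping for the advertised not-ready period between
 * attempts. Gives up with -EBUSY after max_tries attempts.
 */
static int csu_read_sleeping(struct fake_csu_mon *mon, int max_tries,
			     uint64_t *val)
{
	struct timespec delay = {
		.tv_sec = mon->not_ready_us / 1000000,
		.tv_nsec = (long)(mon->not_ready_us % 1000000) * 1000,
	};

	for (int i = 0; i < max_tries; i++) {
		int err = fake_csu_read(mon, val);

		if (err != -EBUSY)
			return err;
		nanosleep(&delay, NULL); /* must be in a context that can sleep */
	}
	return -EBUSY;
}
```

The point of the sketch is only that the caller has to be able to sleep, which
an IPI handler cannot do.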


>> mon_event_read() already invokes mon_event_count() via IPI, which means
>> this isn't possible. On systems using nohz-full, some CPUs need to be
>> interrupted to run kernel work as they otherwise stay in user-space
>> running realtime workloads. Interrupting these CPUs should be avoided,
>> and scheduling work on them may never complete.
>>
>> Change mon_event_read() to pick a housekeeping CPU, (one that is not using
>> nohz_full) and schedule mon_event_count() and wait. If all the CPUs
>> in a domain are using nohz-full, then an IPI is used as the fallback.
>>
>> This function is only used in response to a user-space filesystem request
>> (not the timing sensitive overflow code).
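
For readers skimming the thread, the CPU selection described in the quoted
paragraphs can be modelled in a few lines of user-space C. This is a sketch
under assumptions, not the patch itself: cpu_bits_t, struct read_plan and
plan_mon_read() are hypothetical names, and a 64-bit word stands in for a
cpumask.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is a member. */
typedef uint64_t cpu_bits_t;

enum read_method {
	READ_VIA_WORK, /* schedule the read on a housekeeping CPU and wait */
	READ_VIA_IPI,  /* fallback: every CPU in the domain is nohz_full */
};

struct read_plan {
	enum read_method method;
	int cpu; /* CPU chosen to run the read, or -1 if the domain is empty */
};

/* Lowest-numbered CPU set in mask, or -1 if the mask is empty. */
static int first_cpu(cpu_bits_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & ((cpu_bits_t)1 << cpu))
			return cpu;
	return -1;
}

/*
 * Prefer a CPU in the domain that is not nohz_full. Only when there is
 * no such CPU fall back to interrupting one of the domain's CPUs.
 */
static struct read_plan plan_mon_read(cpu_bits_t domain, cpu_bits_t nohz_full)
{
	struct read_plan plan;
	cpu_bits_t housekeeping = domain & ~nohz_full;

	if (housekeeping) {
		plan.method = READ_VIA_WORK;
		plan.cpu = first_cpu(housekeeping);
	} else {
		plan.method = READ_VIA_IPI;
		plan.cpu = first_cpu(domain);
	}
	return plan;
}
```

With a domain of CPUs 0-3 and CPUs 0-1 marked nohz_full, this picks CPU 2 and
schedules the work there. If all four are nohz_full it falls back to an IPI.
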
>>
>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>> the monitor-allocation in monitor.c.
>
> This goal sounds more likely.
>
> If it makes the initial enablement smoother, then I'm all for it.
>
> Reviewed-By: Peter Newman <peternewman@xxxxxxxxxx>
>
> These changes worked fine for me on tip/master, though there were merge
> conflicts to resolve.
>
> Tested-By: Peter Newman <peternewman@xxxxxxxxxx>

Thanks!


James