Re: [PATCH v7 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep

From: James Morse
Date: Thu Dec 14 2023 - 06:38:06 EST


Hi Babu,

On 09/11/2023 20:42, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> MPAM's cache occupancy counters can take a little while to settle once
>> the monitor has been configured. The maximum settling time is described
>> to the driver via a firmware table. The value could be large enough
>> that it makes sense to sleep. To avoid exposing this to resctrl, it
>> should be hidden behind MPAM's resctrl_arch_rmid_read().
>>
>> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
>> to sleep. In this case resctrl_arch_rmid_read() should return an error
>> if it needs to sleep. This will only affect MPAM platforms where
>> the cache occupancy counter isn't available immediately, nohz_full is
>> in use, and there are no housekeeping CPUs in the necessary domain.
>>
>> There are three callers of resctrl_arch_rmid_read():
>> __mon_event_count() and __check_limbo() are both called from a
>> non-migratable context. mon_event_read() invokes __mon_event_count()
>> using smp_call_on_cpu(), which adds work to the target CPU's workqueue.
>> rdtgroup_mutex is held, meaning this cannot race with the resctrl
>> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(),
>> which also adds work to a per-cpu workqueue.
>>
>> The remaining call is add_rmid_to_limbo() which is called in response
>> to a user-space syscall that frees an RMID. This opportunistically
>> reads the LLC occupancy counter on the current domain to see if the
>> RMID is over the dirty threshold. This has to disable preemption to
>> avoid reading the wrong domain's value. Disabling pre-emption here
>> prevents resctrl_arch_rmid_read() from sleeping.

> I don't know what you meant by "This has to disable preemption to
> avoid reading the wrong domain's value."

Pre-emption lets this thread be scheduled out, and potentially scheduled back in on a
different CPU, possibly in a different domain. Any code with the concept of 'this domain'
has to ensure it can't be migrated. Disabling pre-emption is the most common way of
doing that.

Disabling pre-emption also prevents the thread from sleeping, because it can't be
scheduled out.


> Who is disabling the preemption here? Is that specific to ARM?
> Can you please make that clear? Or am I missing something?

add_rmid_to_limbo() is calling get_cpu(), which raises the pre-empt counter.
If it only wanted the CPU number it could have just called smp_processor_id() - but that
wouldn't be safe because the thread can be migrated, meaning the cpu number can change.

All this is to ensure that cpumask_test_cpu() and resctrl_arch_rmid_read() run on the same
CPU.
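
To make the migration argument concrete, here is a minimal userspace sketch. It is a
model only, not kernel code: every model_* name, the struct, and the two-CPUs-per-domain
layout are assumptions made up for illustration, and the real get_cpu(), put_cpu(),
smp_processor_id() and might_sleep() are considerably more involved. It shows that an
unprotected CPU snapshot can go stale across a migration, while a pinned thread stays in
the same domain but is no longer allowed to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical topology for this sketch: CPUs 0-1 in domain 0, CPUs 2-3 in domain 1. */
#define MODEL_CPUS_PER_DOMAIN 2

/* Hypothetical stand-in for the scheduler's view of one thread. */
struct model_thread {
	int cpu;           /* CPU the thread is currently running on */
	int preempt_count; /* non-zero: the thread must not be moved or scheduled out */
};

/* Loosely models get_cpu(): pin the thread, then return the now-stable CPU. */
int model_get_cpu(struct model_thread *t)
{
	t->preempt_count++;
	return t->cpu;
}

/* Loosely models put_cpu(): drop the pin taken by model_get_cpu(). */
void model_put_cpu(struct model_thread *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

/* Loosely models an unprotected smp_processor_id(): a snapshot that can go stale. */
int model_cpu_snapshot(const struct model_thread *t)
{
	return t->cpu;
}

/* Models the scheduler trying to migrate the thread; returns true if it moved. */
bool model_try_migrate(struct model_thread *t, int new_cpu)
{
	if (t->preempt_count > 0)
		return false; /* pinned: migration is refused */
	t->cpu = new_cpu;
	return true;
}

/* Loosely models the rule might_sleep() checks: only a preemptible thread may sleep. */
bool model_may_sleep(const struct model_thread *t)
{
	return t->preempt_count == 0;
}

/* Maps a CPU number to its (hypothetical) domain. */
int model_domain_of(int cpu)
{
	return cpu / MODEL_CPUS_PER_DOMAIN;
}
```

In this model, a snapshot taken on CPU 1 followed by a migration to CPU 2 leaves the
caller holding a CPU number from the wrong domain. Taking model_get_cpu() first makes the
migration attempt fail, so the domain check and the counter read see the same CPU, at the
cost of model_may_sleep() returning false until model_put_cpu().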


Thanks,

James

>> add_rmid_to_limbo() walks each domain, but only reads the counter
>> on one domain. If the system has more than one domain, the RMID will
>> always be added to the limbo list. If the RMID's usage was not over the
>> threshold, it will be removed from the list when __check_limbo() runs.
>> Make this the default behaviour. Free RMIDs are always added to the
>> limbo list for each domain.
>>
>> The user visible effect of this is that a clean RMID is not available
>> for re-allocation immediately after 'rmdir()' completes. This behaviour
>> was never portable as it never happened on a machine with multiple
>> domains.
>>
>> Removing this path allows resctrl_arch_rmid_read() to sleep if it's called
>> with interrupts unmasked. Document that this is the expected behaviour, and
>> add a might_sleep() annotation to catch changes that won't work on arm64.


>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index fa3319021881..409817b0ae2c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>> idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
>>
>> entry->busy = 0;
>> - cpu = get_cpu();
>> list_for_each_entry(d, &r->domains, list) {
>> - if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
>> - err = resctrl_arch_rmid_read(r, d, entry->closid,
>> - entry->rmid,
>> - QOS_L3_OCCUP_EVENT_ID,
>> - &val);
>> - if (err || val <= resctrl_rmid_realloc_threshold)
>> - continue;
>> - }
>> -
>> /*
>> * For the first limbo RMID in the domain,
>> * setup up the limbo worker.
>> @@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>> set_bit(idx, d->rmid_busy_llc);
>> entry->busy++;
>> }
>> - put_cpu();
>>
>> - if (entry->busy) {
>> - rmid_limbo_count++;
>> - if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> - closid_num_dirty_rmid[entry->closid]++;
>> - } else {
>> - list_add_tail(&entry->list, &rmid_free_lru);
>> - }
>> + rmid_limbo_count++;
>> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> + closid_num_dirty_rmid[entry->closid]++;
>> }
>>