Re: [PATCH v4 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry when it is interrupted

From: Peter Newman
Date: Wed Jun 07 2023 - 08:52:23 EST


Hi James,

On Tue, Jun 6, 2023 at 7:03 PM James Morse <james.morse@xxxxxxx> wrote:
> On 06/06/2023 09:49, Peter Newman wrote:
> > It looks like if __rmid_read() is interrupted by an occupancy counter
> > read between writing QM_EVTSEL and reading QM_CTR, it will not perform
> > any update to am->prev_msr, and the interrupted read will return the
> > same counter value as in the interrupting read.
>
> Yup, that's a problem. I was only looking at the mbm state in memory, not the CPU register.
> I think the fix is to read back QM_EVTSEL after reading QM_CTR. I'll do this in
> __rmid_read() to avoid returning -EINTR. It creates two retry loops which is annoying, but
> making the window larger means you're more likely to see false positives.
>
> ----------------------------%<----------------------------
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl
> /monitor.c
> index e24390d2e661..aeba035bb680 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -101,6 +101,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned
> long val)
>
> static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
> {
> + u32 _rmid, _eventid;
> u64 msr_val;
>
> /*
> @@ -110,9 +111,15 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
> * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
> * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
> * are error bits.
> + * QM_EVTSEL is re-read to detect if this function was interrupted by
> + * another call, meaning the QM_CTR value may belong to a different
> + * event.
> */
> - wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
> - rdmsrl(MSR_IA32_QM_CTR, msr_val);
> + do {
> + wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
> + rdmsrl(MSR_IA32_QM_CTR, msr_val);
> + rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
> + } while (eventid != _eventid || rmid != _rmid);
>
> if (msr_val & RMID_VAL_ERROR)
> return -EIO;

I happen to be tracking the cost of resctrl_arch_rmid_read() calls, so
I measured the impact of your fix on my AMD EPYC 7B12:

with both this and the soft RMID series[1] applied:

Base switch 7955 0.23%
Hard RMID Switch 8476 6.80%
Soft RMID Switch 10173 28.17%
CLOSID+Soft RMID 10740 35.32%

then adding EVTSEL read-back patch:

Base switch 7985
Hard RMID Switch 8540 6.96%
Soft RMID Switch 11015 37.95%
CLOSID+Soft RMID 11590 45.16%

The Soft RMID switches contain two __rmid_read() calls, so this
implies each QM_EVTSEL read-back is around 420 cycles on this AMD
implementation.

Even if you don't agree with my plan to add resctrl_arch_rmid_read()
calls to context switches, there should be cheaper ways to handle
this.

-Peter

[1] https://lore.kernel.org/lkml/20230421141723.2405942-4-peternewman@xxxxxxxxxx/