Re: [PATCH v4 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry when it is interrupted

From: James Morse
Date: Mon Jul 17 2023 - 13:07:39 EST


Hi Peter,

On 6/7/23 13:51, Peter Newman wrote:
> On Tue, Jun 6, 2023 at 7:03 PM James Morse <james.morse@xxxxxxx> wrote:
>> On 06/06/2023 09:49, Peter Newman wrote:
>>> It looks like if __rmid_read() is interrupted by an occupancy counter
>>> read between writing QM_EVTSEL and reading QM_CTR, it will not perform
>>> any update to am->prev_msr, and the interrupted read will return the
>>> same counter value as the interrupting read.

>> Yup, that's a problem. I was only looking at the MBM state in memory, not the CPU register.
>> I think the fix is to read back QM_EVTSEL after reading QM_CTR. I'll do this in
>> __rmid_read() to avoid returning -EINTR. It creates two retry loops, which is annoying, but
>> making the window larger means you're more likely to see false positives.

>> ----------------------------%<----------------------------
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e24390d2e661..aeba035bb680 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -101,6 +101,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
>>
>>  static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>  {
>> +        u32 _rmid, _eventid;
>>          u64 msr_val;
>>
>>          /*
>> @@ -110,9 +111,15 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>           * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
>>           * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
>>           * are error bits.
>> +         * QM_EVTSEL is re-read to detect if this function was interrupted by
>> +         * another call, meaning the QM_CTR value may belong to a different
>> +         * event.
>>           */
>> -        wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> -        rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +        do {
>> +                wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> +                rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +                rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
>> +        } while (eventid != _eventid || rmid != _rmid);
>>
>>          if (msr_val & RMID_VAL_ERROR)
>>                  return -EIO;

> I happen to be tracking the cost of resctrl_arch_rmid_read() calls, so
> I measured the impact of your fix on my AMD EPYC 7B12:
>
> with both this and the soft RMID series[1] applied:
>
> The Soft RMID switches contain two __rmid_read() calls, so this
> implies each QM_EVTSEL read-back is around 420 cycles on this AMD
> implementation.

Oooer. I assumed writes might have tedious side-effects, but that reads would be cheap.
I suppose it's because another CPU may have modified this value in the meantime.
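
If it's useful for cross-checking numbers like yours, here is a rough sketch (hypothetical, untested, not part of either patch) of timing the read-back in isolation. It assumes an x86 kernel context, e.g. somewhere in monitor.c, where rdtsc_ordered(), the rdmsr() helper and pr_info() are all in scope:

------------------%<------------------
/*
 * Hypothetical helper: time a single QM_EVTSEL read-back and report
 * the elapsed TSC cycles. noinline so the measurement is not folded
 * into the caller.
 */
static noinline void time_qm_evtsel_readback(void)
{
        u32 eventid, rmid;
        u64 start, end;

        start = rdtsc_ordered();
        rdmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
        end = rdtsc_ordered();

        pr_info("QM_EVTSEL read-back: %llu cycles (rmid=%u, event=%u)\n",
                end - start, rmid, eventid);
}
------------------%<------------------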


> Even if you don't agree with my plan to add resctrl_arch_rmid_read()
> calls to context switches, there should be cheaper ways to handle
> this.

Yup, I've swapped this for a sequence counter[0], which should push that cost into the noise:
an interrupting call increments the per-cpu counter before it reprograms QM_EVTSEL, so the
interrupted reader notices and retries without any extra MSR reads. Anything left will be
the cost of the atomics.


Thanks,

James

[0] barely tested:
------------------%<------------------
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 238831d53479..86d3a1b99be6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -16,6 +16,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -24,6 +25,9 @@
 #include "internal.h"
 
+/* Sequence number for writes to IA32 QM_EVTSEL */
+static DEFINE_PER_CPU(u64, qm_evtsel_seq);
+
 struct rmid_entry {
         /*
          * Some architectures's resctrl_arch_rmid_read() needs the CLOSID value
@@ -178,8 +182,7 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
 
 static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
-        u32 _rmid, _eventid;
-        u64 msr_val;
+        u64 msr_val, seq;
 
         /*
          * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -188,15 +191,16 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
          * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
          * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
          * are error bits.
-         * QM_EVTSEL is re-read to detect if this function was interrupted by
-         * another call, meaning the QM_CTR value may belong to a different
-         * event.
+         * A per-cpu sequence counter is incremented each time QM_EVTSEL is
+         * written. This is used to detect if this function was interrupted by
+         * another call without re-reading the MSRs. Retry the MSR read when
+         * this happens as the QM_CTR value may belong to a different event.
          */
         do {
+                seq = this_cpu_inc_return(qm_evtsel_seq);
                 wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
                 rdmsrl(MSR_IA32_QM_CTR, msr_val);
-                rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
-        } while (eventid != _eventid || rmid != _rmid);
+        } while (seq != this_cpu_read(qm_evtsel_seq));
 
         if (msr_val & RMID_VAL_ERROR)
                 return -EIO;

------------------%<------------------
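
For completeness: the reason the counter is enough is that any interrupting __rmid_read() on the same CPU increments qm_evtsel_seq before it reprograms QM_EVTSEL, so the interrupted caller sees the sequence number change and retries, rather than trusting a QM_CTR value that may belong to the other event.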