Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

From: Reinette Chatre
Date: Wed Oct 20 2021 - 16:29:02 EST


Hi Babu,

On 10/20/2021 12:22 PM, Babu Moger wrote:
On 10/20/21 1:15 PM, Reinette Chatre wrote:
On 10/19/2021 4:20 PM, Babu Moger wrote:
On 10/1/21 11:02 AM, James Morse wrote:
__rmid_read() selects the specified eventid and returns the counter
value from the msr. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be handled by architecture specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.

Signed-off-by: James Morse <james.morse@xxxxxxx>
---
Changes since v1:
  * Return EINVAL from the impossible case in __mon_event_count() instead
    of an x86 hardware specific value.
---
  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
  arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
  arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
  include/linux/resctrl.h                   |  1 +
  4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 25baacd331e0..c8ca7184c6d9 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
*arg)
        mon_event_read(&rr, r, d, rdtgrp, evtid, false);
  -    if (rr.val & RMID_VAL_ERROR)
+    if (rr.err == -EIO)
          seq_puts(m, "Error\n");
-    else if (rr.val & RMID_VAL_UNAVAIL)
+    else if (rr.err == -EINVAL)
          seq_puts(m, "Unavailable\n");
      else
          seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);

This patch breaks the earlier fix
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.15-rc6%26id%3D064855a69003c24bd6b473b367d364e418c57625&amp;data=04%7C01%7Cbabu.moger%40amd.com%7C85219a5827114935cdaa08d993f59fa0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703505420472920%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yP8awDgGGZ%2BWj5ZItdTNJItTVuK828yGnibwq%2BrVaf0%3D&amp;reserved=0


When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. In case if the last rmid read was resulted in error then whole
group will be reported as error. The err field needs to be cleared.

Please add this patch to clear the error.

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..0e4addf237ec 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -444,6 +444,8 @@ void mon_event_count(void *info)
         /* Report error if none of rmid_reads are successful */
         if (ret_val)
                 rr->val = ret_val;
+       else
+               rr->err  = 0;
  }

  /*


Good catch, thank you.

Even so, I do not think mon_event_count()'s usage of __mon_event_count()
was taken into account by this patch and needs a bigger rework than the
above fixup. For example, if I understand correctly ret_val is the error
and rr->val no longer expected to contain the error after this patch. So
keeping that assignment to rr->val is not correct.

Yes. You are right. rr->val is not expected to contain the error.
Hopefully, this should help.

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..105d972cc511 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -441,9 +441,9 @@ void mon_event_count(void *info)
}
}

- /* Report error if none of rmid_reads are successful */
- if (ret_val)
- rr->val = ret_val;
+ /* Clear the error if at least one of the rmid reads succeed */
+ if (ret_val == 0)
+ rr->err = 0;
}

/*


Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails then the data still needs to be reported so the error code needs to be fixed up afterwards and cannot be done inside __mon_event_count(). Thank you very much.

Reinette