Re: [PATCH v2 1/2] x86/resctrl: Fix event counts regression in reused RMIDs

From: Peter Newman
Date: Mon Dec 19 2022 - 05:32:05 EST


Hi Reinette,

On Sat, Dec 17, 2022 at 1:59 AM Reinette Chatre
<reinette.chatre@xxxxxxxxx> wrote:
> On 12/14/2022 8:08 AM, Peter Newman wrote:
> > When creating a new monitoring group, the RMID allocated for it may have
> > been used by a group which was previously removed. In this case, the
> > hardware counters will have non-zero values which should be deducted
> > from what is reported in the new group's counts.
> >
> > resctrl_arch_reset_rmid() initializes the prev_msr value for counters to
> > 0, causing the initial count to be charged to the new group. Resurrect
> > __rmid_read() and use it to initialize prev_msr correctly.
> >
> > Unlike before, __rmid_read() checks for error bits in the MSR read so
> > that callers don't need to.
> >
> > Fixes: 1d81d15db39c ("x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()")
> > Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
>
> This does look like a candidate for stable?

Yes, this bug is serious and reproducible: every reuse of an RMID can
introduce up to one overflow's worth of measurement error.
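To make the size of the error concrete, here is a standalone sketch
(not the kernel code) of the wraparound-aware delta used for MBM
counters, modeled on the shape of mbm_overflow_count(). The helper
name mbm_delta(), the 24-bit width and the MSR values are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper modeled on mbm_overflow_count(): returns the
 * number of counter chunks between prev_msr and cur_msr for a counter
 * that is 'width' bits wide, tolerating a single wraparound.
 */
static uint64_t mbm_delta(uint64_t prev_msr, uint64_t cur_msr,
			  unsigned int width)
{
	unsigned int shift;

	assert(width > 0 && width < 64);
	shift = 64 - width;

	/*
	 * Move both values to the top of the 64-bit word so the unsigned
	 * subtraction wraps modulo 2^width, then shift the result back.
	 */
	return ((cur_msr << shift) - (prev_msr << shift)) >> shift;
}
```

If prev_msr is reset to 0 when an RMID is reused and the previous owner
left the hardware counter at 0xF00000, a first read of 0xF00010 charges
mbm_delta(0, 0xF00010, 24) = 0xF00010 chunks to the new group instead
of mbm_delta(0xF00000, 0xF00010, 24) = 0x10. The worst case is a
leftover count just below 2^width, which is where the "one overflow's
worth" bound comes from.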

Should I elaborate on the impact more in the changelog?

>
> > ---
>
> It is helpful to have a summary here of what changed since previous version.

ok, I'll add this

> Thank you very much for catching and fixing this.
>
> Reviewed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>

Thanks, Reinette!

-Peter