Re: [PATCH] x86/resctrl: Implement new MBA_mbps throttling heuristic

From: Reinette Chatre
Date: Tue Jan 16 2024 - 16:01:08 EST


Hi Xiaochen,

On 1/9/2024 2:00 PM, Tony Luck wrote:
> The MBA_mbps feedback loop increases throttling when a group is using
> more bandwidth than the target set by the user in the schemata file, and
> decreases throttling when below target.
>
> To avoid possibly stepping throttling up and down on every poll a
> flag "delta_comp" is set whenever throttling is changed to indicate
> that the actual change in bandwidth should be recorded on the next
> poll in "delta_bw". Throttling is only reduced if the current bandwidth
> plus delta_bw is below the user target.
>
> This algorithm works well if the workload has steady bandwidth needs.
> But it can go badly wrong if the workload moves to a different phase
> just as the throttling level changed. E.g. if the workload becomes
> essentially idle right as throttling level is increased, the value
> calculated for delta_bw will be more or less the old bandwidth level.
> If the workload then resumes, Linux may never reduce throttling because
> current bandwidth plu delta_bw is above the target set by the user.
>
> Implement a simpler heuristic by assuming that in the worst case the
> currently measured bandwidth is being controlled by the current level of
> throttling. Compute how much it may increase if throttling is relaxed to
> the next higher level. If that is still below the user target, then it
> is ok to reduce the amount of throttling.
>
> Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
> Reported-by: Xiaochen Shen <xiaochen.shen@xxxxxxxxx>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
>
> This patch was previously posted in:
>
> https://lore.kernel.org/lkml/ZVU+L92LRBbJXgXn@agluck-desk3/#t
>
> as part of a proposal to allow the mba_MBps mount option to base its
> feedback loop input on total bandwidth instead of local bandwidth.
> Sending it now as a standalone patch because Xiaochen reported that
> real systems have experienced problems when delta_bw is incorrectly
> calculated.
>

Does this new heuristic fix the problems you observe? If so, would you be
willing to provide a "Tested-by" tag?

Thank you.

Reinette