Re: [PATCH] x86/aperfmperf: Fix arch_scale_freq_tick() on tickless systems

From: ypodemsk
Date: Tue Sep 06 2022 - 08:21:32 EST


Friendly ping?

On Thu, 2022-08-04 at 16:17 +0300, Yair Podemsky wrote:
> In order for the scheduler to be frequency invariant we measure the
> ratio between the maximum cpu frequency and the actual cpu frequency.
> During long tickless periods of time the calculations that keep track
> of that might overflow, in the function scale_freq_tick():
>
> if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
> » goto error;
>
> eventually forcing the kernel to disable the feature with the
> message "Scheduler frequency invariance went wobbly, disabling!".
> Let's avoid that by detecting long tickless periods and bypassing
> the calculation for that tick.
>
> This calculation updates the value of arch_freq_scale, used by the
> capacity-aware scheduler to correct cpu duty cycles:
> task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) /
> max_frequency(cpu))
>
> However Consider a long tickless period, It takes should take 60
> minutes
> for a tickless CPU running at 5GHz to trigger the acnt overflow,
> pick 10 minutes as a staleness threshold to be on the safe side,
> In our testing it took over 30 minutes for the overflow to happen,
> but since it's frequency/platform dependent we choose a smaller value
> to be on the safe side.
>
> Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in
> frequency invariant accounting")
> Signed-off-by: Yair Podemsky <ypodemsk@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/aperfmperf.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c
> b/arch/x86/kernel/cpu/aperfmperf.c
> index 1f60a2b27936..dfe356034a60 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -23,6 +23,13 @@
>
> #include "cpu.h"
>
> +/*
> + * Samples older then 10 minutes should not be proccessed,
> + * This time is long enough to prevent unneeded drops of data
> + * But short enough to prevent overflows
> + */
> +#define MAX_SAMPLE_AGE_NOHZ ((unsigned long)HZ * 600)
> +
> struct aperfmperf {
> seqcount_t seq;
> unsigned long last_update;
> @@ -373,6 +380,7 @@ static inline void scale_freq_tick(u64 acnt, u64
> mcnt) { }
> void arch_scale_freq_tick(void)
> {
> struct aperfmperf *s = this_cpu_ptr(&cpu_samples);
> + unsigned long last = s->last_update;
> u64 acnt, mcnt, aperf, mperf;
>
> if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF))
> @@ -392,7 +400,12 @@ void arch_scale_freq_tick(void)
> s->mcnt = mcnt;
> raw_write_seqcount_end(&s->seq);
>
> - scale_freq_tick(acnt, mcnt);
> + /*
> + * Avoid calling scale_freq_tick() when the last update was too
> long ago,
> + * as it might overflow during calulation.
> + */
> + if ((jiffies - last) <= MAX_SAMPLE_AGE_NOHZ)
> + scale_freq_tick(acnt, mcnt);
> }
>
> /*