Re: [PATCH] cpufreq: CPPC: Resolve the large frequency discrepancy from cpuinfo_cur_freq

From: lihuisong (C)
Date: Thu Jan 04 2024 - 04:09:35 EST



在 2024/1/3 18:59, Rafael J. Wysocki 写道:
On Mon, Dec 18, 2023 at 3:15 AM lihuisong (C) <lihuisong@xxxxxxxxxx> wrote:

在 2023/12/15 10:41, lihuisong (C) 写道:
Hi Rafael,

Thanks for your review.😁

在 2023/12/15 3:31, Rafael J. Wysocki 写道:
On Tue, Dec 12, 2023 at 8:26 AM Huisong Li <lihuisong@xxxxxxxxxx> wrote:
Many developers found that the cpu current frequency is greater than
the maximum frequency of the platform, please see [1], [2] and [3].

In the scenarios with high memory access pressure, the patch [1] has
proved the significant latency of cpc_read() which is used to obtain
delivered and reference performance counter cause an absurd frequency.
The sampling interval for this counters is very critical and is
expected
to be equal. However, the different latency of cpc_read() has a direct
impact on their sampling interval.

This patch adds a interface, cpc_read_arch_counters_on_cpu, to read
delivered and reference performance counter together. According to my
test[4], the discrepancy of cpu current frequency in the scenarios with
high memory access pressure is lower than 0.2% by stress-ng
application.

[1]
https://lore.kernel.org/all/20231025093847.3740104-4-zengheng4@xxxxxxxxxx/
[2]
https://lore.kernel.org/all/20230328193846.8757-1-yang@xxxxxxxxxxxxxxxxxxxxxx/
[3]
https://lore.kernel.org/all/20230418113459.12860-7-sumitg@xxxxxxxxxx/

[4] My local test:
The testing platform enable SMT and include 128 logical CPU in total,
and CPU base frequency is 2.7GHz. Reading "cpuinfo_cur_freq" for each
physical core on platform during the high memory access pressure from
stress-ng, and the output is as follows:
0: 2699133 2: 2699942 4: 2698189 6: 2704347
8: 2704009 10: 2696277 12: 2702016 14: 2701388
16: 2700358 18: 2696741 20: 2700091 22: 2700122
24: 2701713 26: 2702025 28: 2699816 30: 2700121
32: 2700000 34: 2699788 36: 2698884 38: 2699109
40: 2704494 42: 2698350 44: 2699997 46: 2701023
48: 2703448 50: 2699501 52: 2700000 54: 2699999
56: 2702645 58: 2696923 60: 2697718 62: 2700547
64: 2700313 66: 2700000 68: 2699904 70: 2699259
72: 2699511 74: 2700644 76: 2702201 78: 2700000
80: 2700776 82: 2700364 84: 2702674 86: 2700255
88: 2699886 90: 2700359 92: 2699662 94: 2696188
96: 2705454 98: 2699260 100: 2701097 102: 2699630
104: 2700463 106: 2698408 108: 2697766 110: 2701181
112: 2699166 114: 2701804 116: 2701907 118: 2701973
120: 2699584 122: 2700474 124: 2700768 126: 2701963

Signed-off-by: Huisong Li <lihuisong@xxxxxxxxxx>
First off, please Cc ACPI-related patches to linux-acpi.
got it.

+linux-acpi@xxxxxxxxxxxxxxx

---
arch/arm64/kernel/topology.c | 43
++++++++++++++++++++++++++++++++++--
drivers/acpi/cppc_acpi.c | 22 +++++++++++++++---
include/acpi/cppc_acpi.h | 5 +++++
3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/topology.c
b/arch/arm64/kernel/topology.c
index 7d37e458e2f5..c3122154d738 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -299,6 +299,11 @@ core_initcall(init_amu_fie);
#ifdef CONFIG_ACPI_CPPC_LIB
#include <acpi/cppc_acpi.h>

+struct amu_counters {
+ u64 corecnt;
+ u64 constcnt;
+};
+
static void cpu_read_corecnt(void *val)
{
/*
@@ -322,8 +327,27 @@ static void cpu_read_constcnt(void *val)
0UL : read_constcnt();
}

+static void cpu_read_amu_counters(void *data)
+{
+ struct amu_counters *cnt = (struct amu_counters *)data;
+
+ /*
+ * The running time of the this_cpu_has_cap() might have a
couple of
+ * microseconds and is significantly increased to tens of
microseconds.
+ * But AMU core and constant counter need to be read togeter
without any
+ * time interval to reduce the calculation discrepancy using
this counters.
+ */
+ if (this_cpu_has_cap(ARM64_WORKAROUND_2457168)) {
+ cnt->corecnt = read_corecnt();
This statement is present in both branches, so can it be moved before
the if ()?
Yes.
Do you mean adding a blank line before if()?
Sorry, I misunderstood you.
The statement "cnt->corecnt = read_corecnt();" cannot be moved before
the if().
The AMU core and constant counter need to be read togeter without any
time interval as described in code comments.
The this_cpu_has_cap() is time-consuming.
That is why I don't use the cpu_read_constcnt() to read constant counter.
So define something like

static inline void amu_read_counters(struct amu_counters *cnt, bool
read_constcnt)
{
cnt->corecnt = read_corecnt();
if (read_constcnt)
cnt->constcnt = read_constcnt();
else
cnt->constcnt = 0;
}

+ cnt->constcnt = 0;
+ } else {
+ cnt->corecnt = read_corecnt();
+ cnt->constcnt = read_constcnt();
+ }
and use it like this:

amu_read_counters(cnt, !this_cpu_has_cap(ARM64_WORKAROUND_2457168);

It should work as expected AFAICS.
Yes it would work well.
The new cpc_read_arch_counters_on_cpu() uses the counters_read_on_cpu() to read core counters on specified cpu.
smp_call_function_single() called by counters_read_on_cpu() just support to pass one argument for "smp_call_func_t func".
So we cannot define the function as you suggested. what do you think?

+}
+
static inline
-int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
+int counters_read_on_cpu(int cpu, smp_call_func_t func, void *data)
{
/*
* Abort call on counterless CPU or when interrupts are
@@ -335,7 +359,7 @@ int counters_read_on_cpu(int cpu,
smp_call_func_t func, u64 *val)
if (WARN_ON_ONCE(irqs_disabled()))
return -EPERM;

- smp_call_function_single(cpu, func, val, 1);
+ smp_call_function_single(cpu, func, data, 1);

return 0;
}
@@ -364,6 +388,21 @@ bool cpc_ffh_supported(void)
return true;
}

+int cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered, u64
*reference)
+{
+ struct amu_counters cnts = {0};
+ int ret;
+
+ ret = counters_read_on_cpu(cpu, cpu_read_amu_counters, &cnts);
+ if (ret)
+ return ret;
+
+ *delivered = cnts.corecnt;
+ *reference = cnts.constcnt;
+
+ return 0;
+}
+
int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
{
int ret = -EOPNOTSUPP;
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 7ff269a78c20..f303fabd7cfe 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1299,6 +1299,11 @@ bool cppc_perf_ctrs_in_pcc(void)
}
EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc);

+int __weak cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered,
u64 *reference)
+{
+ return 0;
+}
+
/**
* cppc_get_perf_ctrs - Read a CPU's performance feedback counters.
* @cpunum: CPU from which to read counters.
@@ -1313,7 +1318,8 @@ int cppc_get_perf_ctrs(int cpunum, struct
cppc_perf_fb_ctrs *perf_fb_ctrs)
*ref_perf_reg, *ctr_wrap_reg;
int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum);
struct cppc_pcc_data *pcc_ss_data = NULL;
- u64 delivered, reference, ref_perf, ctr_wrap_time;
+ u64 delivered = 0, reference = 0;
+ u64 ref_perf, ctr_wrap_time;
int ret = 0, regs_in_pcc = 0;

if (!cpc_desc) {
@@ -1350,8 +1356,18 @@ int cppc_get_perf_ctrs(int cpunum, struct
cppc_perf_fb_ctrs *perf_fb_ctrs)
}
}

- cpc_read(cpunum, delivered_reg, &delivered);
- cpc_read(cpunum, reference_reg, &reference);
+ if (cpc_ffh_supported()) {
+ ret = cpc_read_arch_counters_on_cpu(cpunum,
&delivered, &reference);
+ if (ret) {
+ pr_debug("read arch counters failed,
ret=%d.\n", ret);
+ ret = 0;
+ }
+ }
The above is surely not applicable to every platform using CPPC. Also
cpc_ffh_supported is aimed to control only the platform supported FFH
to enter.
cpc_read_arch_counters_on_cpu is also needed to implemented by each
platform according to their require.
Well, exactly.

Here just implement this interface for arm64.
So on x86 cpc_ffh_supported() returns true and
cpc_read_arch_counters_on_cpu() will do nothing, so it will always
fall back to using cpc_read(). That is not particularly
straightforward IMV.
Understand you.
But we have to do like this to be compatible with all platforms.
I am thinking twice about cpc_ffh_supported().
And I found that calling cpc_ffh_supported() here is redundant for ARM.
Because counters_read_on_cpu() has verified if this CPU support AMU counter.
What do you say we remove cpc_ffh_supported() here and directly call this new architecture interface?

it looks like in the ARM64_WORKAROUND_2457168 enabled case it is just
pointless overhead, because "reference" is always going to be 0 here
then.
Right, it is always going to be 0 here for the
ARM64_WORKAROUND_2457168 enabled case .
But ARM64_WORKAROUND_2457168 is a macro releated to ARM.
It seems that it is not appropriate for this macro to appear this
common place for all platform, right?

Please clean that up.

+ if (!delivered || !reference) {
+ cpc_read(cpunum, delivered_reg, &delivered);
+ cpc_read(cpunum, reference_reg, &reference);
+ }
+
cpc_read(cpunum, ref_perf_reg, &ref_perf);

/*
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6126c977ece0..07d4fd82d499 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -152,6 +152,7 @@ extern bool cpc_ffh_supported(void);
extern bool cpc_supported_by_cpu(void);
extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
+extern int cpc_read_arch_counters_on_cpu(int cpu, u64 *delivered,
u64 *reference);
extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls
*perf_ctrls, bool enable);
extern int cppc_get_auto_sel_caps(int cpunum, struct
cppc_perf_caps *perf_caps);
@@ -209,6 +210,10 @@ static inline int cpc_write_ffh(int cpunum,
struct cpc_reg *reg, u64 val)
{
return -ENOTSUPP;
}
+static inline int cpc_read_arch_counters_on_cpu(int cpu, u64
*delivered, u64 *reference)
+{
+ return -EOPNOTSUPP;
+}
static inline int cppc_set_epp_perf(int cpu, struct
cppc_perf_ctrls *perf_ctrls, bool enable)
{
return -ENOTSUPP;
--
.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
.