Re: [PATCH] x86/perf/zhaoxin: Add stepping check for ZX-C

From: silviazhaooc
Date: Thu Feb 02 2023 - 22:12:57 EST


Hi Kevin,

Thanks for your kindly reply.

Since VIA Nano 1000/2000/3000 series are really very old CPU, and we can't find related mainboard for full verification. We suggest not to support all Nano series for PMC driver.

On 2023/2/3 07:53, Kevin Brace wrote:
Hi Silvia,

Thank you very much for resending the VIA / Zhaoxin PMU patch and for keeping me in the loop.
I observed this bug on ECS (Elitegroup Computer Systems) VX900-I mainboard.
The mainboard contains one VIA Nano L2007 processor (1.6 GHz) soldered on the PCB.
Although I have not independently verified it, CPUID steppings for VIA Nano 1000 / 2000 series (Centaur Technology code name: CNA) are supposedly 2 and 3.
CPUID stepppings for VIA Nano 3000 series (Centaur Technology code name: CNB) is 8.
CPUID stepppings for VIA Nano x2 (Centaur Technology code name: CNC) is 10.

https://www.reddit.com/r/VIA/comments/dy71bn/via_centaurs_new_cpu_is_a_8core_x86_cpu_with_an/

I have not checked the actual CPUID steppings, but I have confirmed that the current code without the fix works okay with Nano 3000 series and Nano x2, but definitely not Nano 2000 series.
For Nano 3000 series test, I used VIA Embedded EPIA M830 mainboard.
For Nano x2 test, I used HP T510 thin client.
Based on my observations, it appears that Centaur CNA contains a bug reading some performance counters, so not to cause inconveniences with users of Nano 1000 / 2000 series processors, the patch should limit / prevent reading performance counters on these processors.
I think the code for the fix should reflect this the following way.

_______________________________________________________________
switch (boot_cpu_data.x86) {
case 0x06:
- if (boot_cpu_data.x86_model == 0x0f || boot_cpu_data.x86_model == 0x19) {
+ if ((boot_cpu_data.x86_model == 0x0f && boot_cpu_data.x86_stepping >= 0x08) ||
+ boot_cpu_data.x86_model == 0x19) {

x86_pmu.max_period = x86_pmu.cntval_mask >> 1;

_______________________________________________________________

The above code should exclude Nano 1000 / 2000 series processors properly.
I lost easy access to ECS VX900-I mainboard for now (I need to look around for it. I do own another copy of it.), so I cannot confirm if the fix is properly working.
I still have easy access to EPIA M830 mainboard.
I welcome anyone's feedback on this issue.
This fix should go into the current kernel in development since it is a show stopper for users of Nano 1000 / 2000 series processors.
If the fix is adopted, please backport it to previous releases of the kernel.
I wasted about 2 weeks on this issue, and this fix should have never been ignored for such a long time.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


Sent: Thursday, February 02, 2023 at 3:17 AM
From: "silviazhao-oc" <silviazhao-oc@xxxxxxxxxxx>
To: peterz@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, acme@xxxxxxxxxx, mark.rutland@xxxxxxx, alexander.shishkin@xxxxxxxxxxxxxxx, jolsa@xxxxxxxxxx, namhyung@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, bp@xxxxxxxxx, dave.hansen@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, hpa@xxxxxxxxx, linux-perf-users@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
Cc: cobechen@xxxxxxxxxxx, louisqi@xxxxxxxxxxx, silviazhao@xxxxxxxxxxx, tonywwang@xxxxxxxxxxx, kevinbrace@xxxxxxx, 8vvbbqzo567a@xxxxxxxxxxxxxxxxx
Subject: [PATCH] x86/perf/zhaoxin: Add stepping check for ZX-C

Nano processor may not fully support rdpmc instruction, it works well
for reading general pmc counter, but will lead GP(general protection)
when accessing fixed pmc counter. Furthermore, family/mode information
is same between Nano processor and ZX-C processor, it leads to zhaoxin
pmu driver is wrongly loaded for Nano processor, which resulting boot
kernal fail.

To solve this problem, stepping information will be checked to distinguish
between Nano processor and ZX-C processor.

Fixes: 3a4ac121c2ca (“x86/perf: Add hardware performance events support for Zhaoxin CPU”)
Reported-by: Arjan <8vvbbqzo567a@xxxxxxxxxxxxxxxxx>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
Reported-by: Kevin Brace <kevinbrace@xxxxxxx>

Signed-off-by: silviazhao-oc <silviazhao-oc@xxxxxxxxxxx>
---
arch/x86/events/zhaoxin/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/zhaoxin/core.c b/arch/x86/events/zhaoxin/core.c
index 949d845c922b..cef1de251613 100644
--- a/arch/x86/events/zhaoxin/core.c
+++ b/arch/x86/events/zhaoxin/core.c
@@ -541,7 +541,8 @@ __init int zhaoxin_pmu_init(void)
switch (boot_cpu_data.x86) {
case 0x06:
- if (boot_cpu_data.x86_model == 0x0f || boot_cpu_data.x86_model == 0x19) {
+ if ((boot_cpu_data.x86_model == 0x0f && boot_cpu_data.x86_stepping >= 0x0e) ||
+ boot_cpu_data.x86_model == 0x19) {
x86_pmu.max_period = x86_pmu.cntval_mask >> 1;
--
2.17.1