Re: [PATCH] x86/perf/zhaoxin: Add stepping check for ZX-C

From: Kevin Brace
Date: Thu Feb 02 2023 - 18:55:26 EST


Hi Silvia,

Thank you very much for resending the VIA / Zhaoxin PMU patch and for keeping me in the loop.
I observed this bug on ECS (Elitegroup Computer Systems) VX900-I mainboard.
The mainboard contains one VIA Nano L2007 processor (1.6 GHz) soldered on the PCB.
Although I have not independently verified it, CPUID steppings for VIA Nano 1000 / 2000 series (Centaur Technology code name: CNA) are supposedly 2 and 3.
CPUID stepppings for VIA Nano 3000 series (Centaur Technology code name: CNB) is 8.
CPUID stepppings for VIA Nano x2 (Centaur Technology code name: CNC) is 10.

https://www.reddit.com/r/VIA/comments/dy71bn/via_centaurs_new_cpu_is_a_8core_x86_cpu_with_an/

I have not checked the actual CPUID steppings, but I have confirmed that the current code without the fix works okay with Nano 3000 series and Nano x2, but definitely not Nano 2000 series.
For Nano 3000 series test, I used VIA Embedded EPIA M830 mainboard.
For Nano x2 test, I used HP T510 thin client.
Based on my observations, it appears that Centaur CNA contains a bug reading some performance counters, so not to cause inconveniences with users of Nano 1000 / 2000 series processors, the patch should limit / prevent reading performance counters on these processors.
I think the code for the fix should reflect this the following way.

_______________________________________________________________
switch (boot_cpu_data.x86) {
case 0x06:
- if (boot_cpu_data.x86_model == 0x0f || boot_cpu_data.x86_model == 0x19) {
+ if ((boot_cpu_data.x86_model == 0x0f && boot_cpu_data.x86_stepping >= 0x08) ||
+ boot_cpu_data.x86_model == 0x19) {

x86_pmu.max_period = x86_pmu.cntval_mask >> 1;

_______________________________________________________________

The above code should exclude Nano 1000 / 2000 series processors properly.
I lost easy access to ECS VX900-I mainboard for now (I need to look around for it. I do own another copy of it.), so I cannot confirm if the fix is properly working.
I still have easy access to EPIA M830 mainboard.
I welcome anyone's feedback on this issue.
This fix should go into the current kernel in development since it is a show stopper for users of Nano 1000 / 2000 series processors.
If the fix is adopted, please backport it to previous releases of the kernel.
I wasted about 2 weeks on this issue, and this fix should have never been ignored for such a long time.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


> Sent: Thursday, February 02, 2023 at 3:17 AM
> From: "silviazhao-oc" <silviazhao-oc@xxxxxxxxxxx>
> To: peterz@xxxxxxxxxxxxx, mingo@xxxxxxxxxx, acme@xxxxxxxxxx, mark.rutland@xxxxxxx, alexander.shishkin@xxxxxxxxxxxxxxx, jolsa@xxxxxxxxxx, namhyung@xxxxxxxxxx, tglx@xxxxxxxxxxxxx, bp@xxxxxxxxx, dave.hansen@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, hpa@xxxxxxxxx, linux-perf-users@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
> Cc: cobechen@xxxxxxxxxxx, louisqi@xxxxxxxxxxx, silviazhao@xxxxxxxxxxx, tonywwang@xxxxxxxxxxx, kevinbrace@xxxxxxx, 8vvbbqzo567a@xxxxxxxxxxxxxxxxx
> Subject: [PATCH] x86/perf/zhaoxin: Add stepping check for ZX-C
>
> Nano processor may not fully support rdpmc instruction, it works well
> for reading general pmc counter, but will lead GP(general protection)
> when accessing fixed pmc counter. Furthermore, family/mode information
> is same between Nano processor and ZX-C processor, it leads to zhaoxin
> pmu driver is wrongly loaded for Nano processor, which resulting boot
> kernal fail.
>
> To solve this problem, stepping information will be checked to distinguish
> between Nano processor and ZX-C processor.
>
> Fixes: 3a4ac121c2ca (“x86/perf: Add hardware performance events support for Zhaoxin CPU”)
> Reported-by: Arjan <8vvbbqzo567a@xxxxxxxxxxxxxxxxx>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
> Reported-by: Kevin Brace <kevinbrace@xxxxxxx>
>
> Signed-off-by: silviazhao-oc <silviazhao-oc@xxxxxxxxxxx>
> ---
> arch/x86/events/zhaoxin/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/zhaoxin/core.c b/arch/x86/events/zhaoxin/core.c
> index 949d845c922b..cef1de251613 100644
> --- a/arch/x86/events/zhaoxin/core.c
> +++ b/arch/x86/events/zhaoxin/core.c
> @@ -541,7 +541,8 @@ __init int zhaoxin_pmu_init(void)
>
> switch (boot_cpu_data.x86) {
> case 0x06:
> - if (boot_cpu_data.x86_model == 0x0f || boot_cpu_data.x86_model == 0x19) {
> + if ((boot_cpu_data.x86_model == 0x0f && boot_cpu_data.x86_stepping >= 0x0e) ||
> + boot_cpu_data.x86_model == 0x19) {
>
> x86_pmu.max_period = x86_pmu.cntval_mask >> 1;
>
> --
> 2.17.1
>
>