Re: [PATCH v2] x86,acpi: Limit "Dummy wait" workaround to older AMD and Intel processors

From: K Prateek Nayak
Date: Mon Sep 26 2022 - 13:19:57 EST


Hello Peter,

On 9/26/2022 5:37 PM, Peter Zijlstra wrote:
> On Fri, Sep 23, 2022 at 09:08:01PM +0530, K Prateek Nayak wrote:
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index ef4775c6db01..fcd3617ed315 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -460,5 +460,6 @@
>> #define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
>> #define X86_BUG_RETBLEED X86_BUG(27) /* CPU is affected by RETBleed */
>> #define X86_BUG_EIBRS_PBRSB X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
>> +#define X86_BUG_STPCLK X86_BUG(29) /* STPCLK# signal does not get asserted in time during IOPORT based C-state entry */
>>
>> #endif /* _ASM_X86_CPUFEATURES_H */
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index 48276c0e479d..8cb5887a53a3 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -988,6 +988,18 @@ static void init_amd(struct cpuinfo_x86 *c)
>> if (!cpu_has(c, X86_FEATURE_XENPV))
>> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>>
>> + /*
>> + * CPUs based on the Zen microarchitecture (Fam 17h onward) can
>> + * guarantee that STPCLK# signal is asserted in time after the
>> + * P_LVL2 read to freeze execution after an IOPORT based C-state
>> + * entry. Among the older AMD processors, there has been at least
>> + * one report of an AMD Athlon processor on a VIA chipset
>> + * (circa 2006) having this issue. Mark all these older AMD
>> + * processor families as being affected.
>> + */
>> + if (c->x86 < 0x17)
>> + set_cpu_bug(c, X86_BUG_STPCLK);
>> +
>> /*
>> * Turn on the Instructions Retired free counter on machines not
>> * susceptible to erratum #1054 "Instructions Retired Performance
>> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>> index 2d7ea5480ec3..96fe1320c238 100644
>> --- a/arch/x86/kernel/cpu/intel.c
>> +++ b/arch/x86/kernel/cpu/intel.c
>> @@ -696,6 +696,18 @@ static void init_intel(struct cpuinfo_x86 *c)
>> ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
>> set_cpu_bug(c, X86_BUG_MONITOR);
>>
>> + /*
>> + * Intel chipsets prior to Nehalem used the ACPI processor_idle
>> + * driver for C-state management. Some of these processors that
>> + * used IOPORT based C-states could not guarantee that STPCLK#
>> + * signal gets asserted in time after P_LVL2 read to freeze
>> + * execution properly. Since a clear cut-off point is not known
>> + * as to when this bug was solved, mark all the chipsets as
>> + * being affected. Only the ones that use IOPORT based C-state
>> + * transitions via the acpi_idle driver will be impacted.
>> + */
>> + set_cpu_bug(c, X86_BUG_STPCLK);
>> +
>> #ifdef CONFIG_X86_64
>> if (c->x86 == 15)
>> c->x86_cache_alignment = c->x86_clflush_size * 2;
>
> Quiz time:
>
> #define X86_VENDOR_INTEL 0
> #define X86_VENDOR_CYRIX 1
> #define X86_VENDOR_AMD 2
> #define X86_VENDOR_UMC 3
> #define X86_VENDOR_CENTAUR 5
> #define X86_VENDOR_TRANSMETA 7
> #define X86_VENDOR_NSC 8
> #define X86_VENDOR_HYGON 9
> #define X86_VENDOR_ZHAOXIN 10
> #define X86_VENDOR_VORTEX 11
> #define X86_VENDOR_NUM 12
> #define X86_VENDOR_UNKNOWN 0xff
>
> For how many of the above have you changed behaviour?

The proposed logic does alter the behavior for x86 chipsets that depend
on the acpi_idle driver and use IOPORT based C-states. Based on what
Rafael and Dave suggested, I have marked all Intel processors as
affected by this bug. In light of Andreas' report, I've also marked
all the pre-family 17h AMD processors as affected to avoid causing
any regression.
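
To make the resulting per-vendor behavior explicit, here is a minimal
user-space sketch of the marking logic in the quoted diff. The helper name
v2_sets_bug_stpclk() is hypothetical and does not exist in the patch; the
vendor IDs are copied from the list quoted above. It assumes that only
init_intel() and init_amd() set the flag, as the diff shows.

```c
#include <stdbool.h>

/* Vendor IDs as quoted in the list above. */
enum {
	X86_VENDOR_INTEL     = 0,
	X86_VENDOR_CYRIX     = 1,
	X86_VENDOR_AMD       = 2,
	X86_VENDOR_UMC       = 3,
	X86_VENDOR_CENTAUR   = 5,
	X86_VENDOR_TRANSMETA = 7,
	X86_VENDOR_NSC       = 8,
	X86_VENDOR_HYGON     = 9,
	X86_VENDOR_ZHAOXIN   = 10,
	X86_VENDOR_VORTEX    = 11,
};

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * quoted v2 diff would set X86_BUG_STPCLK for this vendor and family.
 */
static bool v2_sets_bug_stpclk(int vendor, unsigned int family)
{
	switch (vendor) {
	case X86_VENDOR_INTEL:
		return true;          /* init_intel(): set unconditionally */
	case X86_VENDOR_AMD:
		return family < 0x17; /* init_amd(): pre-Zen families only */
	default:
		return false;         /* no other vendor init path is touched */
	}
}
```

With this model, every vendor other than Intel and pre-family 17h AMD ends
up without the flag, which is the set of processors the question above is
asking about.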

It is hard to tell if any other vendor had this bug in their chipsets.
Dave's patch does not take other vendors into account either and limits
the dummy operation to only Intel chipsets using the acpi_idle driver.
(https://lore.kernel.org/all/78d13a19-2806-c8af-573e-7f2625edfab8@xxxxxxxxx/)
If folks report a regression, I will be happy to fix it for them.

>
> Not to mention that this is the gazillion-th time AMD has failed to
> change HYGON in lock-step. That's Zen too -- deal with it.

Hygon is based on the Zen microarchitecture (equivalent to Fam 17h on
AMD) and they too do not need the dummy wait op to ensure correct
behavior. Hence, they are not marked with X86_BUG_STPCLK.

In the patch description, I've called out:

"mark all the Intel processors and pre-family 17h
AMD processors with X86_BUG_STPCLK. In the acpi_idle driver, restrict the
dummy wait during IOPORT based C-state transitions to only these
processors."

Both Hygon and AMD Fam 17h+, which are based on Zen microarchitecture, are
not affected by X86_BUG_STPCLK and hence skip the dummy wait op.
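
For reference, a rough user-space model of the gated dummy wait. It is only
a sketch: struct io_trace and model_ioport_cstate_entry() are hypothetical
names, and the counters stand in for the port reads the acpi_idle driver
performs (as I understand it, an inb() on the C-state address followed by an
inl() on the PM timer port).

```c
#include <stdbool.h>

/* Counters standing in for real port I/O so the sketch runs in user space. */
struct io_trace {
	unsigned int lvl_reads;   /* reads of the C-state (P_LVLx) I/O port */
	unsigned int dummy_reads; /* extra "dummy wait" reads */
};

/*
 * Hypothetical model of an IOPORT based C-state entry. The real driver
 * issues port reads; here each read is replaced by a counter increment.
 */
static void model_ioport_cstate_entry(struct io_trace *t, bool has_bug_stpclk)
{
	/* P_LVLx read: asks the chipset to assert STPCLK#. */
	t->lvl_reads++;

	/* Dummy wait only on processors marked as affected. */
	if (has_bug_stpclk)
		t->dummy_reads++;
}
```

A processor without the bug flag performs only the P_LVLx read, while a
flagged one performs the additional dummy read on every entry.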

Did I miss something?
--
Thanks and Regards,
Prateek