Re: [PATCH] arm64: Add the arm64.nolse_atomics command line option

From: Aiqun(Maria) Yu
Date: Tue Jul 11 2023 - 06:13:13 EST


On 7/11/2023 2:57 PM, Marc Zyngier wrote:
On Tue, 11 Jul 2023 04:30:44 +0100,
"Aiqun(Maria) Yu" <quic_aiquny@xxxxxxxxxxx> wrote:

On 7/10/2023 5:31 PM, Marc Zyngier wrote:
On Mon, 10 Jul 2023 09:19:54 +0100,
"Aiqun(Maria) Yu" <quic_aiquny@xxxxxxxxxxx> wrote:

On 7/10/2023 3:27 PM, Marc Zyngier wrote:
On Mon, 10 Jul 2023 06:59:55 +0100,
Maria Yu <quic_aiquny@xxxxxxxxxxx> wrote:

In order to be able to disable lse_atomic even if cpu
support it, most likely because of memory controller
cannot deal with the lse atomic instructions, use a
new idreg override to deal with it.

In general, the idreg overrides are *not* there to paper over HW bugs.
They are there to force the kernel to use or disable a feature for
performance reason or to guide the *enabling* of a feature, but not
because the HW is broken.

The broken status of a HW platform must also be documented so that we
know what to expect when we look at, for example, a bad case of memory
corruption (something I'd expect to see on a system that only
partially implements atomic memory operations).


good idea. A noc error would be happened if the lse atomic instruction
happened during a memory controller doesn't support lse atomic
instructions.
I can put the information in next patchset comment message. Pls feel
free to let know if there is other place to have this kind of
information with.

For a start, Documentation/arch/arm64/silicon-errata.rst should
contain an entry for the actual erratum, and a description of the
symptoms of the issue (you're mentioning a "noc error": how is that
reported to the CPU?).

This is not a cpu's errata as my understanding. It is the DDR
subsystem which don't have the LSE atomic feature supported.

CPU or not doesn't matter. We also track system errata.

Thank you for clarify on this.
If I am correct understanding, are you suggesting system errata with DT seperate compatible (or similar) to runtime disable this feature instead of idreg override with arm64.nolse options?

While it is better to finally affect the host arm64_ftr_regs value since it can also derived to kvm sys reg as well.


The workaround should also be detected at runtime -- we cannot rely on
the user to provide a command-line argument to disable an essential
feature that anyone has taken for granted for most of a decade...

We are also seeking help from DDR Subsystem POC to see whether it is
possible to detect the LSE atomic feature support or not at runtime.

Keying it off a DT compatible (or something similar) would work.

In my opinion, LSE atomic is a system level feature instead of a cpu
only feature. So currently solution we is that even if cpu support lse
atomic, but it still needed to be disabled if the cpu working with a
lse atomic not support by current system's DDR subsystem.

In the absence of a detection mechanism for anything past the CPU,
this is a moot point. At this stage, this is a bit like saying
"writing to memory is a system thing, not only a CPU feature".

And this also breaks KVM if these CPUs don't have FWB, as a guest can
always map a piece of memory as non-cacheable, and trigger the issue
you describe in your reply to Will, even if you hide the atomics on
the host.


For the KVM part, per my understanding, as long as the current feature id being overriden, the KVM system also get the current vcpu without the lse atomic feature enabled.
KVM vcpu will read the sys reg from host arm64_ftr_regs which is already been controled by the idreg_overrides.

check reference from:
https://elixir.bootlin.com/linux/v6.5-rc1/source/arch/arm64/kernel/cpufeature.c#L680

https://elixir.bootlin.com/linux/v6.5-rc1/source/arch/arm64/kvm/sys_regs.c#L1360

M.


--
Thx and BRs,
Aiqun(Maria) Yu