Re: [PATCH] arm64: Add the arm64.nolse_atomics command line option

From: Aiqun(Maria) Yu
Date: Wed Jul 12 2023 - 22:24:53 EST


On 7/12/2023 3:36 PM, Mark Rutland wrote:
On Wed, Jul 12, 2023 at 11:09:10AM +0800, Aiqun(Maria) Yu wrote:
On 7/11/2023 6:25 PM, Will Deacon wrote:
On Tue, Jul 11, 2023 at 06:15:49PM +0800, Aiqun(Maria) Yu wrote:
On 7/11/2023 4:22 PM, Will Deacon wrote:
On Tue, Jul 11, 2023 at 12:02:22PM +0800, Aiqun(Maria) Yu wrote:
On 7/10/2023 5:37 PM, Will Deacon wrote:
On Mon, Jul 10, 2023 at 01:59:55PM +0800, Maria Yu wrote:
Add a new idreg override so that LSE atomics can be disabled even when
the CPU supports them, most likely because the memory controller cannot
handle LSE atomic instructions.

This should not be a problem for cacheable memory though, right?

Given that Linux does not issue atomic operations to non-cacheable mappings,
I'm struggling to see why there's a problem here.

LSE atomic operations can be issued on non-cacheable mappings as well.
Even for cacheable data, depending on the CPUECTLR_EL1 setting, the CPU can
also perform LSE atomics as far atomic operations.

Please can you point me to the place in the kernel sources where this
happens? The architecture doesn't guarantee that atomics to non-cacheable
mappings will work, see "B2.2.6 Possible implementation restrictions on
using atomic instructions". Linux, therefore, doesn't issue atomics
to non-cacheable memory.

We encountered the issue with third-party kernel modules and third-party
apps rather than with the Linux kernel itself.

Great, so there's nothing to do in the kernel then!

The third party code needs to be modified not to use atomic instructions
with non-cacheable mappings. No need to involve us with that.

This is a tradeoff between performance and stability. As I understand it,
the option lets a system that cares most about performance keep lse_atomic
enabled, and a system that cares most about stability disable it.

Where do livelock and starvation fit in with "stability"? Disabling LSE
atomics for things like qspinlock and the scheduler just because of some
badly written third-party code isn't much of a tradeoff.

We also have a requirement to run a single generic kernel image both on
CPUs/systems that fully support LSE atomics and on those that do not.

Who *specifically* has this requirement (i.e. what does 'we' mean here)? The

I can use another word instead of "we" to describe the requirement.

There are requirements such as Google's Android GKI (Generic Kernel Image), which requires systems with different CPU architectures to use the same generic kernel Image.

upstream kernel does not require that atomics work on non-cacheable memory, and

The same issue can also bring the system down with cacheable memory, when LSE atomics are not supported for it and a far atomic is required.

saying "The company I work for wants this" doesn't change that.

AFAICT the system here is architecturally compliant, and what you're relying
upon is something that neither the architecture nor Linux guarantees.

It is not only our company's problem, either:
To support the atomic instructions added in the Armv8.1 architecture, CHI-B provides Atomic Transactions, but support for Atomic Transactions is itself *optional* in CHI-B.

So, far atomics are not necessarily supported on an Armv8.1 CPU + CHI-B system either.

from: https://developer.arm.com/documentation/102407/0100/Atomic-operations?lang=en
So CPU support for atomics alone cannot guarantee that the system supports LSE atomics.
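The gap between CPU support and system support is visible in what software can actually query. On arm64 Linux, user space learns about LSE through the HWCAP_ATOMICS bit in AT_HWCAP, which the kernel derives from the CPU's ID_AA64ISAR0_EL1.Atomic field; nothing in it describes whether the interconnect or memory controller handles far atomics. A minimal sketch, assuming glibc or musl `<sys/auxv.h>` on arm64 Linux; `lse_from_hwcap()` and `cpu_advertises_lse()` are hypothetical names, and the fallback definition mirrors the arm64 uapi value of HWCAP_ATOMICS (bit 8) so the decode logic also compiles elsewhere.

```c
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP; defines HWCAP_ATOMICS on arm64 */
#endif

#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8) /* arm64 uapi value, fallback for other targets */
#endif

/* Pure decode of an AT_HWCAP value: true if the CPU advertises LSE. */
static bool lse_from_hwcap(unsigned long hwcap)
{
	return (hwcap & HWCAP_ATOMICS) != 0;
}

/* CPU capability only: says nothing about whether the interconnect or
 * memory controller supports far atomics (CHI-B Atomic Transactions are
 * optional). */
static bool cpu_advertises_lse(void)
{
#if defined(__aarch64__) && defined(__linux__)
	return lse_from_hwcap(getauxval(AT_HWCAP));
#else
	return false; /* not arm64 Linux: no HWCAP to consult */
#endif
}
```

A `true` result here only means the CPU implements the instructions; whether far atomics to a given memory type actually work is a property of the whole system.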

The same kernel module is meant to be used both on CPUs that fully support
LSE atomics and on CPUs/systems that do not.

Which kernel modules *specifically* need to do atomics to non-cacheable memory?
The driver wants to always perform far atomics (not speculatively), so that the read-modify-write happens as a single, non-interruptible instruction.

That's why we want to have a runtime option here.

As per other replies, a runtime option doesn't solve the issue you have
described, and it will adversely affect the system in other ways (e.g. the
livelock and starvation issues Will mentioned, which we have seen with
LDXR+STXR atomics).

I have also encountered livelock issues caused by the unfairness of LDXR+STXR atomics myself; they were more likely on systems mixing CPUs of different performance. So I would also be glad to use LSE atomics instead of exclusive accesses.
If there is a way to fully utilize the atomic instructions on current hardware while also supporting far atomics, that would be a much better solution than disabling the feature.

Thanks,
Mark.

Please feel free to comment. I hope our discussion leads to a reasonable and usable solution.

--
Thx and BRs,
Aiqun(Maria) Yu