Re: [PATCH] LoongArch: Make -mstrict-align be configurable

From: Jianmin Lv
Date: Mon Feb 06 2023 - 20:13:45 EST




On 2023/2/6 9:22 PM, Arnd Bergmann wrote:
On Mon, Feb 6, 2023, at 14:13, Jianmin Lv wrote:
On 2023/2/6 7:18 PM, Xi Ruoyao wrote:
On Mon, 2023-02-06 at 18:24 +0800, Jianmin Lv wrote:
Hi, Xuerui

I think the kernels produced with and without -mstrict-align differ mainly
in the following ways:
- Different size. I built two kernels (vmlinux): the kernel built with
-mstrict-align is 26533376 bytes and the kernel built without it is
26123280 bytes, i.e. 410096 bytes (roughly 1.5%) smaller.
- Different performance. For example, in the kernel function jhash(), the
assembly code snippets with and without -mstrict-align are as follows:

But there are still questions remaining:

(1) Is the difference caused by poor code generation in GCC? If so, it
would be better to improve GCC before anyone starts building a distro
for LA264, since that would benefit user space as well.

AFAIK, GCC produces binaries that use unaligned access by default (i.e.
without -mstrict-align) to improve user-space performance (smaller size
and higher runtime performance), which is also based on the fact that
the vast majority of LoongArch CPUs support unaligned access.

(2) Is there some "big bad unaligned access loop" in a hot spot in the
kernel code? If so, it may be better to just refactor the C code,
because doing so will benefit all ports, not only LoongArch. Otherwise,
it may not be worth optimizing for some cold paths.

Frankly, I'm not sure whether this kind of hot code exists in the kernel;
I only see the difference in kernel size and in the generated assembly.
And I'm afraid that, even if such code exists, it may be difficult to
judge whether it is really hot or not.

Just look for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this will
show you code locations that use different implementations based on
whether the kernel should run on CPUs without unaligned access or
not.

Arnd


Got it, thank you very much. I grepped for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and found many matches, including in drivers, lib, net and so on, so it seems reasonable to use the higher-performance code paths on CPUs for which HAVE_EFFICIENT_UNALIGNED_ACCESS is configured.