Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

From: Arnd Bergmann
Date: Tue Feb 20 2024 - 03:40:31 EST


On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
> The current arm32 architecture does not yet support the
> HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
> embedded scenarios, and enabling this feature would be beneficial for
> reducing the size of the kernel image.
>
> In order to make this work, we keep the necessary tables by annotating
> them with KEEP, also it requires further changes to linker script to KEEP
> some tables and wildcard compiler generated sections into the right place.

Thanks for the patch, I think this is a very useful feature
and we should get this upstream, especially if we can combine
it with CONFIG_LTO_CLANG (which is supported on arm64 at the
moment, but not on arm).

> It boots normally with defconfig, vexpress_defconfig and tinyconfig.
>
> The size comparison of zImage is as follows:
> defconfig vexpress_defconfig tinyconfig
> 5137712 5138024 424192 no dce
> 5032560 4997824 298384 dce
> 2.0% 2.7% 29.7% shrink
>
> When using smaller config file, there is a significant reduction in the
> size of the zImage.
>
> We also tested this patch on a commercially available single-board
> computer, and the comparison is as follows:
> a15eb_config
> 2161384 no dce
> 2092240 dce
> 3.2% shrink
>
> The zImage size has been reduced by approximately 3.2%, which is 70KB on
> 2.1M.

Nice description! I do suspect that there will be additional
bugs that we run into with some corner cases.

> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S
> b/arch/arm/boot/compressed/vmlinux.lds.S
> index 3fcb3e62dc56..da21244aa892 100644
> --- a/arch/arm/boot/compressed/vmlinux.lds.S
> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> @@ -89,7 +89,7 @@ SECTIONS
> * The EFI stub always executes from RAM, and runs strictly before
> the
> * decompressor, so we can make an exception for its r/w data, and
> keep it
> */
> - *(.data.efistub .bss.efistub)
> + *(.data.* .bss.*)
> __pecoff_data_end = .;
>
> /*

This doesn't seem right to me, or maybe I misunderstand what
the original version does. Have you tested with both
CONFIG_EFI_STUB on and off, and booting with and without
UEFI?

If I read this right, you would move all .data and .bss
into the stub here, not just the parts we actually want?

> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index bd9127c4b451..de373c6c2ae8 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -74,7 +74,7 @@ SECTIONS
> . = ALIGN(4);
> __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
> __start___ex_table = .;
> - ARM_MMU_KEEP(*(__ex_table))
> + ARM_MMU_KEEP(KEEP(*(__ex_table)))
> __stop___ex_table = .;
> }
>
> @@ -116,7 +116,7 @@ SECTIONS
> #endif
> .init.pv_table : {
> __pv_table_begin = .;
> - *(.pv_table)
> + KEEP(*(.pv_table))
> __pv_table_end = .;
> }

I guess this prevents discarding any function that has a reference
from pv_table or ex_table, even if there are no other references,
right?

I don't know how to solve this other than forcing all the
uaccess and virt_to_phys functions to be out of line
helpers. For uaccess, there are probably very few functions
that need this, so it should make little difference.

You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT
into a method that just always adds an offset from C code
instead of the boot time patching. That way the code would
be a bit less efficient but you might be able to get
a larger size reduction by dropping additional unused code.

Maybe test your patch both with and without
ARM_PATCH_PHYS_VIRT to see what the best-case impact would
be.

Arnd