Re: [PATCH v6 03/20] modpost: detect section mismatch for R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS

From: Ard Biesheuvel
Date: Tue May 23 2023 - 08:21:17 EST


On Tue, 23 May 2023 at 13:59, Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
>
> On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > + linux-arm-kernel
> > >
> > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > > >
> > > > ARM defconfig fails to detect some section mismatches.
> > > >
> > > > [test code]
> > > >
> > > > #include <linux/init.h>
> > > >
> > > > int __initdata foo;
> > > > int get_foo(int x) { return foo; }
> > > >
> > > > It is clearly a bad reference, but modpost does not report anything
> > > > for ARM defconfig (i.e. multi_v7_defconfig).
> > > >
> > > > The test code above produces the following relocations.
> > > >
> > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> > > >  Offset     Info    Type               Sym.Value  Sym. Name
> > > > 00000000  0000062b R_ARM_MOVW_ABS_NC   00000000   .LANCHOR0
> > > > 00000004  0000062c R_ARM_MOVT_ABS      00000000   .LANCHOR0
> > > >
> > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> > > >  Offset     Info    Type               Sym.Value  Sym. Name
> > > > 00000000  0000022a R_ARM_PREL31        00000000   .text
> > > > 00000000  00001000 R_ARM_NONE          00000000   __aeabi_unwind_cpp_pr0
> > > >
> > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> > > >
> > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > > > how the offset is encoded in the instruction.
> > > >
> > > > The referenced symbol in relocation might be a local anchor.
> > > > If is_valid_name() returns false, let's search for a better symbol name.
> > > >
> > > > Signed-off-by: Masahiro Yamada <masahiroy@xxxxxxxxxx>
> > > > ---
> > > >
> > > > scripts/mod/modpost.c | 12 ++++++++++--
> > > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > > > index 34fbbd85bfde..ed2301e951a9 100644
> > > > --- a/scripts/mod/modpost.c
> > > > +++ b/scripts/mod/modpost.c
> > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> > > >  /**
> > > >   * Find symbol based on relocation record info.
> > > >   * In some cases the symbol supplied is a valid symbol so
> > > > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> > > >   * In other cases the symbol needs to be looked up in the symbol table
> > > >   * based on section and address.
> > > >   *  **/
> > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> > > >         Elf64_Sword d;
> > > >         unsigned int relsym_secindex;
> > > >
> > > > -       if (relsym->st_name != 0)
> > > > +       if (is_valid_name(elf, relsym))
> > > >                 return relsym;
> > > >
> > > >         /*
> > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> > > >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> > > >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> > > >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > > > +       int offset;
> > > >
> > > >         switch (r_typ) {
> > > >         case R_ARM_ABS32:
> > > >                 r->r_addend = inst + sym->st_value;
> > > >                 break;
> > > > +       case R_ARM_MOVW_ABS_NC:
> > > > +       case R_ARM_MOVT_ABS:
> > > > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > > > +               offset = (offset ^ 0x8000) - 0x8000;
> > >
> > > The code in arch/arm/kernel/module.c then right shifts the offset by
> > > 16 for R_ARM_MOVT_ABS. Is that necessary?
> > >
> >
> > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
> > value must be encoded in both instructions.
>
>
> In my understanding, 'movt' loads the immediate value into
> the upper 16 bits of the register.
>

Correct. It sets the upper 16 bits of a register without corrupting
the lower 16 bits.
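
For illustration only, the register semantics can be modelled in a few
lines of C. emulate_movw() and emulate_movt() are made-up names for this
sketch, not kernel functions:

```c
#include <stdint.h>

/* MOVW: write imm16 to the lower half and zero the upper half. */
static uint32_t emulate_movw(uint32_t reg, uint16_t imm16)
{
	(void)reg;		/* previous contents are discarded */
	return imm16;
}

/* MOVT: replace the upper half, leave the lower half untouched. */
static uint32_t emulate_movt(uint32_t reg, uint16_t imm16)
{
	return (reg & 0xffff) | ((uint32_t)imm16 << 16);
}
```

A MOVW followed by a MOVT on the same register therefore builds a full
32-bit constant from two 16-bit immediates.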

> I am just curious about the code in arch/arm/kernel/module.c.
>
> Please see 'case R_ARM_MOVT_ABS:' part.
>
> [1] 'offset' is the immediate value encoded in instruction
> [2] Add sym->st_value
> [3] Right-shift 'offset' by 16
> [4] Write it back to the instruction
>
> So, the immediate value encoded in the instruction
> is divided by 65536.
>
> I guess we need something like the following?
> (left-shift by 16).
>
>         if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS ||
>             ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL)
>                 offset <<= 16;
>

No. The addend is not encoded in the same way as the effective immediate value.

The addend is limited to -/+ 32 KiB (range of s16), and the MOVT
instruction must use the same addend value as the MOVW instruction it
is paired with, without shifting.

This is necessary because otherwise, there is no way to handle an
addend/symbol combination that results in a carry between the lower
and upper 16-bit words. This is a consequence of the use of the REL
format, where the addend is encoded in the instructions themselves,
rather than RELA, where the addend is part of the relocation record.
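
As a rough sketch of why the shared addend matters (this is not the code
from arch/arm/kernel/module.c; the helper names and the example symbol
address are made up, and only the ARM/A32 encoding is covered): both
instructions carry the same sign-extended 16-bit addend, the full
sym + addend sum is computed for each, and only then is the relevant
half selected.

```c
#include <stdint.h>

/*
 * Decode the REL addend from an A32 MOVW/MOVT instruction:
 * imm4 lives in bits 19:16, imm12 in bits 11:0. The 16-bit
 * result is sign-extended, giving a range of -32768..32767.
 */
static int32_t decode_addend(uint32_t inst)
{
	int32_t offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);

	return (offset ^ 0x8000) - 0x8000;
}

/* Immediate that MOVW ends up with: low half of sym + addend. */
static uint16_t movw_imm(uint32_t inst, uint32_t sym)
{
	return (uint16_t)(sym + (uint32_t)decode_addend(inst));
}

/*
 * Immediate that MOVT ends up with: high half of sym + addend.
 * The shift happens after the addition, so a carry out of the
 * low half is reflected in the result.
 */
static uint16_t movt_imm(uint32_t inst, uint32_t sym)
{
	return (uint16_t)((sym + (uint32_t)decode_addend(inst)) >> 16);
}
```

Take a hypothetical symbol at 0xc000fff0 and an addend of 0x20 encoded
in both instructions (0xe3000020 for the MOVW, 0xe3400020 for the
MOVT). The sum is 0xc0010010, so MOVT must receive 0xc001. If MOVT only
saw the upper half of the symbol, it would produce 0xc000 and the carry
would be lost.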

>
>
>
> >
> > When constructing the actual immediate value from the symbol value and
> > the addend, only the top 16 bits are used in MOVT and the bottom 16
> > bits in MOVW.
> >
> > However, this code seems to borrow the Elf_Rela::addend field (which
> > ARM does not use natively) to record the intermediate value, which
> > would need to be split if it is used to fix up instruction opcodes.
>
> At first, modpost supported only RELA for section mismatch checks.
>
> Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections")
> added REL support.
>
> But, the common code still used Elf_Rela.
>
>
> modpost does not need to write back the fixed instruction.
> modpost is only interested in the offset address.
>
> Currently, modpost saves the offset address in
> r->r_offset even for Rel. I do not like this code.
>
> So, I am trying to reduce the use of Elf_Rela.
> For example, this patch.
> https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@xxxxxxxxxx/
>

Yeah, that looks better to me.

>
> > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.
>
> Right, if CONFIG_THUMB2_KERNEL=y, the section mismatch check is
> incomplete.
>
> Several relocation types are just skipped.
>

Skipped entirely? Or only for the diagnostic print that outputs the symbol name?