Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem

From: Nick Desaulniers
Date: Mon Oct 10 2022 - 18:35:16 EST


On Mon, Oct 10, 2022 at 3:14 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
> >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> > Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> > into the loops also seems to work.
>
> I was going to suggest adding a barrier() as well, should have
> read on first ;-)

barrier() forces reloads+spills in the loop. The output with `-mllvm
-replexitval=never` is optimal (assuming the loop is faster than
__aeabi_uldivmod, which I think is unprovable).
https://godbolt.org/z/7dMabYYcM
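
For readers skimming the thread, here is a minimal sketch of the loop shape
under discussion. It is not the actual float64_rem() code: the function names
are hypothetical, and it assumes b != 0 so the loops terminate. The first
function is the kind of subtract-and-count loop whose exit values an optimizer
may replace with a closed form (a 64-bit division, hence a libcall on 32-bit
ARM). The second adds an empty asm statement that makes `a` opaque on each
iteration, which is one of the workarounds mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in: repeated subtraction that also counts iterations.
 * An optimizer can work out that on exit a == a0 % b and count == a0 / b,
 * and may emit a division instead of the loop. Assumes b != 0. */
uint64_t rem_by_subtraction(uint64_t a, uint64_t b, uint64_t *q)
{
    uint64_t count = 0;

    while (a >= b) {
        a -= b;
        ++count;
    }
    *q = count;
    return a;
}

/* Same loop, but an empty asm statement claims to modify `a` in a register
 * on every iteration, so the compiler can no longer derive a closed form
 * for the exit values. Unlike barrier(), there is no "memory" clobber.
 * GCC/Clang extended asm syntax; assumes b != 0. */
uint64_t rem_by_subtraction_opaque(uint64_t a, uint64_t b, uint64_t *q)
{
    uint64_t count = 0;

    while (a >= b) {
        a -= b;
        ++count;
        __asm__ volatile("" : "+r"(a));
    }
    *q = count;
    return a;
}
```

Both functions return the same results; only the generated code is expected
to differ, and whether the first one actually becomes a division depends on
the compiler version and optimization level.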

As much as I hate relying on compiler-internal flags, I think this is optimal:
```
diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y += fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
 entry.o
 
 nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o += -mllvm -replexitval=never
+endif
```

Part of me is tempted to move float64_rem() to its own file for that
flag, but indvars+loop-utils isn't eliding any other loops in that file
(I compared the full disassembly before and after the above diff), so
applying the flag to all of softfloat.o should be harmless.

Long term, it might be nice for us to have `--rtlib` recognize
`--rtlib=linux-kernel@version` or something so that we could better
describe the effective compiler runtime to the compiler. There are
already differences in compiler-rt and libgcc where we could make
better codegen decisions if we were to consider the target rtlib.
These libraries also change over time though...
--
Thanks,
~Nick Desaulniers