Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

From: Christophe Leroy
Date: Mon Mar 04 2024 - 06:39:14 EST


Hi Russell and Guenter,

Le 03/03/2024 à 16:26, Guenter Roeck a écrit :
> On 3/3/24 02:20, Christophe Leroy wrote:
>>
>>
>> Le 01/03/2024 à 19:32, Guenter Roeck a écrit :
>>> This leaves the mps2-an385:mps2_defconfig crash, which is avoided by
>>> this patch.
>>> My understanding, which may be wrong, is that arm images with thumb
>>> instructions
>>> do not support unaligned accesses (maybe I should say do not support
>>> unaligned
>>> accesses with the mps2-an385 qemu emulation; I did not test with real
>>> hardware,
>>> after all).

...

>>
>> Can you tell how to proceed ?
>>
>
> You can't run it directly. mps2-an385 is one of the platforms where
> the qemu maintainers insisted that qemu shall not initialize the CPU.
> You have to provide a shim such as
> https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/mps2-boot.axf
> as bios. You also have to provide the dtb file.
>
> On top of that, you would need a customized version of qemu which
> actually reads the command line, the bios file, and the dtb. See
> https://github.com/groeck/linux-build-test/tree/master/qemu
> branch v8.2.1-local or v8.1.5-local.
>

Many thanks for your guidance. So, I did the test and what I can say:

ip_fast_csum() works whatever the alignment is.

csum_ipv6_magic() is the problem with unaligned ipv6 source or
destination addresses:

[ 0.503757] KTAP version 1
[ 0.503854] 1..1
[ 0.504156] KTAP version 1
[ 0.504251] # Subtest: checksum
[ 0.504563] # module: checksum_kunit
[ 0.504730] 1..5
[ 0.546418] ok 1 test_csum_fixed_random_inputs
[ 0.627853] ok 2 test_csum_all_carry_inputs
[ 0.704918] ok 3 test_csum_no_carry_inputs
[ 0.705845] ok 4 test_ip_fast_csum
[ 0.706320]
[ 0.706320] Unhandled exception: IPSR = 00000006 LR = fffffff1
[ 0.706796] CPU: 0 PID: 28 Comm: kunit_try_catch Tainted: G
N 6.8.0-rc1-00609-g9c0b7a2e25f0 #649
[ 0.707177] Hardware name: Generic DT based system
[ 0.707400] PC is at __csum_ipv6_magic+0x8/0xb4
[ 0.708170] LR is at test_csum_ipv6_magic+0x3d/0xa4
[ 0.708415] pc : [<211b0da8>] lr : [<210e3bf5>] psr: 0100020b
[ 0.708692] sp : 2153debc ip : 46c7f0d2 fp : 00000000
[ 0.708919] r10: 00000000 r9 : 2141dc48 r8 : 211e0e20
[ 0.709148] r7 : 00003085 r6 : 00000001 r5 : 2141dd24 r4 : 211e0c2e
[ 0.709422] r3 : 2c000000 r2 : 1ac7f0d2 r1 : 211e0c19 r0 : 211e0c09
[ 0.709704] xPSR: 0100020b


I don't know much about ARM instruction set, seems like the ldr
instruction used in ip_fast_csum() doesn't mind unaligned accesses while
ldmia instruction used in csum_ipv6_magic() minds. Or is it a wrong
behaviour of QEMU ?

If I change the test as follows to only use word aligned IPv6 addresses,
it works:

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..4d86fc8ccd78 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
const int csum_offset = sizeof(struct in6_addr) + sizeof(struct
in6_addr) +
sizeof(int) + sizeof(char);

- for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+ for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
saddr = (const struct in6_addr *)(random_buf + i);
daddr = (const struct in6_addr *)(random_buf + i +
daddr_offset);


If I change csum_ipv6_magic() as follows to use instruction ldr instead
of ldmia, it also works without any change to the test:

diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S
index 3559d515144c..a312d0836b95 100644
--- a/arch/arm/lib/csumipv6.S
+++ b/arch/arm/lib/csumipv6.S
@@ -12,12 +12,18 @@
ENTRY(__csum_ipv6_magic)
str lr, [sp, #-4]!
adds ip, r2, r3
- ldmia r1, {r1 - r3, lr}
+ ldr r2, [r1], #4
+ ldr r3, [r1], #4
+ ldr lr, [r1], #4
+ ldr r1, [r1]
adcs ip, ip, r1
adcs ip, ip, r2
adcs ip, ip, r3
adcs ip, ip, lr
- ldmia r0, {r0 - r3}
+ ldr r1, [r0], #4
+ ldr r2, [r0], #4
+ ldr r3, [r0], #4
+ ldr r0, [r0]
adcs r0, ip, r0
adcs r0, r0, r1
adcs r0, r0, r2


So now we are back to the initial question, should checksumming on
unaligned addresses be supported or not ?

Russell I understand from previous answers from you that half-word
alignment should be supported, in that case should ARM version of
csum_ipv6_magic() be modified ? In that case can you propose the most
optimised fix ?

If not, then the test has to be fixed to only use word-aligned IPv6
addresses.

Thanks
Christophe