RE: [PATCH v12 4/5] riscv: Add checksum library

From: Wang, Xiao W
Date: Wed Dec 20 2023 - 05:28:35 EST




> -----Original Message-----
> From: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> Sent: Wednesday, December 13, 2023 10:11 AM
> To: Palmer Dabbelt <palmer@xxxxxxxxxxx>; Conor Dooley
> <conor@xxxxxxxxxx>; Samuel Holland <samuel.holland@xxxxxxxxxx>; David
> Laight <David.Laight@xxxxxxxxxx>; Wang, Xiao W <xiao.w.wang@xxxxxxxxx>;
> Evan Green <evan@xxxxxxxxxxxx>; linux-riscv@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-arch@xxxxxxxxxxxxxxx
> Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>; Albert Ou
> <aou@xxxxxxxxxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; Conor Dooley
> <conor.dooley@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v12 4/5] riscv: Add checksum library
>
> On Tue, Dec 12, 2023 at 05:18:41PM -0800, Charlie Jenkins wrote:
> > Provide a 32-bit and a 64-bit version of do_csum. When compiled for
> > 32-bit, it loads from the buffer in groups of 32 bits, and when compiled
> > for 64-bit, it loads in groups of 64 bits.
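For illustration, a minimal portable C sketch of the word-at-a-time approach described above, fixed at 64-bit words (a 32-bit build would use 32-bit words the same way). It adds each native-endian word with end-around carry, zero-pads the final partial word, and folds the 64-bit accumulator down to 16 bits. The names do_csum_words(), add_carry64() and fold64_to_16() are hypothetical, memcpy() stands in for a possibly misaligned word load, and the return value is assumed to follow the generic lib/checksum.c do_csum() contract (a folded, uncomplemented 16-bit sum).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement addition on 64-bit words: add, then add back the carry. */
static uint64_t add_carry64(uint64_t sum, uint64_t word)
{
	sum += word;
	return sum + (sum < word);
}

/* Fold a 64-bit partial sum to 16 bits, keeping every end-around carry. */
static unsigned int fold64_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32); /* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32); /* fits in 32 bits */
	sum = (sum & 0xffffu) + (sum >> 16);     /* at most 17 bits */
	sum = (sum & 0xffffu) + (sum >> 16);     /* fits in 16 bits */
	return (unsigned int)sum;
}

/*
 * Hypothetical portable model of a 64-bit do_csum(): returns the folded,
 * uncomplemented 16-bit ones' complement sum in native byte order.
 */
static unsigned int do_csum_words(const unsigned char *buff, int len)
{
	uint64_t sum = 0, word;

	if (len <= 0)
		return 0;

	/* Main loop: one 64-bit load per iteration (memcpy() models the load). */
	while (len >= 8) {
		memcpy(&word, buff, sizeof(word));
		sum = add_carry64(sum, word);
		buff += 8;
		len -= 8;
	}

	/* Tail: copy the remaining bytes into a zeroed word (zero padding). */
	if (len > 0) {
		word = 0;
		memcpy(&word, buff, (size_t)len);
		sum = add_carry64(sum, word);
	}

	return fold64_to_16(sum);
}
```

Summing in wide words is valid because 2^16 ≡ 1 (mod 0xffff), so folding a 64-bit ones' complement sum gives the same result as adding the 16-bit words one at a time.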
> >
> > Additionally, provide a riscv-optimized implementation of csum_ipv6_magic.
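For context, csum_ipv6_magic() checksums the IPv6 pseudo-header (RFC 8200, section 8.1: source address, destination address, 32-bit upper-layer length, three zero bytes, next-header value) together with a partial sum of the payload, then folds and complements the result. A portable sketch of that computation using 64-bit accumulation follows. It is not the riscv implementation from this patch: 16-byte arrays replace struct in6_addr, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement addition on 64-bit words. */
static uint64_t add_carry64(uint64_t sum, uint64_t word)
{
	sum += word;
	return sum + (sum < word);
}

/* Fold a 64-bit partial sum to 16 bits with end-around carries. */
static uint16_t fold64_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical portable model of csum_ipv6_magic(): saddr/daddr are 16-byte
 * arrays in network byte order, sum is a 32-bit partial sum of the payload.
 */
static uint16_t csum_ipv6_magic_sketch(const uint8_t saddr[16],
				       const uint8_t daddr[16],
				       uint32_t len, uint8_t proto,
				       uint32_t sum)
{
	uint8_t tail[8];
	uint64_t acc = sum, word;
	size_t i;

	/* Addresses: add the raw bytes as native 64-bit words. */
	for (i = 0; i < 16; i += 8) {
		memcpy(&word, saddr + i, sizeof(word));
		acc = add_carry64(acc, word);
		memcpy(&word, daddr + i, sizeof(word));
		acc = add_carry64(acc, word);
	}

	/* Upper-layer length (big-endian), three zero bytes, next header. */
	tail[0] = (uint8_t)(len >> 24);
	tail[1] = (uint8_t)(len >> 16);
	tail[2] = (uint8_t)(len >> 8);
	tail[3] = (uint8_t)len;
	tail[4] = tail[5] = tail[6] = 0;
	tail[7] = proto;
	memcpy(&word, tail, sizeof(word));
	acc = add_carry64(acc, word);

	/* Fold to 16 bits and complement, as csum_fold() does. */
	return (uint16_t)~fold64_to_16(acc);
}
```

Because the addresses and the pseudo-header tail are summed as raw memory, the result comes out in network byte order when stored, whatever the host endianness.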
> >
> > Signed-off-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> > Acked-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
> > Reviewed-by: Xiao Wang <xiao.w.wang@xxxxxxxxx>
> > ---
> > arch/riscv/include/asm/checksum.h | 13 +-
> > arch/riscv/lib/Makefile | 1 +
> > arch/riscv/lib/csum.c | 326 ++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 339 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
> > index 2fcf864186e7..3fa04ff1eda8 100644
> > --- a/arch/riscv/include/asm/checksum.h
> > +++ b/arch/riscv/include/asm/checksum.h
> > @@ -12,6 +12,17 @@
> >
> > #define ip_fast_csum ip_fast_csum
> >
> > +extern unsigned int do_csum(const unsigned char *buff, int len);
> > +#define do_csum do_csum
> > +
> > +/* Default version is sufficient for 32 bit */
> > +#ifndef CONFIG_32BIT
> > +#define _HAVE_ARCH_IPV6_CSUM
> > +__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
> > + const struct in6_addr *daddr,
> > + __u32 len, __u8 proto, __wsum sum);
> > +#endif
> > +
> > /* Define riscv versions of functions before importing asm-generic/checksum.h */
> > #include <asm-generic/checksum.h>
> >
> > @@ -69,7 +80,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
> > .option pop"
> > : [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
> > }
> > - return csum >> 16;
> > + return (__force __sum16) (csum >> 16);

I notice that this type conversion was introduced after v10. This change should go into patch 3/5.
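Background for the cast: __sum16 is a sparse __bitwise type, so returning a plain integer from a function declared to return __sum16 needs (__force __sum16) to keep sparse quiet; the cast does not change the generated code. Below is a portable sketch of a rotate-based 16-bit fold of a 32-bit partial sum, with uint16_t standing in for __sum16 and a hypothetical rotr32() in place of the kernel's ror32(). It is not a transcription of the Zbb sequence in the hunk.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ror32(); n must be 1..31. */
static uint32_t rotr32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/*
 * Fold a 32-bit ones' complement partial sum to a complemented 16-bit
 * checksum. csum + rotr32(csum, 16) holds hi + lo plus the end-around
 * carry in its upper half; ~csum - rotr32(csum, 16) is the complement of
 * that sum (mod 2^32), so its upper half is the final checksum.
 */
static uint16_t fold_csum32(uint32_t csum)
{
	return (uint16_t)((~csum - rotr32(csum, 16)) >> 16);
}
```

For example, a partial sum of 0x0002ddf0 folds to 0xddf2, and its complement 0x220d is returned.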

BRs,
Xiao

[...]
> > +
> > +/*
> > + * Perform a checksum on an arbitrary memory address.
> > + * Will do a light-weight address alignment if buff is misaligned, unless
> > + * cpu supports fast misaligned accesses.
> > + */
> > +unsigned int do_csum(const unsigned char *buff, int len)
> > +{
> > + if (unlikely(len <= 0))
> > + return 0;
> > +
> > + /*
> > + * Significant performance gains can be seen by not doing alignment
> > + * on machines with fast misaligned accesses.
> > + *
> > + * There is some duplicate code between the "with_alignment" and
> > + * "no_alignment" implementations, but the overlap is too awkward to be
> > + * able to fit in one function without introducing multiple static
> > + * branches. The largest chunk of overlap was delegated into the
> > + * do_csum_common function.
> > + */
> > + if (static_branch_likely(&fast_misaligned_access_speed_key))
> > + return do_csum_no_alignment(buff, len);
> > +
> > + if (((unsigned long)buff & OFFSET_MASK) == 0)
> > + return do_csum_no_alignment(buff, len);
> > +
> > + return do_csum_with_alignment(buff, len);
> > +}
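The comment above mentions a light-weight address alignment for CPUs without fast misaligned accesses. One way to do it, sketched portably below, is to treat the stream as if it started at the aligned address below buff with the leading bytes zeroed, use whole-word loads from then on, and byte-swap the folded result when the offset is odd, because an odd offset shifts every 16-bit pair by one byte. SKETCH_OFFSET_MASK assumes OFFSET_MASK is the word-size alignment mask (its definition is not in the quoted hunk), and do_csum_align_sketch() is a hypothetical name; this is not a transcription of do_csum_with_alignment().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors OFFSET_MASK as the 8-byte word alignment mask. */
#define SKETCH_OFFSET_MASK ((uintptr_t)(sizeof(uint64_t) - 1))

/* Ones' complement addition on 64-bit words. */
static uint64_t add_carry64(uint64_t sum, uint64_t word)
{
	sum += word;
	return sum + (sum < word);
}

/* Fold a 64-bit partial sum to 16 bits with end-around carries. */
static unsigned int fold64_to_16(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (unsigned int)sum;
}

static unsigned int do_csum_align_sketch(const unsigned char *buff, int len)
{
	size_t offset = (size_t)((uintptr_t)buff & SKETCH_OFFSET_MASK);
	unsigned char head[8] = {0};
	uint64_t sum = 0, word;
	size_t n;
	unsigned int res;

	if (len <= 0)
		return 0;

	/*
	 * First word: bytes before buff are zero (they add nothing), so the
	 * sum is taken as if the stream began at the aligned address.
	 */
	n = sizeof(head) - offset;
	if (n > (size_t)len)
		n = (size_t)len;
	memcpy(head + offset, buff, n);
	memcpy(&word, head, sizeof(word));
	sum = add_carry64(sum, word);
	buff += n;
	len -= (int)n;

	/* buff is now word aligned (or len is 0): whole-word loads. */
	while (len >= 8) {
		memcpy(&word, buff, sizeof(word));
		sum = add_carry64(sum, word);
		buff += 8;
		len -= 8;
	}

	/* Tail: zero-pad the last partial word. */
	if (len > 0) {
		word = 0;
		memcpy(&word, buff, (size_t)len);
		sum = add_carry64(sum, word);
	}

	res = fold64_to_16(sum);

	/* An odd offset pairs every 16-bit word one byte late: swap back. */
	if (offset & 1)
		res = ((res & 0xffu) << 8) | (res >> 8);
	return res;
}
```

The byte swap works because swapping the bytes of a ones' complement sum is the same as multiplying it by 256 (mod 0xffff), which is exactly what a one-byte shift of the pairing does.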
> >
> > --
> > 2.43.0
> >
>
> There is potentially a code size concern here. These changes rely on
> alternatives, which increases the resulting binary size. The
> bloat-o-meter script reports that do_csum grows to more than twice its
> original size with this patch:
>
> Function                                     old     new   delta
> do_csum                                      238     514    +276
>
> The other functions are harder to measure because they get inlined or
> are not included in generic code. However, do_csum is the most
> affected because of the misaligned access handling.
>
> The performance improvements afforded by alternatives (with the Zbb
> extension) and by the misaligned access checking are significant. In
> my testing, these optimizations alone contribute over a 20% performance
> improvement.
>
> - Charlie