Re: [PATCH v4 3/5] arm64: mte: implement CONFIG_ARM64_MTE_COMP

From: Alexander Potapenko
Date: Fri Sep 22 2023 - 04:05:32 EST


On Fri, Jul 21, 2023 at 1:22 PM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 20, 2023 at 07:39:54PM +0200, Alexander Potapenko wrote:
> > The config implements the algorithm compressing memory tags for ARM MTE
> > during swapping.
> >
> > The algorithm is based on RLE and specifically targets 128-byte buffers
> > of tags corresponding to a single page. In the common case a buffer
> > can be compressed into 63 bits, making it possible to store it without
> > additional memory allocation.
>
> ...
>
> > +Programming Interface
> > +=====================
> > +
> > + .. kernel-doc:: arch/arm64/mm/mtecomp.c
>
> :export:

Done

> > +
>
> Is it dangling trailing blank line? Drop it.

Sorry, it's hard to tell which line this comment refers to. I am
assuming it is related to
Documentation/arch/arm64/mte-tag-compression.rst - done.

> ...
>
> > +#include <linux/bitmap.h>
>
> > +#include <linux/bitops.h>
>
> This is guaranteed to be included by bitmap.h.

I think we'd better stick to IWYU here.
Ingo's patch: https://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git/commit/?id=32b1e9e4f5774951a3a80604a39fa1f0674c1833
specifically adds bitmap.h where bits.h is already present, without
removing the latter.
Although there might not be general consensus on this in the kernel
right now, I think Ingo's "Fast Kernel Headers" series set out a good
direction.

>
> > +/*
> > + * Sizes of compressed values. These depend on MTE_TAG_SIZE and
>
> of the

This comment is gone now

>
> > + out_tags[0] = prev_tag;
>
> out_tags[cur_idx] ?

Yeah, looks more readable. Done.

>
> > + for (i = 0; i < MTE_PAGE_TAG_STORAGE; i++) {
> > + for (j = 0; j < 2; j++) {
> > + cur_tag = j ? (tags[i] % 16) : (tags[i] / 16);
> > + if (cur_tag == prev_tag) {
> > + out_sizes[cur_idx]++;
>
> > + } else {
> > + cur_idx++;
> > + prev_tag = cur_tag;
> > + out_tags[cur_idx] = prev_tag;
> > + out_sizes[cur_idx] = 1;
>
> Looking more at this I think there is still a room for improvement. I can't
> come up right now with a proposal (lunch time :-), but I would look into
>
> do {
> ...
> } while (i < MTE_...);
>
> approach.

We can, e.g., get rid of the nested loop and iterate over the tags
instead of the bytes (see v5).
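For illustration, here is a rough userspace sketch of that single-loop
shape. The constants and the function name are mine, not the actual v5
code; the point is just that one loop over tag indices replaces the
nested byte/nibble loops:

```c
#include <assert.h>
#include <stddef.h>

#define MTE_PAGE_TAG_STORAGE 128			/* bytes of packed 4-bit tags */
#define MTE_TAGS_PER_PAGE    (MTE_PAGE_TAG_STORAGE * 2)

/* Run-length encode the 4-bit tags; returns the number of ranges. */
static size_t tags_to_ranges(const unsigned char *tags,
			     unsigned char *out_tags,
			     unsigned short *out_sizes)
{
	unsigned char prev_tag = tags[0] / 16;	/* high nibble of byte 0 */
	size_t cur_idx = 0, i;

	out_tags[cur_idx] = prev_tag;
	out_sizes[cur_idx] = 1;
	for (i = 1; i < MTE_TAGS_PER_PAGE; i++) {
		/* Odd indices are low nibbles, even indices high nibbles. */
		unsigned char cur_tag =
			i % 2 ? tags[i / 2] % 16 : tags[i / 2] / 16;

		if (cur_tag == prev_tag) {
			out_sizes[cur_idx]++;
		} else {
			cur_idx++;
			prev_tag = cur_tag;
			out_tags[cur_idx] = prev_tag;
			out_sizes[cur_idx] = 1;
		}
	}
	return cur_idx + 1;
}
```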


> > +static size_t mte_size_to_ranges(size_t size)
> > +{
> > + size_t largest_bits;
>
> > + size_t ret = 0;
>
> Redundant assignment. Please, check again all of them.

Done.

> > +
> > + largest_bits = (size == 8) ? MTE_BITS_PER_LARGEST_IDX_INLINE :
> > + MTE_BITS_PER_LARGEST_IDX;
> > + ret = (size * 8 + MTE_BITS_PER_SIZE - largest_bits) /
>
> Hmm... I thought that we moved BYTES_TO_BITS() to the generic header...
> Okay, never mind.

Ack

> > + (MTE_BITS_PER_TAG + MTE_BITS_PER_SIZE);
> > + return ret;
>
> return (...) / ...;

Done
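For the record, the resulting shape looks roughly like this. The
constant values below are illustrative placeholders, not necessarily
the ones in mtecomp.c:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real constants live in mtecomp.c. */
#define MTE_BITS_PER_TAG		4
#define MTE_BITS_PER_SIZE		7
#define MTE_BITS_PER_LARGEST_IDX_INLINE	6
#define MTE_BITS_PER_LARGEST_IDX	8

/* How many tag ranges fit into a buffer of @size bytes. */
static size_t mte_size_to_ranges(size_t size)
{
	size_t largest_bits = (size == 8) ? MTE_BITS_PER_LARGEST_IDX_INLINE :
					    MTE_BITS_PER_LARGEST_IDX;

	return (size * 8 + MTE_BITS_PER_SIZE - largest_bits) /
	       (MTE_BITS_PER_TAG + MTE_BITS_PER_SIZE);
}
```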

> > +}
>
> ...
>
> > +static size_t mte_alloc_size(unsigned int num_ranges)
> > +{
> > + size_t sizes[4] = { 8, 16, 32, 64 };
>
> Hooray! And now it's not needed anymore...
>
> > + unsigned int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(sizes); i++) {
>
> ...as sizes[i] is equivalent of (8 << i).

It's gone now.
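Sketching the idea for posterity: the sizes[] table can be replaced by
8 << i. In this standalone sketch fits() is a placeholder for the real
per-size capacity check, so the numbers are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder capacity check; the real one uses mte_size_to_ranges(). */
static int fits(size_t size, unsigned int num_ranges)
{
	return num_ranges <= size;
}

/* Pick the smallest allocation size (8, 16, 32 or 64 bytes) that fits. */
static size_t mte_alloc_size(unsigned int num_ranges)
{
	unsigned int i;

	for (i = 0; i < 4; i++) {
		if (fits(8 << i, num_ranges))
			return 8 << i;
	}
	return 128;	/* no compressed size fits, store uncompressed */
}
```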

> ...
>
> > +/**
> > + * mte_compress() - compress the given tag array.
> > + * @tags: 128-byte array to read the tags from.
> > + *
> > + * Compresses the tags and returns a 64-bit opaque handle pointing to the
> > + * tag storage. May allocate memory, which is freed by @mte_release_handle().
>
> + blank line here.

Done (here and in other places in the file), but I'm wondering why
https://docs.kernel.org/doc-guide/kernel-doc.html does not mandate it.

>
> > + * Returns: 64-bit tag storage handle.
> > + */
>
> ...
>
> > + /*
> > + * mte_compress_to_buf() only initializes the bits that mte_decompress()
> > + * will read. But when the tags are stored in the handle itself, it must
> > + * have all its bits initialized.
> > + */
> > + unsigned long result = 0;
>
> // Actually it's interesting how it's supposed to work on 32-bit
> // builds...

It is not supposed to work on 32-bit builds.
First, the code is in arch/arm64 :)
Second, 32-bit CPUs do not support MTE (which reserves the four upper
bits of the address).


>
> > +static unsigned long mte_bitmap_read(const unsigned long *bitmap,
> > + unsigned long *pos, unsigned long bits)
> > +{
> > + unsigned long result;
> > +
> > + result = bitmap_read(bitmap, *pos, bits);
> > + *pos += bits;
> > + return result;
>
> unsigned long start = *pos;
>
> *pos += bits;
> return bitmap_read(bitmap, start, bits);

Done, thanks!
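I.e. the helper now reads as below. The bitmap_read() here is a
userspace stand-in I wrote for the kernel's LSB-first bitmap_read(),
just so the sketch is self-contained:

```c
#include <assert.h>

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Userspace stand-in for the kernel's bitmap_read() (LSB-first). */
static unsigned long bitmap_read(const unsigned long *bitmap,
				 unsigned long start, unsigned long nbits)
{
	unsigned long value = 0, i;

	for (i = 0; i < nbits; i++) {
		unsigned long bit = start + i;

		if (bitmap[bit / BITS_PER_ULONG] &
		    (1UL << (bit % BITS_PER_ULONG)))
			value |= 1UL << i;
	}
	return value;
}

/* Read @bits bits at *@pos and advance the cursor past them. */
static unsigned long mte_bitmap_read(const unsigned long *bitmap,
				     unsigned long *pos, unsigned long bits)
{
	unsigned long start = *pos;

	*pos += bits;
	return bitmap_read(bitmap, start, bits);
}
```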

> > +}
>
> ...
>
> > + unsigned short r_sizes[46], sum = 0;
>
> See below.
>
> ...
>
> It's cleaner and more robust to have
>
> sum = 0;
>
> here.

Moved it inside the loop init statement
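I.e. the accumulator pattern is now along these lines (a toy sketch,
not the exact code):

```c
#include <assert.h>
#include <stddef.h>

/* Sum the range sizes, initializing sum in the loop init statement. */
static unsigned short sum_sizes(const unsigned short *r_sizes, size_t num)
{
	unsigned short sum;
	size_t i;

	for (i = 0, sum = 0; i < num; i++)
		sum += r_sizes[i];
	return sum;
}
```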


> --
> With Best Regards,
> Andy Shevchenko
>

Thank you!