Re: [PATCH v4 1/5] lib/bitmap: add bitmap_{set,get}_value()

From: Yury Norov
Date: Wed Jul 26 2023 - 20:14:55 EST


On Wed, Jul 26, 2023 at 10:08:28AM +0200, Alexander Potapenko wrote:
> On Sun, Jul 23, 2023 at 3:57 AM Yury Norov <yury.norov@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 20, 2023 at 07:39:52PM +0200, Alexander Potapenko wrote:
> > > +/**
> > > + * bitmap_write - write n-bit value within a memory region
> > > + * @map: address to the bitmap memory region
> > > + * @value: value of nbits
> > > + * @start: bit offset of the n-bit value
> > > + * @nbits: size of value in bits, up to BITS_PER_LONG
> > > + */
> > > +static inline void bitmap_write(unsigned long *map,
> > > +				unsigned long value,
> > > +				unsigned long start, unsigned long nbits)
> > > +{
> > > +	size_t index = BIT_WORD(start);
> > > +	unsigned long offset = start % BITS_PER_LONG;
> > > +	unsigned long space = BITS_PER_LONG - offset;
> > > +
> > > +	if (unlikely(!nbits))
> > > +		return;
> > > +	value &= GENMASK(nbits - 1, 0);
> >
> > Strictly speaking, a 'value' shouldn't contain set bits beyond nbits
> > because otherwise it's an out-of-bounds type of error.
>
> I can easily imagine someone passing -1 (or ~0) as a value, but
> wanting to only write n bits of it.

This is an abuse of the new API, because we already have bitmap_set()
for that. But whatever, let's keep the masking.

...

> I like the idea of sharing the first write between the branches, and
> it can be made even shorter:
>
> ===========================================================
> void bitmap_write_new(unsigned long *map, unsigned long value,
>                       unsigned long start, unsigned long nbits)
> {
>         unsigned long offset;
>         unsigned long space;
>         size_t index;
>         bool fit;
>
>         if (unlikely(!nbits))
>                 return;
>
>         value &= GENMASK(nbits - 1, 0);
>         offset = start % BITS_PER_LONG;
>         space = BITS_PER_LONG - offset;
>         index = BIT_WORD(start);
>         fit = space >= nbits;

space >= nbits <=>
BITS_PER_LONG - offset >= nbits <=>
offset + nbits <= BITS_PER_LONG

>         map[index] &= (fit ? (~(GENMASK(nbits - 1, 0) << offset)) :

So here GENMASK(nbits + offset - 1, offset) is at most
GENMASK(BITS_PER_LONG - 1, offset), so it never overflows, which is my
point. Does it make sense?
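
To make the point concrete, here is a minimal userspace sketch (not kernel
code) that compares the two ways of building the mask for every
(offset, nbits) pair that fits in one word. genmask(), mask_shifted(),
mask_direct() and masks_agree_when_fit() are hypothetical names made up
for this illustration; genmask() is only assumed to behave like the
kernel's GENMASK() for l <= h < BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Userspace stand-in for GENMASK(h, l): bits l..h set (assumed equivalent). */
static unsigned long genmask(unsigned int h, unsigned int l)
{
	assert(l <= h && h < BITS_PER_LONG);
	return (~0UL << l) & (~0UL >> (BITS_PER_LONG - 1 - h));
}

/* Mask built the way the quoted code does it: GENMASK(nbits - 1, 0) << offset. */
static unsigned long mask_shifted(unsigned int offset, unsigned int nbits)
{
	return genmask(nbits - 1, 0) << offset;
}

/* Mask built directly: GENMASK(nbits + offset - 1, offset). */
static unsigned long mask_direct(unsigned int offset, unsigned int nbits)
{
	return genmask(nbits + offset - 1, offset);
}

/*
 * Returns 1 if both forms agree for every pair with
 * offset + nbits <= BITS_PER_LONG, 0 otherwise.
 */
static int masks_agree_when_fit(void)
{
	unsigned int offset, nbits;

	for (offset = 0; offset < BITS_PER_LONG; offset++)
		for (nbits = 1; nbits <= BITS_PER_LONG - offset; nbits++)
			if (mask_shifted(offset, nbits) !=
			    mask_direct(offset, nbits))
				return 0;
	return 1;
}
```

In the fit case the upper bound nbits + offset - 1 never exceeds
BITS_PER_LONG - 1, so the assert in genmask() never fires in that loop.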

>                       ~BITMAP_FIRST_WORD_MASK(start));

As I said, ~BITMAP_FIRST_WORD_MASK() is the same as BITMAP_LAST_WORD_MASK()
and vice versa.
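
A small userspace sketch of that identity, for illustration only.
first_word_mask() and last_word_mask() are hypothetical helpers that copy
the expressions behind BITMAP_FIRST_WORD_MASK() and
BITMAP_LAST_WORD_MASK() as I understand them from include/linux/bitmap.h.
Note the word-aligned case: when the argument is a multiple of
BITS_PER_LONG, ~FIRST gives 0 while LAST gives ~0UL, so the identity only
holds for unaligned positions. In the quoted function the two-word branch
is reached only with offset > 0, so that case should not matter there.

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Same expression as BITMAP_FIRST_WORD_MASK(start) (assumed). */
static unsigned long first_word_mask(unsigned long start)
{
	return ~0UL << (start & (BITS_PER_LONG - 1));
}

/* Same expression as BITMAP_LAST_WORD_MASK(nbits) (assumed). */
static unsigned long last_word_mask(unsigned long nbits)
{
	return ~0UL >> (-nbits & (BITS_PER_LONG - 1));
}

/* Count positions 1..BITS_PER_LONG-1 where ~FIRST(n) differs from LAST(n). */
static unsigned int count_mismatches_unaligned(void)
{
	unsigned int n, bad = 0;

	for (n = 1; n < BITS_PER_LONG; n++)
		if (~first_word_mask(n) != last_word_mask(n))
			bad++;
	return bad;
}
```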

>         map[index] |= value << offset;
>         if (fit)
>                 return;
>
>         map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
>         map[index + 1] |= (value >> space);
> }
> ===========================================================
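
For anyone who wants to poke at the logic outside the kernel, below is a
userspace approximation of the quoted function. It is a sketch under
stated assumptions, not the patch itself: bitmap_write_sketch() and
low_mask() are hypothetical names, low_mask(n) stands in for
GENMASK(n - 1, 0), and the first-word and last-word masks are written out
by hand on the assumption that 1 <= nbits <= BITS_PER_LONG.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Low n bits set, for 1 <= n <= BITS_PER_LONG; stands in for GENMASK(n - 1, 0). */
static unsigned long low_mask(unsigned long n)
{
	return ~0UL >> (BITS_PER_LONG - n);
}

/* Userspace approximation of the quoted bitmap_write_new(). */
static void bitmap_write_sketch(unsigned long *map, unsigned long value,
				unsigned long start, unsigned long nbits)
{
	unsigned long offset, space;
	size_t index;
	int fit;

	if (!nbits)
		return;

	value &= low_mask(nbits);		/* drop bits beyond nbits */
	offset = start % BITS_PER_LONG;
	space = BITS_PER_LONG - offset;
	index = start / BITS_PER_LONG;		/* BIT_WORD(start) */
	fit = space >= nbits;

	/*
	 * First word: clear either the nbits-wide field (fit) or everything
	 * from offset upwards (no fit). In the no-fit case offset > 0, so
	 * low_mask(offset) is well defined.
	 */
	map[index] &= fit ? ~(low_mask(nbits) << offset) : low_mask(offset);
	map[index] |= value << offset;
	if (fit)
		return;

	/* Second word: the remaining nbits - space bits land at bit 0. */
	map[index + 1] &= ~low_mask(nbits - space);
	map[index + 1] |= value >> space;
}
```

Writing 8 bits at start = BITS_PER_LONG - 4 exercises the two-word path:
the low nibble of the value goes to the top of map[0] and the high nibble
to the bottom of map[1].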
>
> According to Godbolt (https://godbolt.org/z/n5Te779bf), this function
> is 32 bytes shorter than yours under x86 Clang, and 8 bytes - under
> GCC (which on the other hand does a poor job optimizing both).
>
> Overall, given that there's currently a single user of these
> functions, isn't it premature to optimize them without knowing
> anything about their performance?
>
> > In previous iteration, I asked you to share disassembly listings for the
> > functions. Can you please do that now?
>
> Will godbolt work for you (see above)?

I don't know for how long an external resource will keep the reference
alive. My SSD keeps emails long enough.

...

> > You're mentioning that the compression ratio is 2 to 20x. Can you
> > share the absolute numbers? If it's 1k vs 2k, I think most people
> > just don't care...
>
> I'll provide the exact numbers with the next patch series. Last time I
> checked, the order of magnitude was tens of megabytes.

That's impressive. Fruitful idea. It would be important for embedded guys
who may disable MTE because of memory overhead. I think it's worth
mentioning that in Kconfig, together with the associated performance
overhead, if it is ever measurable.

> > Can you share the code that you used to measure the compression ratio?
> > Would it make sense to export the numbers via sysfs?
>
> For out-of-line allocations the data can be derived from
> /proc/slabinfo, but we don't calculate inline allocations.
> Agreed, a debugfs interface won't hurt.