Re: [PATCH 1/4] bitops: Add single_bit_set()

From: Yury Norov
Date: Mon Nov 22 2021 - 12:54:20 EST


On Mon, Nov 22, 2021 at 02:57:27PM +0200, Andy Shevchenko wrote:
> On Mon, Nov 22, 2021 at 12:42:21PM +0000, Vaittinen, Matti wrote:
> > On 11/22/21 13:28, Andy Shevchenko wrote:
> > > On Mon, Nov 22, 2021 at 01:03:25PM +0200, Matti Vaittinen wrote:
> > >> There are cases when it is useful to check a bit-mask has only one bit
> > >> set. Add a generic helper for it instead of baking own one for each
> > >> user.
>
> > > So, you decided to reinvent hamming weight...
> > > Please, drop this patch and use corresponding hweight() call.
>
> > Thanks Andy.
> >
> > There are few differences to hamming weight here. We scan only given
> > amount of bits - and we will end scanning immediately when we hit second
> > set bit. Oh, and obviously we only return information whether there is
> > exactly one bit set. So no, this is not hamming weight().
>
> What do you mean by this?
>
> hweight() will return you the number of the non-zero elements in the set.
> In application to boolean based arrays it means the number of bits that
> are set. Obviously, the condition `hweight() == 1` is what you are looking
> for.

Hi Andy,

I think, Matti means earlier return when part of bitmap counts set
bits to a greater nubmer, and we can skip the rest. Right, Matti?

I agree that for Matti's usecase it's useless because 32-bit int is small,
and hweight() would count set bits with a single machine instruction. (And
it should be hweight32(), not bitmap_weight() in this case.)

But in general, it might be useful for long bitmaps.

The more complete way of doing this would be adding a new set of
functions: bitmap_weight_{eq,neq,gt,le}

I'm looking at how bitmap_weight is used in the kernel and see
quite a lot of places where this optimization may take place. For
example otx2_remove_flow() in drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c:

if (bitmap_weight(&flow_cfg->dmacflt_bmap,
flow_cfg->dmacflt_max_flows) == 1)
otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);

may be replaced with:

if (bitmap_weight_eq(&flow_cfg->dmacflt_bmap, flow_cfg->dmacflt_max_flows, 1)
otx2_update_rem_pfmac(pfvf, DMAC_ADDR_DEL);

Most of that places are in drivers however, and the length of bitmaps
there is typically small, so that there's no chance to get any
measurable performance improvement.

There is always a chance that we have opencoded bitmap_weight_eq()
et all. If we add these API, it might help people wright better code.

What do you think?

Thanks,
Yury