Re: [GIT PULL] bitmap patches for v6.8

From: Jan Kara
Date: Mon Jan 22 2024 - 14:48:26 EST


On Sun 21-01-24 15:35:18, Yury Norov wrote:
> On Sun, Jan 21, 2024 at 01:47:21PM -0800, Linus Torvalds wrote:
> > So I've left this to be my last pull request, because I hate how our
> > header files are growing, and this part:
> >
> > include/linux/find.h | 301 ++++++++++++++++++++++++++++++-
> > 1 file changed, 297 insertions(+), 4 deletions(-)
> >
> > in particular.
> >
> > Nobody includes <linux/find.h> directly, but indirectly pretty much
> > *every* single kernel C file includes it.
> >
> > Looking at some basic stats of my dependency files in my tree, 4426 of
> > 4526 object files (~98%) depend on find.h because they get it through
> > *some* path that ends up with bitmap.h -> find.h.
> >
> > And honestly, the number of files that actually want the new functions
> > is basically just a tiny handful. It's also not obvious how useful
> > those optimizations are, considering that a lot of the loops are
> > *tiny*. I looked at a few cases, and the size of the bitmap it was
> > iterating over was often in the 2-4 range, sometimes (like
> > RTW89_TXCH_NUM) 13, etc.
> >
> > In radio-shark, you replaced a loop like this
> >
> > for (i = 0; i < 2; i++) {
> >
> > with that for_each_test_and_clear_bit(), and it *really* isn't clear
> > that it was worth it. It sure wasn't performance-critical to begin
> > with.
> >
> > In general, if an "optimization" doesn't have any performance numbers
> > attached to it, is it an optimization at all?
>
> No, this is not a performance optimization, and I don't claim that.
> Jan Kara reported some performance improvement, but performance is not
> the main goal of the series, and I didn't run performance tests for it
> myself.
>
> The original motivation came from the fact that using non-volatile
> find_bit() together with volatile test_and_set_bit() may trigger a
> KCSAN warning about concurrent memory access.

Maybe to save Linus some reading: it is not only about KCSAN; it is a real
(theoretical) race. In principle, the problem is that find_next_bit() does
something like:

	if (*addr)
		return __ffs(*addr);

and if the compiler decides to refetch *addr and we race with a concurrent
writer, bad things can happen. Now I wanted to fix this in a less intrusive
way using READ_ONCE()
(https://lore.kernel.org/linux-fsdevel/20231011150252.32737-1-jack@xxxxxxx/)
but Yury didn't like that and came up with this series.
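
For illustration, the idea behind that READ_ONCE() fix was roughly the
following (a sketch only, not the literal patch):

	unsigned long val = READ_ONCE(*addr);	/* one read, no refetch */

	if (val)
		return __ffs(val);

i.e. the test and the __ffs() are guaranteed to operate on the same
snapshot of the word, so a racing writer can no longer make us call
__ffs() on a value that has become zero in the meantime.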

> People wanted to make the whole find API volatile, which is a bad idea
> for sure. So I had to give them a reasonable alternative. After some
> thinking I decided that we just need a separate set of volatile find APIs.
> Check this for the initial discussion:
>
> https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@xxxxxxxxxxxx/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615
>
> It also makes the code cleaner, at least to my taste, and to some
> reviewers' too, and to some degree less bug-prone. For example,
> ILSEL_LEVELS is just 15, but the traversing code in ilsel_enable()
> is buggy, as Geert spotted. And switching to the atomic find() fixes
> it automatically:
>
> https://lore.kernel.org/lkml/CAMuHMdWHxesM-EOOMtrrw3Caz+5Wux35QiKOjvwA=vwQpRe26Q@xxxxxxxxxxxxxx/T/#me53217b32fd5623af6c7eafa5c4ce6d0465f6c58
>
> (His review came just a couple days ago, after I submitted the pull
> request, so the tag is not there.)

I would just note that this pull request, as is, still does not address the
original issue in xarray when using find_next_bit(), and I'm not yet sure
how/if you plan to address it.

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR