RE: [RFC 00/10] implement alternative and much simpler id allocator

From: Matthew Wilcox
Date: Fri Dec 23 2016 - 12:19:56 EST


From: Rasmus Villemoes [mailto:linux@xxxxxxxxxxxxxxxxxx]
> Nice work! A few random comments/questions:
>
> - It does add some complexity, but I think a few comments would make it
> more digestable.

I'm open to adding some comments ... I need some time between writing the code and writing the comments to be sure what comments are useful.

> - Hm, maybe I'm confused, and I certainly don't understand how the whole
> radix tree works. But do you use every leaf node as an exceptional
> entry initially, to allocate the first 62 ids from that level? This
> code

I do! And that question tells me one useful comment to add!

> if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) <
> BITS_PER_LONG) {
> bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
> radix_tree_iter_replace(root, &iter, slot,
> (void *)((1UL << bit) |
> RADIX_TREE_EXCEPTIONAL_ENTRY));
> *id = new;
> return 0;
> }
>
> operates on bit, which is the offset from index*IDA_BITMAP_BITS, and
> it seems to create an exceptional entry somewhere down the tree
> (which may of course be the root).
>
> But we don't seem to allocate another bit from that exceptional entry
> ever unless it happened to sit at index 0; the code
>
> unsigned long tmp = (unsigned long)bitmap;
> if (start < BITS_PER_LONG) {
> unsigned tbit = find_next_zero_bit(&tmp,
> BITS_PER_LONG, start);
> if (tbit < BITS_PER_LONG) {
> tmp |= 1UL << tbit;
> rcu_assign_pointer(*slot, (void *)tmp);
> *id = new + tbit -
> RADIX_TREE_EXCEPTIONAL_SHIFT;
> return 0;
> }
> }
>
> is only used for small values of start (though it does seem to
> account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).

Ahh. You're reading the code wrong, and I wrote the code wrongly too, so it's clearly overly complex. I was testing with 'start' set to 0, allocating N entries, and then freeing them. If I'd set start to 1024 and allocated two entries, I'd've seen the failure.

Please see the top commit here ("Improve IDA exceptional entry handling"): http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-20


> - In the micro-optimization department, I think we should avoid
> find_next_zero_bit on a single word, and instead use __ffs. Something
> like (assuming we can use bit instead of start)
>
> if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) {
> tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT);
> if (tmp) {
> tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
> tmp = (unsigned long)bitmap | (1UL << tbit);
> rcu_assign_pointer(*slot, (void *)tmp);
> *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT;
> return 0;
> }
> }

I'm certainly open to microoptimisations, but I'll have to think about this one for a bit.