Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida

From: Kent Overstreet
Date: Wed Aug 21 2013 - 17:25:05 EST


On Wed, Aug 21, 2013 at 05:16:50PM -0400, Tejun Heo wrote:
> Hello, Kent.
>
> On Wed, Aug 21, 2013 at 02:09:01PM -0700, Kent Overstreet wrote:
> > These "micro optimizations" mean either less pointer chasing or less
> > branching in the _common_ case; you'd trade common case performance for
> > avoiding ever doing higher order allocations (and 2 with COMPACTION=n
> > and 4 with COMPACTION=y is not particularly high order!).
>
> Order 4 allocation probably isn't as bad as before but it still is a
> lot nastier than single page allocations. You say doing it the other
> way would harm the common case performance but didn't answer my
> question about the number of IDs being served per page. How many can
> be served from a single page? And how many from two layer single page
> configuration? How are you defining the "common" case?

With single page allocations:

1 << 15 bits per page

1 << 9 pointers per page

So two layers of pointers do get us to 1 << 33 bits (9 + 9 + 15), which
is what we need.
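
For anyone who wants to check the arithmetic, here is a rough userspace sketch. It assumes 4 KiB pages and 8-byte pointers (i.e. a typical 64-bit config); the macro and function names are made up for illustration and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 8-byte pointers (64-bit). */
#define PAGE_BYTES	4096ULL
#define POINTER_BYTES	8ULL

/* Bits of bitmap that fit in one page: 4096 * 8 = 1 << 15. */
uint64_t bits_per_page(void)
{
	return PAGE_BYTES * 8;
}

/* Pointers that fit in one page: 4096 / 8 = 1 << 9. */
uint64_t pointers_per_page(void)
{
	return PAGE_BYTES / POINTER_BYTES;
}

/*
 * IDs addressable when `layers` layers of single-page pointer nodes
 * sit above single-page leaf bitmaps.
 */
uint64_t ids_with_layers(unsigned int layers)
{
	uint64_t ids = bits_per_page();

	assert(layers <= 4);	/* keeps the product well within 64 bits */
	while (layers--)
		ids *= pointers_per_page();
	return ids;
}
```

With these assumptions, ids_with_layers(1) is 1 << 24 and ids_with_layers(2) is 1 << 33, so one layer of single-page pointer nodes falls short and two layers are enough.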

But now, since we need two layers of pointers instead of one, we need
either another pointer deref for a node lookup - _always_, even when
we've got 8 bytes of bits - or we need to branch on the depth of the
tree, which is something we don't have now.

This is extra overhead _no matter the size of the ida_, over my current
approach.
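
To make the extra deref concrete, here is a simplified sketch of the two lookup shapes. This is not the code from the patch: the struct and function names are hypothetical, the sizes assume 4 KiB pages and 8-byte pointers, and the one-layer variant assumes the pointer array may be bigger than a page (which is where the higher order allocation comes from).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes: 4 KiB pages, 8-byte pointers. */
#define BITS_PER_LEAF	(1UL << 15)	/* bits in one leaf page */
#define PTRS_PER_NODE	(1UL << 9)	/* pointers in one pointer page */
#define BITS_PER_WORD	(8 * sizeof(unsigned long))

/* Hypothetical one-layer layout: one array of pointers to leaf bitmaps. */
struct one_layer {
	unsigned long **leaves;
};

/* Hypothetical two-layer layout: pointer pages pointing at pointer pages. */
struct two_layer {
	unsigned long ***nodes;
};

/* Test a single bit inside one leaf bitmap. */
static int leaf_test_bit(const unsigned long *leaf, unsigned long bit)
{
	return (int)((leaf[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/* One array lookup to reach the leaf, then the bit test. */
int one_layer_test(const struct one_layer *t, unsigned long id)
{
	const unsigned long *leaf;

	assert(t != NULL);
	leaf = t->leaves[id / BITS_PER_LEAF];
	return leaf_test_bit(leaf, id % BITS_PER_LEAF);
}

/* Two array lookups to reach the leaf: the first one is the added cost. */
int two_layer_test(const struct two_layer *t, unsigned long id)
{
	unsigned long leaf_idx = id / BITS_PER_LEAF;
	unsigned long **node;

	assert(t != NULL);
	node = t->nodes[leaf_idx / PTRS_PER_NODE];	/* extra deref */
	return leaf_test_bit(node[leaf_idx % PTRS_PER_NODE],
			     id % BITS_PER_LEAF);
}
```

The two-layer version pays for the first lookup on every call, regardless of how few bits are actually in use. The alternative of branching on the tree depth is not shown here.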

I'm assuming the common case is < one page of bits; based on the usage
I've seen throughout the kernel, that's probably way conservative.

In that case, your approach is going to be slower than mine, and there's
no difference in the size of the allocations.

> > I don't buy that that's a good tradeoff. If you're convinced radix trees
> > are the way to go and it can be done without much performance cost, why
> > not code it up and show us?
>
> Well, I'm not the one trying to rewrite ida, so the onus to justify
> the proposed code is primarily on you. Another thing is that the
> proposed code is *not* using the existing radix tree and instead
> implementing its own simplified radix tree, which *can* be fine but
> the bar to clear is fairly high. You have to be able to show
> *clearly* that using the existing radix tree is not an option. Until
> now, the only thing that I gathered is the simplified thing is gonna
> be faster in some extreme cases while having clear disadvantage in
> terms of memory allocation. Not very convincing.

I've already shown massive performance gains over the existing radix
tree approach; you're the one claiming a different approach would be
better.