Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20

From: Andrew Morton (akpm@digeo.com)
Date: Mon Dec 09 2002 - 14:40:49 EST

Next message: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Previous message: Andrew Morton: "Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"
In reply to: george anzinger: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Next in thread: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

george anzinger wrote:
>
> ...
>
> Hm... This whole thing came up to solve and issue related to
> having a finite number of timers. The ID layer is just a
> way of saving a pointer to a given "thing" (a timer
> structure in this case) in a way that it can be recovered
> quickly. It is really just a tree structure with 32
> branches (or is it sizeof long branches) at each node.
> There is a bit map to indicate if any free slots are
> available and if so under which branch. This makes
> allocation of a new ID quite fast. The "reuse" thing is
> there to separate it from the original code which
> "attempted" to not reuse and ID for some time.

Is the "don't reuse an ID for some time" requirement still there?

I think you can use radix trees for this. Just put the pointer
to your "thing" direct into the tree. The space overhead will
be about the same.

radix-trees do not currently have a "find next empty slot from this
offset" function but that is quite straightforward. Not quite
as fast, unless an occupancy bitmap is added to the radix-tree
node. That's something whcih I have done before - in fact it was
an array of occupancy maps so I could do an efficient in-order
gang lookup of "all dirty pages from this offset" and "all locked
pages from this offset". It was before its time, and mouldered.

> ...
> > A lot of the functions in this header are too large to be inlined.
>
> Hm... What is "too large", i.e. how much code.

A few lines, I suspect.

> Also, is it used more than once?

Don't trust the compiler too much ;) Uninlining mpage_writepage()
saved a couple of hundred bytes of code, even though it has only
one call site.

> ...
> > Please, just open-code the locking. This simply makes it harder to follow the
> > main code.
>
> But makes it easy to change the lock method, to, for
> example, use irq or irqsave or "shudder" RCU.

A diligent programmer would visit all sites as part of that conversion
anyway.

> >
> > > +
> > > +static struct idr_layer *id_free;
> > > +static int id_free_cnt;
> >
> > hm. We seem to have a global private freelist here. Is the more SMP-friendly
> > slab not suitable?
>
> There is a short local free list to avoid calling slab with
> a spinlock held. Only enough entries are kept to allocate a
> new node at each branch from the root to leaf, and only for
> this reason.

Fair enough. There are similar requirements elsewhere and the plan
there is to create a page reservation API, so you can ensure that
the page allocator will be able to provide at least N pages. Then
take the lock and go for it.

I have code for that which is about to bite the bit bucket. But the
new version should be in place soon. Other users will be radix tree
nodes, pte_chains and mm_chains (shared pagetable patch).

> ...
> >
> > Recursion!
>
> Yes, it is a tree after all.

lib/radix_tree.c does everything iteratively.

> >
> > > +void idr_init(struct idr *idp)
> >
> > Please tell us a bit about this id layer: what problems it solves, how it
> > solves them, why it is needed and why existing kernel facilities are
> > unsuitable.
> >
> The prior version of the code had a CONFIG option to set the
> maximum number of timers. This caused enough memory to be
> "compiled" in to keep pointers to this many timers. The ID
> layer was invented (by Jim Houston, by the way) to eliminate
> this CONFIG thing. If I were to ask for a capability from
> slab that would eliminate the need for this it would be the
> ability to, given an address and a slab pool, to validate
> that the address was "live" and from that pool. I.e. that
> the address is a pointer to currently allocated block from
> that memory pool. With this, I could just pass the address
> to the user as the timer_id.

That might cause problems with 64-bit kernel/32-bit userspace.
Passing out kernel addresses in this way may have other problems..

> As it is, I need a way to give
> the user a handle that he can pass back that will allow me
> to quickly find his timer and, along the way, validate that
> he was not spoofing, or just plain confused.
>
> So what the ID layer does is pass back an available <id>
> (which I can pass to the user) while storing a pointer to
> the timer which is <id>ed. Later, given the <id>, it passes
> back the pointer, or NULL if the id is not in use.

OK.

> As I said above, the pointers are kept in "nodes" of 32
> along with a few bits of overhead, and these are arranged in
> a dynamic tree which grows as the number of allocated timers
> increases. The depth of the tree is 1 for up to 32 , 2 for
> up to 1024, and so on. The depth can never get beyond 5, by
> which time the system will, long since, be out of memory.
> At this time the leaf nodes are release when empty but the
> branch nodes are not. (This is an enhancement saved for
> later, if it seems useful.)
>
> I am open to a better method that solves the problem...

It seems reasonable. It would be nice to be able to use radix trees,
but that's a lot of work if the patch isn't going anywhere.

If radix trees are unsuitable then yes, dressing this up as a
new core kernel capability (documentation! separate patch!)
would be appropriate.

But I suspect the radix-tree _will_ suit, and it would be nice to grow
the usefulness of radix-trees rather than creating similar-but-different
trees. We can do whizzy things with radix-trees; more than at present.

Of course, that was only a teeny part of your patch. I just happened
to spy it as it flew past. Given that you're at rev 20, perhaps a
splitup and more accessible presentation would help.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Previous message: Andrew Morton: "Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"
In reply to: george anzinger: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Next in thread: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Sun Dec 15 2002 - 22:00:15 EST