Re: [PATCH 02/16] DRBD: lru_cache

From: Kyle Moffett
Date: Sun May 03 2009 - 10:07:18 EST

Next message: James Bottomley: "Re: [PATCH 00/16] DRBD: a block device for HA clusters"
Previous message: Boaz Harrosh: "Re: [PATCH 08/10] block: cleanup rq->data_len usages"
In reply to: Lars Ellenberg: "Re: [PATCH 02/16] DRBD: lru_cache"
Next in thread: Lars Ellenberg: "Re: [PATCH 02/16] DRBD: lru_cache"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sun, May 3, 2009 at 2:27 AM, Lars Ellenberg
<lars.ellenberg@xxxxxxxxxx> wrote:
> When we created our lru_cache stuff, we considered embedding callbacks
> and internal locking, but decided against it. ÂConceptually it should be
> more like the "list.h" list handling infrastructure.
>
> The user will have their own locking in place anyways, and in general
> their critical section will be a few lines of code larger than the
> "lru cache" manipulation itself.

One of the major design-points for the code I'm fiddling with is that
it allows you to use RCU on your lookup table, which basically means
lock-free lookup (although I haven't stress-tested that part of it yet
so the code itself may have subtle bugs). With a bit more work it's
probably even possible to make it lock-free even when adding a
reference to an object that's currently on the LRU.

> And, the specific use of our implementation is that there is a
> pre-selected maximum count of in-use objects, and the user gets
> feedback about changes to this "active" set of objects.

Another major design point (and the reason for the single "evict"
callback) is that my code does not require manual tuning, it responds
to memory-pressure dynamically using the "shrinker" mechanism. So on
a box with 128MB of RAM your LRU cache will be automatically
size-limited by other activity on the system to an appropriate size;
yet it can scale up to tens or hundreds of megabytes on a system with
hundreds of gigs of RAM under heavy IO load.

The real deal-breaker for your code is its usage of "vmalloc", it's
unlikely to be merged when it relies on vmalloc() of a large
continuous block for operation.

> That is where the focus is:
> make the set of active objects easily trackable.
> So one can easily keep track of who is in, and who is not,
> by writing a log of just this "diff":
> seat index was occupied by element_nr A, but now is by element_nr B.

This could be very easily done with tracepoints and a few minor tweaks
to the implementation I provided. I could add an object number and
various statistics similar to the ones in your code; the only reason I
didn't before is I did not need them and they detracted slightly from
the simplicity of the implementation (just 271 lines).

Keep in mind that by using the kmem_cache infrastructure, you get to
take advantage of all of the other SLAB debugging features on objects
allocated through your LRUs. This includes redzones and
poison-on-free. It also makes the number of objects and the number of
pages allocated show up in /proc/slabinfo and the various SLAB/SLUB
debug tools.

> So from looking at your code, it may be fine for the "lru" part,
> but it is not suitable for our purposes.

It would need an extra layer stacked on top to handle the hash-table
lookups, but it would solve the vmalloc issue and allow your LRU lists
to dynamically size a bit better. It's also not that difficult to
apply memory allocation limits (aside from the default
memory-pressure) and add additional statistics and debugging info.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: James Bottomley: "Re: [PATCH 00/16] DRBD: a block device for HA clusters"
Previous message: Boaz Harrosh: "Re: [PATCH 08/10] block: cleanup rq->data_len usages"
In reply to: Lars Ellenberg: "Re: [PATCH 02/16] DRBD: lru_cache"
Next in thread: Lars Ellenberg: "Re: [PATCH 02/16] DRBD: lru_cache"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]