Re: Please don't beat me up (was Re: Bugs and wishes in memory management area)

Mike Jagdis
Mon, 2 Dec 1996 12:47:42 +0000 (GMT/BST)

> Yep, but...there are those occasions when more than one page is
> needed. These are the hardest requests to handle (or prepare for).
> I was suggesting 'priming' the free memory pool for these cases (without
> introducing a large overhead - less restrictions on selecting a page to
> reap == more chance of getting the page 'type' you need).

My problem with preparing for multi-page allocations in advance is
that you incur overheads coalescing pages into blocks and then splitting
them again. This costs on things like network latency and bandwidth
for instance. My preferred approach at the moment is to take the
hit on first allocation of a particular size but to tend to keep
requested sizes around for a while so that, if this is other than
a one-off, further requests can be satisfied with due speed. I am
also considering the wisdom of doing some coalescing during idle
time to keep a few suitably large blocks around in order to minimize
the miss on the first request in particular. I haven't implemented
this however...
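
The "take the hit once, then keep that size around" idea can be sketched as a per-size free list in front of the underlying allocator. This is an illustrative sketch only, not the patch's actual code: the class table, `cached_alloc` and `cached_free` names are all invented, and `malloc` stands in for the real page allocator.

```c
#include <assert.h>
#include <stdlib.h>

#define NCLASSES 8

struct free_block { struct free_block *next; };

static struct free_block *freelist[NCLASSES];
static const size_t class_size[NCLASSES] = {
    16, 32, 64, 128, 256, 512, 1024, 2048
};

static int size_class(size_t n)
{
    for (int i = 0; i < NCLASSES; i++)
        if (n <= class_size[i])
            return i;
    return -1;              /* too big: fall through to the page allocator */
}

void *cached_alloc(size_t n)
{
    int c = size_class(n);
    if (c < 0)
        return malloc(n);   /* stand-in for a multi-page allocation */
    if (freelist[c]) {      /* fast path: reuse a recently freed block */
        struct free_block *b = freelist[c];
        freelist[c] = b->next;
        return b;
    }
    /* slow path: the first request of this size pays the full cost */
    return malloc(class_size[c]);
}

void cached_free(void *p, size_t n)
{
    int c = size_class(n);
    if (c < 0) {
        free(p);
        return;
    }
    /* park the block for the next request of this size */
    struct free_block *b = p;
    b->next = freelist[c];
    freelist[c] = b;
}
```

The list is pushed and popped at the head, so the most recently freed (and hence most likely cache-hot) block is handed out first.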

> Think you might have missed my point (or I didn't make it v. well), or
> I'm misunderstanding your reply. What you say above is correct. My point
> was extending the 'swap-cache', to make it more general (into a
> "reclaim-list").

Ah, I was thinking more of the page cache. However, aren't pages in
the swap cache also in the page cache? I seem to remember that reclaiming
from the page cache needs a delete_from_swap_cache. Are we getting the
swap and page caches confused?

> When Linux identifies a named page to be reaped, it _only_ removes it
> from the address space under the process that it found the page. There
> maybe other processes that still have a ref to the page. Hence, all the
> searching work has been done for zero gain - OK, not totally zero. At
> least when it stumbles over the other mappings for the page it will be
> released.

Yes, this could get expensive (the extra invalidates are probably
not good for SMP either). I tend to look on this as a separate
problem for now however.

> Good. I thought it would be messy (is it?). I was thinking about using
> the age of pages in the cache when deciding what would coalesce with
> pages in the free memory pool. Perhaps using a random selection policy
> for a page is good enough (is this what you're doing?). By random I
> mean pseudo-random, ie. the first page we find, from the previous
> starting point, which allows a coalesce.

Essentially random, yes, since there is no way to predict the current
distribution of pages and their uses. It's simplistic in that it
doesn't look ahead to see if it will ultimately be able to build
a block big enough, but it seems to balance out well since over-coalescing
on one request increases the chance of future requests trivially
succeeding - so the spread of latencies widens but the average stays
reasonably good. Of course, a one-off request for 1MB of contiguous
DMA-able memory is always going to be somewhat anti-social :-).
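
That "first page we find, from the previous starting point, which allows a coalesce" policy can be sketched with a roving index over a free-page bitmap. Everything here is illustrative: the bitmap, the `rover`, and `coalesce_next_pair` are invented names, and real buddy coalescing tracks block orders rather than a flat byte array.

```c
#include <assert.h>

#define NPAGES 16
static unsigned char page_free[NPAGES];  /* 1 = page is free */
static int rover;                        /* resume point between scans */

/* Merge the first free buddy pair (i, i^1) found at or after the
 * rover; return the pair's base index, or -1 if no pair exists.
 * The rover advances past the merged pair so successive calls
 * spread the work around rather than rescanning from zero. */
int coalesce_next_pair(void)
{
    for (int n = 0; n < NPAGES; n++) {
        int i = (rover + n) % NPAGES;
        if (page_free[i] && page_free[i ^ 1]) {
            int base = i & ~1;                          /* aligned pair start */
            page_free[base] = page_free[base + 1] = 0;  /* now one 2-page block */
            rover = (base + 2) % NPAGES;
            return base;
        }
    }
    return -1;
}
```

As the text says, this takes the first workable pair rather than looking ahead to see whether a large enough block can ultimately be built.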

> Slow? Slow! Maybe in the v. _worst_ case. The common, critical, path is
> short, with few branches (haven't checked the assembly - I should do! -
> just relied on how I believe gcc produces code. eg. "if() goto ???" when
> the 'if' is rarely taken, and "if () {common-case}" when the 'if' will
> normally be taken. But I should check what it produces...).

My timings show it often slower than the current stock kernel for the
simple, benchmarkable, cases whereas I can go faster fairly consistently
(without modifying anything other than kmalloc.c and page_alloc.c).
I haven't looked closely enough to say where the time goes though.
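
The branch shapes the quote describes can be sketched in isolation. This is a made-up example (the function and its names are invented, not from the patch): the rare case is pushed out of line with a forward `goto` so the common case is straight-line fall-through, which is the layout gcc of that era tended to produce for rarely-taken `if`s.

```c
#include <assert.h>

/* Rare error path out of line: the fall-through path carries no
 * taken branch in the common case. */
int checked_div(int a, int b, int *out)
{
    if (b == 0)
        goto fail;          /* rarely taken: forward branch */
    *out = a / b;           /* common case falls straight through */
    return 0;
fail:
    return -1;
}
```

(Later gcc versions let you state the expectation directly with `__builtin_expect`, but that did not exist in 1996-era compilers.)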

> As for the L1 cache? A structure is usually referenced many times before
> it is freed, so helping whatever L1 cache is available is well worthwhile.
> (Note, my critical paths have no extra instructions to allow for cache
> alignment). Init-ing a struct inline may be quick, but it will
> still place instructions into both types of caches (instruction and
> data). If a constructed state can be maintained for a structure (and
> hence, some code out of the 'hot' path), then it should be.

True enough but the instructions to init a structure are generally
pretty fast, easily pipelined and inline so in-cache. The structure
they are writing to is almost certain to be either in cache or likely
to be accessed in the very near future anyway. The assumption is
that a "recently" freed structure will be reused in the "near" future
while it is still in cache. I don't know the precise details of other
chips but on the 6x86 the L1 is "only" 16KB organised as 512 lines
by 32 bytes. How long will an L1 line remain resident? I went for
the "fast reuse of the right size block" rather than the instruction
saving. Unfortunately I don't know any way to track cache hits. (I seem
to remember that the Alpha has some cache statistics capabilities
but that's out of my reach :-( ).

> > (It still does the silly power-of-two sizing though, doesn't it?)
> Most definitely not, walk the code (or ask me questions)!
> In my patch the 'general purpose caches' are 2^n because I was rushed in
> getting it out and couldn't be bothered to tune the sizes (not a good
> recommendation on my part...).

Ah, I saw the table and jumped to the conclusion that it was significant.
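
For what it's worth, the effect of tuning the table rather than using 2^n sizes is easy to sketch: rounding a request up to the next class is what determines the internal fragmentation. Both tables below are invented for illustration; the patch's actual cache sizes are whatever kmalloc.c defines.

```c
#include <assert.h>
#include <stddef.h>

static const size_t pow2_classes[]  = { 32, 64, 128, 256, 512 };
static const size_t tuned_classes[] = { 32, 96, 160, 256, 512 };

/* Round a request up to the smallest class that fits; 0 if none. */
static size_t round_to(const size_t *classes, int n, size_t req)
{
    for (int i = 0; i < n; i++)
        if (req <= classes[i])
            return classes[i];
    return 0;
}
```

A 130-byte request goes to 256 bytes under the power-of-two table but only 160 under the tuned one, so the win depends entirely on how well the table matches the actual request mix.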


|  Mike Jagdis                  |  Internet:   |
|  Roan Technology Ltd.         |                                      |
|  54A Peach Street, Wokingham  |  Telephone:  +44 118 989 0403        |
|  RG40 1XG, ENGLAND            |  Fax:        +44 118 989 1195        |