Performance/Memory usage patch

Mark Hemment (markhe@nextd.demon.co.uk)
Wed, 9 Apr 1997 10:29:05 +0100 (BST)


Hi all,

I've just hung a 'test' patch (against 2.1.32) off my home-page:
http://www.nextd.demon.co.uk/

It gives (hopefully) linux-2.1.32 a _small_ performance boost, and a
reduction in memory usage. It is not rocket-science.

The patch contains:
o The latest version (almost) of the SLAB allocator. This
new version supports head-&-tail red-zoning,
object poisoning, and stat collecting, along with better
management of large objects (the hash-table has gone).
o kmalloc()/kfree() are now built on top of the SLAB.
o Several allocations now use the SLAB directly, either
by having their own obj-cache, or by obtaining a ptr
to one of the general-size caches (kmem_find_general_cachep()).
o Alignment of some structures to be L1 cache friendly.
o Removal of the swap_cache array (which always caused
a 'far' memory reference in free_page()). The test
now uses the flag member in 'struct page' and pg_swap_entry.
o Removal of tests against mem_map[].count in fs/buffer.c.
Buffer pages can only ever have a count of 1 (at least
I think so....).
o Removal of the test for a buffer page in filemap_write_page().
As far as I can tell, there is no way for buffer pages to
be mapped into a task's address space (not even via /dev/kmem,
which only maps Reserved pages).
o Reduction in size of "struct page". A page cannot be both
in the buffer-cache and named-page-cache at the same time,
neither can it be in either of these caches and be an
anonymous page (which means it can be marked swap-cached).
NOTE: the "pg_map_nr" is now only valid for a free page.
This is only temporary, to make the structure have an
even number of members (so common members can be paired
to make them L1 friendly).
o If my maths are correct, the memory that was being reserved
for the buddy maps (page_alloc.c) was twice the amount
necessary.
o A few functions (in filemap.c and do_wp_page()) now
hang on to a page (via global cache_page variables) to
prevent them from thrashing the gfp() functions.
o Slightly optimised versions of clear_page() and copy_page().
I'm no ASM programmer, and these are just in there to see
what happens (certainly helps on my 486 test target).
o Removal of a race in fs/inode.c, and a reduction in the amount
of zeroing of inodes. Also improves the partial ordering of the
inode freelist.
o A few changes in mm/swapfile.c to use ptrs rather than indices.
o A new page allocation function for allocating single pages
for priority GFP_USER (the priority at which all user pages
should be allocated).
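
For anyone unfamiliar with the debugging terms in the first item, here is
a rough user-space sketch of head-&-tail red-zoning and object poisoning.
It is NOT the code from the patch: it sits on top of malloc(), and the
rz_* names, RED_MAGIC and POISON_BYTE are all made up for illustration.
The object is also only 4-byte aligned here, which a real allocator would
not accept.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker values, chosen for this sketch only. */
#define RED_MAGIC   0xA5C3A5C3u  /* guard word before and after the object */
#define POISON_BYTE 0x5A         /* pattern written over a retired object  */

/* Layout of one allocation: [head red zone][object bytes][tail red zone] */
static void *rz_alloc(size_t size)
{
    const uint32_t magic = RED_MAGIC;
    unsigned char *raw = malloc(size + 2 * sizeof(magic));

    if (raw == NULL)
        return NULL;
    memcpy(raw, &magic, sizeof(magic));                         /* head */
    memcpy(raw + sizeof(magic) + size, &magic, sizeof(magic));  /* tail */
    return raw + sizeof(magic);
}

/* Returns 1 if both red zones are intact, 0 if either was overwritten. */
static int rz_intact(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    uint32_t head, tail;

    memcpy(&head, p - sizeof(head), sizeof(head));
    memcpy(&tail, p + size, sizeof(tail));
    return head == RED_MAGIC && tail == RED_MAGIC;
}

/* Check the red zones, then poison the object. The memory is kept so a
 * later write through a stale pointer can still be detected. */
static int rz_retire(void *obj, size_t size)
{
    int ok = rz_intact(obj, size);

    memset(obj, POISON_BYTE, size);
    return ok;
}

/* Returns 1 if nobody has written to the object since rz_retire(). */
static int rz_still_poisoned(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] != POISON_BYTE)
            return 0;
    return 1;
}

/* Hand the underlying block back to malloc's free(). */
static void rz_release(void *obj)
{
    free((unsigned char *)obj - sizeof(uint32_t));
}
```

A one-byte overrun past the end of the object changes the tail word, so
rz_intact() returns 0; a write after rz_retire() disturbs the poison
pattern, so rz_still_poisoned() returns 0.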

I'd appreciate anyone giving this patch a try, but a few points to
note:

The figures under "lmbench" show limited/no improvement, but other
tests (such as a large number of kernel compiles) do show a performance
increase (particularly on a 486 box).

I believe networking throughput/latencies take a small performance hit.
This is probably due to different byte alignments of dynamically allocated
memory. I do not understand the networking code - perhaps somebody could
try SLABising it (making sure commonly accessed members are on the same h/w
cache line, and using kmem_find_general_cachep()).
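
To make "on the same h/w cache line" concrete, here is a small sketch of
the kind of check one might do when laying out a structure. Everything in
it is an assumption for illustration: "struct conn" is not a real
networking structure, and the 32-byte L1_LINE is only a guess (line size
differs between CPUs). The check also assumes the structure itself starts
on a line boundary.

```c
#include <stddef.h>

#define L1_LINE 32  /* assumed L1 line size in bytes; differs by CPU */

/* Hypothetical structure: frequently used members are grouped at the
 * front so they can share one cache line. */
struct conn {
    /* hot: touched on every packet */
    struct conn *next;
    unsigned int state;
    unsigned int seq;
    unsigned int ack;
    /* cold: rarely touched */
    char name[64];
    unsigned long created;
};

/* Bytes from the start of the structure to the end of the last hot member. */
#define HOT_SPAN (offsetof(struct conn, ack) + sizeof(unsigned int))

/* Returns 1 if bytes [off, off + len) fall within a single cache line,
 * for an object that starts on a line boundary. */
static int same_line(size_t off, size_t len)
{
    return off / L1_LINE == (off + len - 1) / L1_LINE;
}

/* Round an object size up to a whole number of lines, so that objects
 * packed back to back all start on a line boundary. */
static size_t line_round_up(size_t size)
{
    return (size + L1_LINE - 1) / L1_LINE * L1_LINE;
}
```

With these definitions same_line(0, HOT_SPAN) holds for the layout above,
whereas an 8-byte member starting at offset 28 would straddle two lines.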

The second argument to kfree_s() is now honoured rather than being
thrown away by the pre-processor. This did cause compile problems for
ipv4 (fixes in the patch).
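
The sketch below shows why honouring the second argument can expose
compile problems. While a macro discards an argument, the pre-processor
throws those tokens away and the compiler never sees them, so a stale or
wrong expression goes unnoticed. This is a user-space illustration only:
the demo_* functions, old_kfree_s/new_kfree_s and "struct opt" are made-up
names, and what the patch actually does with the size is not shown here.

```c
#include <stdlib.h>

static size_t last_size;  /* records what the sized free was told */

static void demo_kfree(void *p)
{
    free(p);
}

static void demo_kfree_sized(void *p, size_t size)
{
    last_size = size;
    free(p);
}

/* Old style: the size argument is dropped by the pre-processor, so
 * whatever is written there is never compiled. */
#define old_kfree_s(ptr, size) demo_kfree(ptr)

/* New style: the size reaches a real function, so it must be a valid
 * expression of a suitable type. */
#define new_kfree_s(ptr, size) demo_kfree_sized((ptr), (size))

struct opt {
    int len;
    char data[40];
};

static size_t free_both_ways(void)
{
    struct opt *a = malloc(sizeof(*a));
    struct opt *b = malloc(sizeof(*b));

    /* 'no_such_symbol' is not declared anywhere, yet this line compiles
     * because the macro discards it. */
    old_kfree_s(a, no_such_symbol);

    /* Here the same mistake would be a compile error. */
    new_kfree_s(b, sizeof(*b));

    return last_size;
}
```

Switching a caller from the old form to the new one is exactly the point
at which a bad size expression starts to fail the build.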

Regards,

markhe

------------------------------------------------------------------
Mark Hemment, Unix/C Software Engineer (Contractor)
markhe@nextd.demon.co.uk http://www.nextd.demon.co.uk/
"Success has many fathers, failure is a B**TARD!" - anon
------------------------------------------------------------------