Re: pgcl-2.6.0-test5-bk3-17

From: William Lee Irwin III
Date: Sat Nov 29 2003 - 20:13:55 EST


On Thu, Nov 27, 2003 at 11:21:48PM -0800, William Lee Irwin III wrote:
> Now also ported to 2.6.0-test11:
> ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/pgcl/pgcl-2.6.0-test11-1.gz
> This also corrects some PAGE_SHIFT instances that crept into mm/mmap.c
> while I wasn't looking and drops sym2 driver changes.

Quick performance and functionality test on a 32x/64GB 700MHz P-III.
Compressed dmesg attached. Not sure where the SCSI error came from.

This seems to show that reworking the rmap interaction is a pressing
performance issue (however pressing kernel compiles can be considered
to be, never mind all the actual bugs) since the rmap functions are
burning more cpu time than bitblitting luserspace. Unfortunately there
isn't really enough disk for on-disk dbench etc. so I resorted to ramfs,
whose readdir() etc. methods appear not to be very well parallelized. I
should probably have tried ext2 on loop on ramfs or some such nonsense.

The good news is that it does still run on the thing even after all of
the massive shakeups, rewrites, and time since my last boot on it in May.


Kernel compile on ext2:

$ time make -s -j bzImage > /dev/null
arch/i386/boot/setup.S: Assembler messages:
arch/i386/boot/setup.S:168: Warning: value 0x77dfffff truncated to 0x77dfffff
Root device is (8, 2)
Boot sector 512 bytes.
Setup is 2538 bytes.
System is 1117 kB
make -s -j bzImage > /dev/null 712.96s user 284.86s system 1140% cpu 1:27.46 total

6094329 total 3.8079
5787061 default_idle 90422.8281
85912 __page_remove_rmap 298.3056
38331 page_remove_rmap 140.9228
28287 copy_folio 88.3969
19051 page_add_rmap 79.3792
14709 __copy_to_user_ll 131.3304
7078 __copy_user_intel 40.2159
6901 __copy_from_user_ll 61.6161
5704 __page_add_rmap 22.2812
4710 __d_lookup 17.3162
4259 zap_pte_range 3.6464
4127 schedule 2.5288
3940 rmap_add_folio 14.4853
3852 kmap_atomic 8.0250
3258 kmem_cache_free 33.9375
3053 clear_page_tables 3.3476
2079 try_to_wake_up 4.0605
2012 find_get_page 31.4375
1815 current_kernel_time 22.6875
1638 kmap_atomic_sg 4.0950
1627 path_lookup 4.6222


dbench on ramfs:

$ time dbench 32
32 clients started
0 62477 296.11 MB/sec
Throughput 296.11 MB/sec 32 procs
dbench 32 20.60s user 989.43s system 2674% cpu 37.767 total

2328901 total 1.4552
1335226 default_idle 20862.9062
415966 .text.lock.libfs 2298.1547
168210 .text.lock.dec_and_lock 15291.8182
157310 .text.lock.dcache 291.3148
70329 dcache_readdir 146.5188
23983 atomic_dec_and_lock 347.5797
21502 __copy_to_user_ll 191.9821
18555 .text.lock.file_table 157.2458
16183 __copy_from_user_ll 144.4911
11169 d_alloc 23.2688
11055 filldir64 40.6434
6661 d_instantiate 69.3854
5826 __d_lookup 21.4191
5405 path_lookup 15.3551
4545 path_release 71.0156
3728 file_move 58.2500
3537 d_delete 18.4219
3336 fd_install 52.1250
3220 simple_prepare_write 20.1250
3081 call_rcu 48.1406
2704 vfs_readdir 21.1250


SDET 32 profile:

4605683 total 2.8778
3480214 default_idle 54378.3438
212021 page_remove_rmap 779.4890
163508 __page_remove_rmap 567.7361
98710 page_add_rmap 411.2917
87944 copy_folio 274.8250
31101 __page_add_rmap 121.4883
20883 simple_prepare_write 130.5188
19818 copy_page_range 15.6788
18555 .text.lock.dcache 34.3611
18277 zap_pte_range 15.6481
17973 rmap_add_folio 66.0772
17932 .text.lock.dec_and_lock 1630.1818
17816 __d_lookup 65.5000
17680 __down 78.9286
16517 __copy_to_user_ll 147.4732
11960 atomic_dec_and_lock 173.3333
11795 release_pages 36.8594
11256 kmap_atomic 23.4500
9696 kmem_cache_free 101.0000
9622 copy_mm 9.5456
9371 .text.lock.libfs 51.7735


SDET 128 profile:

19885416 total 12.4250
10870694 default_idle 169854.5938
1868923 page_remove_rmap 6871.0404
1722495 __page_remove_rmap 5980.8854
991381 page_add_rmap 4130.7542
619755 __down 2766.7634
403192 copy_folio 1259.9750
172220 __page_add_rmap 672.7344
164246 rmap_add_folio 603.8456
161239 __wake_up 2015.4875
123047 __d_lookup 452.3787
121684 schedule 74.5613
120902 zap_pte_range 103.5120
101418 __copy_to_user_ll 905.5179
99572 copy_page_range 78.7753
95709 simple_prepare_write 598.1812
76676 atomic_dec_and_lock 1111.2464
66245 remove_shared_vm_struct 517.5391
64159 path_lookup 182.2699
61922 release_pages 193.5062
61415 copy_mm 60.9276
61356 clear_page_tables 67.2763


I could probably use suggestions for a more mergeable method of
mitigating the rmap functions' overheads than the currently
available mm/rmap.c rewrites.

In more relevant news, I've started taking stabs at the rvmalloc()
mess and drm, with copious help from Zwane.


-- wli

Attachment: 64G.test11.log.gz
Description: 64G.pgcl.test11.log.gz