Re: [RCU problem] was VFS: file-max limit 50044 reached

From: Eric Dumazet
Date: Mon Oct 17 2005 - 09:55:29 EST


Eric Dumazet wrote:
Dipankar Sarma wrote:

On Mon, Oct 17, 2005 at 02:10:09PM +0200, Eric Dumazet wrote:

Dipankar Sarma wrote:

On Mon, Oct 17, 2005 at 11:10:04AM +0200, Eric Dumazet wrote:

Agreed. It is not designed to work that way, so there must be
a bug somewhere and I am trying to track it down. It could very well
be that at maxbatch=10 we are just queueing at a rate far too high
compared to processing.


I can freeze my test machine with a program that uses 'only' dentries, no files.

No message, no panic, but the machine becomes totally unresponsive after a few seconds.

Just grepping for call_rcu in the kernel sources gave me another use of call_rcu() from syscalls. And yes, 2.6.13 has the same problem.



Can you try it with rcupdate.maxbatch set to 10000 on the boot
command line?


Changing maxbatch from 10 to 10000 cures the problem.
Maybe we could initialize maxbatch to (10000000/HZ), considering that no current CPU is able to queue more than 10,000,000 items per second on a list.


Well... after 90 minutes of stress, I got an OOM even with maxbatch=10000

Out of Memory : Killed process 1759 (mysqld)

Maybe this is because on this HT machine, all (timer and network) interrupts are taken by CPU0.

So if the user program is bound to CPU1, maybe this CPU only performs syscalls and no RCU state change at all.


Oct 17 18:24:25 localhost kernel: oom-killer: gfp_mask=0xd0, order=0
Oct 17 18:24:25 localhost kernel: Mem-info:
Oct 17 18:24:25 localhost kernel: DMA per-cpu:
Oct 17 18:24:25 localhost kernel: cpu 0 hot: low 2, high 6, batch 1 used:5
Oct 17 18:24:25 localhost kernel: cpu 0 cold: low 0, high 2, batch 1 used:1
Oct 17 18:24:25 localhost kernel: cpu 1 hot: low 2, high 6, batch 1 used:2
Oct 17 18:24:25 localhost kernel: cpu 1 cold: low 0, high 2, batch 1 used:0
Oct 17 18:24:25 localhost kernel: Normal per-cpu:
Oct 17 18:24:25 localhost kernel: cpu 0 hot: low 62, high 186, batch 31 used:168
Oct 17 18:24:25 localhost kernel: cpu 0 cold: low 0, high 62, batch 31 used:55
Oct 17 18:24:25 localhost kernel: cpu 1 hot: low 62, high 186, batch 31 used:95
Oct 17 18:24:25 localhost kernel: cpu 1 cold: low 0, high 62, batch 31 used:33
Oct 17 18:24:25 localhost kernel: HighMem per-cpu:
Oct 17 18:26:17 localhost kernel: cpu 0 hot: low 62, high 186, batch 31 used:166
Oct 17 18:26:17 localhost kernel: cpu 0 cold: low 0, high 62, batch 31 used:29
Oct 17 18:26:17 localhost kernel: cpu 1 hot: low 62, high 186, batch 31 used:176
Oct 17 18:26:17 localhost kernel: cpu 1 cold: low 0, high 62, batch 31 used:13
Oct 17 18:26:17 localhost kernel: Free pages: 1136620kB (1129392kB HighMem)
Oct 17 18:26:17 localhost kernel: Active:8040 inactive:3876 dirty:1 writeback:0 unstable:0 free:284155 slab:218548 mapped:8064 pagetables:130
Oct 17 18:26:17 localhost kernel: DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:246 all_unreclaimable? no
Oct 17 18:26:17 localhost kernel: lowmem_reserve[]: 0 880 2031
Oct 17 18:26:17 localhost kernel: Normal free:3640kB min:3756kB low:4692kB high:5632kB active:76kB inactive:24kB present:901120kB pages_scanned:8581 all_unreclaimable? no
Oct 17 18:26:17 localhost kernel: lowmem_reserve[]: 0 0 9215
Oct 17 18:26:17 localhost kernel: HighMem free:1129392kB min:512kB low:640kB high:768kB active:32084kB inactive:15480kB present:1179520kB pages_scanned:0 all_unreclaimable? no
Oct 17 18:26:17 localhost kernel: lowmem_reserve[]: 0 0 0
Oct 17 18:26:17 localhost kernel: DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Oct 17 18:26:17 localhost kernel: Normal: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3640kB
Oct 17 18:26:17 localhost kernel: HighMem: 518*4kB 301*8kB 119*16kB 54*32kB 22*64kB 13*128kB 6*256kB 1*512kB 0*1024kB 1*2048kB 272*4096kB = 1129392kB
Oct 17 18:26:17 localhost kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Oct 17 18:26:17 localhost kernel: Free swap = 1012016kB
Oct 17 18:26:17 localhost kernel: Total swap = 1012016kB
Oct 17 18:26:17 localhost kernel: Free swap: 1012016kB
Oct 17 18:26:17 localhost kernel: 524256 pages of RAM
Oct 17 18:26:17 localhost kernel: 294880 pages of HIGHMEM
Oct 17 18:26:17 localhost kernel: 5472 reserved pages
Oct 17 18:26:17 localhost kernel: 11361 pages shared
Oct 17 18:26:18 localhost kernel: 0 pages swap cached
Oct 17 18:26:18 localhost kernel: 1 pages dirty
Oct 17 18:26:18 localhost kernel: 0 pages writeback
Oct 17 18:26:18 localhost kernel: 8064 pages mapped
Oct 17 18:26:18 localhost kernel: 218548 pages slab
Oct 17 18:26:18 localhost kernel: 130 pages pagetables
Oct 17 18:26:18 localhost kernel: Out of Memory: Killed process 1759 (mysqld).

Eric