Ohhh ... these values are really at the limit of this system
> /var/log/news/expire.rm.20784
>
> As you can see, the "expireover" process grows very big. The machine
> rebooted at 03:13, and I have no report between 3:01 and 3:13. I assume
> the machine was thrashing and/or cron died. There are no messages in
> the syslog file, as I had the last time (with Werner's 7 Jul patch).
> I can only assume that watchdog rebooted the machine because it
> couldn't fork() or the load was above 35 (the threshold I set it at).
Hmmm .... please remove the two lines
/* Give the physical reallocated page a bigger start */
mem_map[MAP_NR(page)].age = (2*PAGE_INITIAL_AGE);
in shm_swap_in() of ipc/shm.c and in swap_in() of mm/page_alloc.c ...
The swapped-in pages of the big expireover job should not get the bigger
start of twice PAGE_INITIAL_AGE, because try_to_free_page() in mm/vmscan.c
then runs into trouble finding an aged page from this big job to free for
the same big job.
... maybe the system will need more physical RAM in the future for a 2.2.x kernel ...
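
To illustrate the aging argument, here is a toy user-space model, not the
kernel code. It assumes that an unreferenced page loses PAGE_DECLINE of age
on each scan and only becomes a candidate for freeing once its age reaches 0.
The constant values and the function name scans_until_free() are assumptions
made up for this sketch.

```c
/* Toy model of page aging. NOT the kernel code; values are assumptions. */
#define PAGE_INITIAL_AGE 3   /* assumed starting age of a fresh page */
#define PAGE_DECLINE     1   /* assumed age lost per scan when unreferenced */

/*
 * Hypothetical helper: number of scan passes needed before an unreferenced
 * page that started at start_age has aged down to 0, i.e. before the
 * scanner would consider it for freeing in this model.
 */
int scans_until_free(int start_age)
{
    int age = start_age;
    int scans = 0;

    while (age > 0) {
        age -= PAGE_DECLINE;
        if (age < 0)
            age = 0;        /* age never goes below zero */
        scans++;
    }
    return scans;
}
```

In this model a page started at 2*PAGE_INITIAL_AGE needs twice as many scan
passes as one started at PAGE_INITIAL_AGE before it can be freed. When nearly
all pages belong to the one big job, that means more passes in which nothing
is old enough to be taken.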
>
> It's a shame that this is the only machine I can reliably reproduce it on,
> since it's a production machine our customers rely on.
>
> I guess "out of memory" is impossible to fix, let's not start a new
> thread about that :)
>
> I also got a message from Bill Hawes, with his "Flexible refill_freelist 2.0.30
> patch". I will just once more compile a kernel with that additional patch,
> and then we'll see what happens tonight at three o'clock.
IMHO this will not help with this particular problem ....
>
> I'll keep you posted.
Let's see ...
Werner