Re: More info on my sig11 problems

tim middlekoop (mtim@lab.housing.fsu.edu)
Thu, 11 Apr 1996 18:52:56 -0400 (EDT)

Next message: Nuno Serrenho: "1.3.86 - Stability under high loads: *GOOD*"
Previous message: Steven L Baur: "Re: 1.3.86: double lock on device queue! + Socket destroy delayed"

Well, I got rid of my sig11s by installing a $16 CPU fan, thanks to
whoever mentioned it earlier. Disabling the external cache also
fixed the problem, but that was unacceptable.
tim...

--
Tim Middelkoop                                   O
http://intrism.hcsys.com/~mtim                  /\,
mtim@freenet.tlh.fl.us                        \/\
----Try linux today----                         /

GE d- s+:- a22 C++ UL++ P++++ L+++ E+> W++ N+ o? K? w--(++)
O M V-- PS+ PE Y+ PGP t 5 X R- tv b++ DI? D+ G e+ h-- r-- y+


On Thu, 11 Apr 1996, Bernd Schmidt wrote:

> 
> Marek Michalkiewicz wrote:
> > I have tried my "make -j" stress test on several kernels, and the
> > results are as follows:
> > 
> > 1.3.58 is OK, compiles until it runs out of swap space,
> > 1.3.59 gives lots of sig11's while there is still plenty of free swap.
> 
> This matches my experience.
> 
> > Things like setting a lower CPU speed, the slowest possible RAM timings,
> > disabling caches, even trying this on another known-good old machine
> > (a 486sx33 over 2 years old, 4MB RAM, cheap slow ISA IDE interface)
> > don't change anything.  It seems more likely to happen on machines
> > with less RAM (more swapping) - on this second slow machine I start
> > getting sig11's much sooner than on the faster one with 8MB RAM.
> 
> Same here. It is _definitely_ not the machine's fault. It's likely that this
> problem does not show up on machines with more than 8MB RAM which don't have to
> swap quite as much.
> 
> I experimented some more with this yesterday. The reliable way to make the 
> problem go away is to turn off asynchronous swapping:
> --- linux-1.3.85/mm/vmscan.c    Sat Apr  6 10:41:15 1996
> +++ linux/mm/vmscan.c   Wed Apr 10 00:00:05 1996
> @@ -402,7 +402,7 @@
>                 swapstats.wakeups++;
>                 /* Do the background pageout: */
>                 for (i=0; i < kswapd_ctl.maxpages; i++)
> -                       try_to_free_page(GFP_KERNEL, 0, 0);
> +                       try_to_free_page(GFP_KERNEL, 0, 1);
>         }
>  }
> 
> While experimenting with the 1.3.58 and 1.3.59 kernels, I noticed that the
> problem appears much more frequently if I don't apply the patch that gives
> reads and writes the same priority. If I use the old macro from 1.3.58 that
> prioritizes reads with a newer kernel, the segmentation faults are much more 
> numerous and appear faster.
> I can only guess what the problem is, but what about the following scenario:
> 
> kswapd decides to swap out a page of process A. It starts IO on that page,
> but does not wait until it finishes. However, the page is marked as being in
> swap. kswapd returns, and process A runs again. It faults on the page that is
> being swapped out, and tries to read it from swap. Is it possible that it reads
> the page from swap _before_ the real contents have been written there?
> 
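
The scenario described above can be sketched as a small user-space model.
This is only an illustration of the hypothesis, not kernel code: the struct,
the function names and the "inflight" buffer are all made up for the example,
and clearing the in-memory copy stands in for process A losing its mapping.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 8

/* Hypothetical user-space model of one page and its swap slot.
 * None of these names are kernel APIs. */
struct model {
    char ram[PAGE_SIZE];      /* page contents while resident */
    char slot[PAGE_SIZE];     /* what is currently on disk in the swap slot */
    char inflight[PAGE_SIZE]; /* data queued for the disk, not yet written */
    int  in_swap;             /* page is marked as living in swap */
    int  write_pending;       /* async write queued but not completed */
};

/* kswapd side: queue the write, mark the page swapped, return at once. */
static void swapout_start(struct model *m)
{
    memcpy(m->inflight, m->ram, PAGE_SIZE);
    m->write_pending = 1;
    m->in_swap = 1;
    memset(m->ram, 0, PAGE_SIZE);   /* model: process A loses the mapping */
}

/* Disk side: the queued write finally reaches the swap slot. */
static void io_complete(struct model *m)
{
    assert(m->write_pending);
    memcpy(m->slot, m->inflight, PAGE_SIZE);
    m->write_pending = 0;
}

/* Fault side: read the page back, optionally waiting for pending IO first. */
static void swapin(struct model *m, int wait_for_io)
{
    if (wait_for_io && m->write_pending)
        io_complete(m);
    memcpy(m->ram, m->slot, PAGE_SIZE);
    m->in_swap = 0;
}

/* Returns 1 if process A sees its original data after the fault, else 0. */
int run_scenario(int wait_for_io)
{
    struct model m;

    memset(&m, 0, sizeof m);
    memcpy(m.ram, "GOODDATA", PAGE_SIZE);
    memcpy(m.slot, "OLDSTALE", PAGE_SIZE);  /* leftover from an earlier user */

    swapout_start(&m);          /* kswapd returns before the write is done */
    swapin(&m, wait_for_io);    /* process A faults on the page */
    return memcmp(m.ram, "GOODDATA", PAGE_SIZE) == 0;
}
```

In this model run_scenario(0) reads the stale slot contents, while
run_scenario(1), which waits for the pending write, gets the original data
back. Whether the real kernel can actually hit this window is exactly the
open question.
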
> I made another experiment to try this:
> diff -urd linux-1.3.85/arch/i386/mm/init.c linux/arch/i386/mm/init.c
> --- linux-1.3.85/arch/i386/mm/init.c	Tue Apr  9 18:23:20 1996
> +++ linux/arch/i386/mm/init.c	Wed Apr 10 23:58:48 1996
> @@ -95,6 +95,7 @@
>  	printk("%d free pages\n",free);
>  	printk("%d reserved pages\n",reserved);
>  	printk("%d pages shared\n",shared);
> +	printk("%d async pages\n",nr_async_pages);
>  	show_buffers();
>  #ifdef CONFIG_NET
>  	show_net_buffers();
> diff -urd linux-1.3.85/fs/buffer.c linux/fs/buffer.c
> --- linux-1.3.85/fs/buffer.c	Tue Apr  9 18:23:24 1996
> +++ linux/fs/buffer.c	Wed Apr 10 21:58:52 1996
> @@ -77,6 +77,8 @@
>  
>  static void wakeup_bdflush(int);
>  
> +struct wait_queue * async_pages_queue = NULL;
> +
>  #define N_PARAM 9
>  #define LAV
>  
> @@ -1241,6 +1243,8 @@
>  	if (page->free_after) {
>  		extern int nr_async_pages;
>  		nr_async_pages--;
> +		if (nr_async_pages == 0)
> +			wake_up(&async_pages_queue);
>  		page->free_after = 0;
>  		free_page(page_address(page));
>  	}
> diff -urd linux-1.3.85/mm/page_alloc.c linux/mm/page_alloc.c
> --- linux-1.3.85/mm/page_alloc.c	Mon Mar 25 16:19:06 1996
> +++ linux/mm/page_alloc.c	Thu Apr 11 00:41:07 1996
> @@ -23,6 +23,8 @@
>  #include <asm/bitops.h>
>  #include <asm/pgtable.h>
>  
> +extern struct wait_queue * async_pages_queue;
> +
>  int nr_swap_pages = 0;
>  int nr_free_pages = 0;
>  
> @@ -313,6 +315,8 @@
>  {
>  	unsigned long page = __get_free_page(GFP_KERNEL);
>  
> +	while (nr_async_pages > 0)
> +		sleep_on(&async_pages_queue);
>  	if (pte_val(*page_table) != entry) {
>  		free_page(page);
>  		return;
> 
> What it does: it waits until all asynchronous swapping has finished before
> doing a swap_in call.
> The result: no more crashes. However, after a while the machine locked up
> solid because nr_async_pages never reached zero. Shift-ScrollLock showed that
> it was 1. Could this be because nr_async_pages isn't modified with the 
> atomic_* functions?
> (Besides, this second patch seems to make things really inefficient. But at
> least it no longer crashes.)
>
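
To make the question about the atomic_* functions concrete, here is a
minimal sketch of how a counter updated with separate load and store steps
can lose a decrement and get stuck at 1. It is a deterministic user-space
model with made-up names, not a diagnosis: whether the 1.3.85 code can
actually interleave this way is not established here.

```c
/* Hypothetical stand-in for a shared counter such as nr_async_pages. */
static int counter;

/* Two decrements whose load and store steps interleave. */
int interleaved_decrements(int start)
{
    counter = start;
    int a = counter;    /* context A loads the value */
    int b = counter;    /* context B interrupts A and loads the same value */
    counter = b - 1;    /* B stores its result */
    counter = a - 1;    /* A resumes and overwrites it: one decrement lost */
    return counter;
}

/* The same two decrements, each completed before the next one begins. */
int serialized_decrements(int start)
{
    counter = start;
    counter--;
    counter--;
    return counter;
}
```

Starting from 2, the interleaved version ends at 1 and the serialized one
at 0. A waiter looping on "counter > 0" would never wake up in the first
case, which would look like the lockup described above.
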
