Re: More info on my sig11 problems

tim middlekoop (mtim@lab.housing.fsu.edu)
Thu, 11 Apr 1996 18:52:56 -0400 (EDT)


Well, I got rid of my sig11s by installing a $16 CPU fan, thanks to
whoever else mentioned it earlier. Disabling the external cache also
fixed the problems, but that was unacceptable.
tim...

--
Tim Middelkoop                                   O
http://intrism.hcsys.com/~mtim                  /\,
mtim@freenet.tlh.fl.us                        \/\
----Try linux today----                         /

GE d- s+:- a22 C++ UL++ P++++ L+++ E+> W++ N+ o? K? w--(++) O M V-- PS+ PE Y+ PGP t 5 X R- tv b++ DI? D+ G e+ h-- r-- y+

On Thu, 11 Apr 1996, Bernd Schmidt wrote:

> Marek Michalkiewicz wrote:
> > I have tried my "make -j" stress test on several kernels, and the
> > results are as follows:
> >
> > 1.3.58 is OK, compiles until it runs out of swap space,
> > 1.3.59 gives lots of sig11's while there is still plenty of free swap.
>
> This matches my experience.
>
> > Things like setting lower CPU speed, slowest possible RAM timings,
> > disabling caches, even trying this on another known good old machine
> > (over 2 years old 486sx33, 4MB RAM, cheap slow ISA IDE interface)
> > doesn't change anything. It seems more likely to happen on machines
> > with less RAM (more swapping) - on this second slow machine I start
> > getting sig11's much faster than on the faster one with 8MB RAM.
>
> Same here. It is _definitely_ not the machine's fault. It's likely that this
> problem does not show up on machines with more than 8MB RAM which don't have to
> swap quite as much.
>
> I experimented some more with this yesterday. The reliable way to make the
> problem go away is to turn off asynchronous swapping:
> --- linux-1.3.85/mm/vmscan.c Sat Apr 6 10:41:15 1996
> +++ linux/mm/vmscan.c Wed Apr 10 00:00:05 1996
> @@ -402,7 +402,7 @@
>                  swapstats.wakeups++;
>                  /* Do the background pageout: */
>                  for (i=0; i < kswapd_ctl.maxpages; i++)
> -                        try_to_free_page(GFP_KERNEL, 0, 0);
> +                        try_to_free_page(GFP_KERNEL, 0, 1);
>          }
>  }
>
> While experimenting with the 1.3.58 and 1.3.59 kernels, I noticed that the
> problem appears much more frequently if I don't apply the patch that gives
> reads and writes the same priority. If I use the old macro from 1.3.58 that
> prioritizes reads with a newer kernel, the segmentation faults are much more
> numerous and appear faster.
> I can only guess what the problem is, but what about the following scenario:
>
> kswapd decides to swap out a page of process A. It starts IO on those pages,
> but does not wait until it finishes. However, the page is marked as being in
> swap. kswapd returns, and process A runs again. It faults on the page that is
> being swapped out, and tries to read it from swap. Is it possible that it reads
> the page from swap _before_ the real contents have been written there?
>
> I made another experiment to try this:
> diff -urd linux-1.3.85/arch/i386/mm/init.c linux/arch/i386/mm/init.c
> --- linux-1.3.85/arch/i386/mm/init.c Tue Apr 9 18:23:20 1996
> +++ linux/arch/i386/mm/init.c Wed Apr 10 23:58:48 1996
> @@ -95,6 +95,7 @@
>          printk("%d free pages\n",free);
>          printk("%d reserved pages\n",reserved);
>          printk("%d pages shared\n",shared);
> +        printk("%d async pages\n",nr_async_pages);
>          show_buffers();
>  #ifdef CONFIG_NET
>          show_net_buffers();
> diff -urd linux-1.3.85/fs/buffer.c linux/fs/buffer.c
> --- linux-1.3.85/fs/buffer.c Tue Apr 9 18:23:24 1996
> +++ linux/fs/buffer.c Wed Apr 10 21:58:52 1996
> @@ -77,6 +77,8 @@
>
>  static void wakeup_bdflush(int);
>
> +struct wait_queue * async_pages_queue = NULL;
> +
>  #define N_PARAM 9
>  #define LAV
>
> @@ -1241,6 +1243,8 @@
>          if (page->free_after) {
>                  extern int nr_async_pages;
>                  nr_async_pages--;
> +                if (nr_async_pages == 0)
> +                        wake_up(&async_pages_queue);
>                  page->free_after = 0;
>                  free_page(page_address(page));
>          }
> diff -urd linux-1.3.85/mm/page_alloc.c linux/mm/page_alloc.c
> --- linux-1.3.85/mm/page_alloc.c Mon Mar 25 16:19:06 1996
> +++ linux/mm/page_alloc.c Thu Apr 11 00:41:07 1996
> @@ -23,6 +23,8 @@
>  #include <asm/bitops.h>
>  #include <asm/pgtable.h>
>
> +extern struct wait_queue * async_pages_queue;
> +
>  int nr_swap_pages = 0;
>  int nr_free_pages = 0;
>
> @@ -313,6 +315,8 @@
>  {
>          unsigned long page = __get_free_page(GFP_KERNEL);
>
> +        while (nr_async_pages > 0)
> +                sleep_on(&async_pages_queue);
>          if (pte_val(*page_table) != entry) {
>                  free_page(page);
>                  return;
>
> What it does: it waits until all asynchronous swapping has been finished before
> doing a swap_in call.
> The result: No more crashes. However, after a while the machine locked up
> solid because nr_async_pages did not reach zero. Shift-ScrollLock showed that
> it was 1. Could this be because nr_async_pages isn't modified with the
> atomic_* functions?
> (Besides, this second patch seems to make things really inefficient. But at
> least it no longer crashes)