[2.3.99-pre5/pre6] Serious kswapd or kflushd bug

From: Miguel Freitas (mfreitas@mls.com.br)
Date: Sat Apr 22 2000 - 16:45:13 EST


   Hi,

   (Please excuse my poor English...)

   Yesterday I accidentally created a heavy and unusual load on my Linux
box, which led me to find this problem. I don't (yet :) understand the
kernel well enough to say for sure where the bug is, but it is very
reproducible here. Because the results are unpredictable, I have no Oops
for you folks (sorry!). The tools I used to see what was happening
are gtop and /proc/meminfo.
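
   For anyone who wants to log the same numbers while reproducing this,
here is a minimal Python sketch that extracts the "Name: N kB" fields
from /proc/meminfo text. It is not part of the original test setup: the
helper names (parse_meminfo, summarize) and the choice of fields
(MemFree, Cached, SwapFree) are assumptions for illustration, and field
names can differ between kernel versions.

```python
import re

# Matches lines such as "Cached:   112640 kB"; any other line is skipped.
_FIELD = re.compile(r"^(\w+):\s+(\d+)\s*kB\s*$")


def parse_meminfo(text):
    """Return {field name: size in kB} for every 'Name: N kB' line in text."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def summarize(fields, names=("MemFree", "Cached", "SwapFree")):
    """Format the chosen fields in MB; fields that are absent show as n/a."""
    parts = []
    for name in names:
        if name in fields:
            parts.append("%s=%dMB" % (name, fields[name] // 1024))
        else:
            parts.append("%s=n/a" % name)
    return " ".join(parts)
```

   Feeding it the contents of /proc/meminfo once a second during the
copy, for example print(summarize(parse_meminfo(open("/proc/meminfo").read()))),
should show the cached figure climbing and the free swap shrinking if
the problem is present.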

   I found it first with pre5, but it occurs on pre6 too. I was making an
image from a CD with x-cd-roast (0.98a5), which is a frontend to readcd
(1.8.1a5). My CD-R drive (cw-7502) is connected to an old Adaptec ISA card
(ava1502), so reading the CD at 8x uses about 80-90% of the CPU in system
tasks. I think this is because of the slow ISA bus sustaining 1200 KB/s.

   The program starts reading the CD and I can see a steady increase in
cached memory. When memory is almost full, kswapd wakes up and
starts to swap out all the running programs. It takes about 10-20% of
the CPU doing that. Only X, gtop and some small programs (<1024k) stay
resident in memory. After that, anything can happen. In my tests there
was (1) Oops, (2) X dying, (2) gtop page fault, (1) machine hangup.

   On the occasions when Linux was still usable after creating the image,
I noted two things: kswapd was defunct, and I got the following message on
shutdown: "Turning off swap VM: undead swap entry <number>". IMHO,
kflushd is not doing its job correctly, but I don't know whether
there is another bug in kswapd that keeps it from handling such heavy
page swapping.

   Some other useful information: running a lot of sync commands during
the image generation does not avoid the use of swap, but it saves
most of the running programs from being swapped out, and the system
does not hang or die.
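
   The sync workaround could be scripted along these lines. This is only
a sketch: the report says "a lot of sync commands" without giving a
rate, so the interval and the number of rounds are assumptions, as is
the helper name sync_periodically. os.sync() is the Python wrapper for
the sync system call and is available on Unix only.

```python
import os
import time


def sync_periodically(rounds, interval_s=1.0, sync_fn=os.sync, sleep_fn=time.sleep):
    """Call sync_fn `rounds` times, sleeping interval_s seconds between calls.

    sync_fn and sleep_fn are parameters only so they can be swapped out
    when experimenting. Returns the number of sync calls made.
    """
    done = 0
    for i in range(rounds):
        sync_fn()  # ask the kernel to write dirty buffers to disk
        done += 1
        if i < rounds - 1:
            sleep_fn(interval_s)  # no sleep after the last round
    return done
```

   Running sync_periodically(600, interval_s=1.0) in a second terminal
while the image is being written would issue one sync per second for
about ten minutes.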

   I ran the same test with kernels 2.3.48 and 2.2.14, and there was no
swapping in either of them. Memory was almost full (as with
2.3.99), but somehow the dirty pages were written to disk and
discarded correctly.

   As you saw above, I can easily reproduce the bug, so if you need more
info feel free to contact me.

   BTW: it's a Celeron 500 with 192MB. Before the test there was about
30MB in cached memory; it reached about 110MB (or more) during the test.
The only way I found to free it again was to delete the generated image
file.

Regards,

Miguel Freitas



