Re: problem with linux 2.2.15pre4 + Rik's VM fix

From: Rik van Riel (riel@nl.linux.org)
Date: Thu Jan 27 2000 - 11:23:49 EST


On 26 Jan 2000, Kevin Fenzi wrote:

> I just booted into 2.2.15pre4 + Rik's vm fix...
>
> Could this vm problem be related to a problem I have been seeing
> sporadically for a while:
>
> 2.2.14 after about 2 weeks I got a slocate process stuck in
> wait_on_buffer (disk wait).
>
> 2.2.14pre4 after about 2 days I got the same thing.

This sounds exactly like what we've been expecting. The problem
can show up in 2.2.14 and earlier, but it showed up so rarely
that we never managed to track it down, or even identify it.

With the new VM code the problem is more likely to occur, so we
found it pretty quickly. My VMfix patch contains code to work
around the problem and to alert us to buggy pieces of code calling
__get_free_pages()...

> Could this be slocate holding some kind of buffer lock?

Indeed. The program holds a lock and then tries to allocate some
memory. But if there isn't enough free memory available at that
moment, it might have to sleep for a while ...

There's nothing wrong with that: TASK_RUNNING processes will be
woken up again pretty soon. BUT non-running processes will be
removed from the runqueue and not woken up again. There's code in
the kernel that cannot sleep, yet calls __get_free_pages() with
__GFP_WAIT set.
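
To make the failure mode concrete, here is a toy user-space model of
it in C. This is only a sketch, not kernel code and not the actual
VMfix patch: struct task, toy_schedule(), toy_get_free_pages() and
buggy_caller_survives() are hypothetical names made up for the
illustration, and the "detect" flag stands in for the kind of check
described above (warn and force the state back to TASK_RUNNING).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: these types and functions are illustrative
 * assumptions, not the kernel's own definitions. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct task {
    enum task_state state;
    bool on_runqueue;
    int warnings;          /* how often the check below fired */
};

/* Model of a reschedule: a task that is not TASK_RUNNING is taken
 * off the runqueue. In this model nobody ever wakes it up again. */
static void toy_schedule(struct task *t)
{
    if (t->state != TASK_RUNNING)
        t->on_runqueue = false;
}

/* Model of an allocation that is allowed to sleep (the __GFP_WAIT
 * case). With detect set, a caller in the wrong state is reported
 * and its state is reset before the allocator can sleep. */
static void toy_get_free_pages(struct task *t, bool memory_is_low,
                               bool detect)
{
    assert(t != NULL);

    if (detect && t->state != TASK_RUNNING) {
        fprintf(stderr, "toy: allocation with state != TASK_RUNNING\n");
        t->warnings++;
        t->state = TASK_RUNNING;
    }

    /* Not enough free memory: the allocator has to sleep. */
    if (memory_is_low)
        toy_schedule(t);
}

/* A buggy caller: it marks itself as sleeping and only then
 * allocates memory. Returns whether it is still on the runqueue. */
static bool buggy_caller_survives(bool detect)
{
    struct task t = { TASK_RUNNING, true, 0 };

    t.state = TASK_UNINTERRUPTIBLE;   /* getting ready to wait */
    toy_get_free_pages(&t, true, detect);
    return t.on_runqueue;
}
```

In this model buggy_caller_survives(false) returns false (the task
drops off the runqueue and stays stuck), while
buggy_caller_survives(true) returns true because the check resets the
state first. The real kernel paths are more involved; the sketch only
shows why the order "set state, then allocate" is dangerous.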

> >>EIP; c011e595 <do_anonymous_page+2d/80> <=====
> >>EIP; c011e0f9 <do_wp_page+19/200> <=====

It's unlikely that these are the culprits. Other pieces of code
are calling them with task->state != TASK_RUNNING...
(time for more of this debugging code)

> I now have my slocate set to run under strace tonight so I can see
> where it is locking up...if anyone has any ideas why it's
> happening or has seen the same thing, I would love to hear it.

If all goes well it won't lock up again because there's detection
and workaround code in the kernel now...

regards,

Rik

--
The Internet is not a network of computers. It is a network
of people. That is its real strength.




This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:18 EST