Re: Hanging problems in 2.0.30 -- discoveries

Bill Hawes (whawes@star.net)
Fri, 01 Aug 1997 22:41:57 -0400


Philip Gladstone wrote:
>
> I can persuade my system to hang repeatedly under 2.0.30-pre2
> under heavy load. It turns out that it doesn't really
> hang, but a wait_queue becomes corrupted and the kernel
> goes into an infinite loop trying to take something
> off the queue. [I added checks at add time to ensure that
> the wait queue is acceptably short (less than 1000 entries)].

Looks like an important bug -- we'd better find it!

> Something bad is going on.... Does anybody have any ideas?

I'll toss out an idea -- in copy_mm() for the non-cloning case, the new
mm is initialized by copying current->mm. If the semaphore in
current->mm happened to be in use at that moment, the copy would give
the new mm structure a bogus semaphore. If current was itself cloned
from another task, that other task shares the mm and might be using the
mm semaphore, which sets up exactly the conditions for this to happen.

If this is the case, at least it would be easy to fix.
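
To make the idea concrete, here is a rough user-space model in C of
what I suspect could happen. Everything in it is a stand-in: struct
mm_model, copy_mm_naive() and copy_mm_fixed() are made-up names, and the
semaphore is reduced to a count and a wait pointer, which is an
assumption for illustration and not the real kernel layout. It only
shows that a plain structure copy carries the semaphore state along,
and that resetting the semaphore after the copy would avoid that.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; not the kernel definitions. */
struct wait_queue {
    struct wait_queue *next;
};

struct semaphore {
    int count;                /* 1 = free, 0 or less = in use */
    struct wait_queue *wait;  /* tasks sleeping on the semaphore */
};

struct mm_model {
    struct semaphore sem;
    unsigned long start_code; /* stands in for the other mm fields */
};

/* Plain structure copy: the semaphore state is copied with the rest. */
static void copy_mm_naive(struct mm_model *dst, const struct mm_model *src)
{
    *dst = *src;
}

/* Possible fix: copy, then give the new mm a fresh, idle semaphore. */
static void copy_mm_fixed(struct mm_model *dst, const struct mm_model *src)
{
    *dst = *src;
    dst->sem.count = 1;
    dst->sem.wait = NULL;
}
```

With the naive copy, a child created while the parent's semaphore is in
use starts life with count 0 and a wait pointer into the parent's
queue; with the reset, it starts with a free semaphore and no waiters.
Whether this is what actually corrupts the wait queue is still only a
guess.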

Will study the code tomorrow ... any further clues would be welcome.

Regards,
Bill