Re: [2.3.99-pre4-4,dget fixed] "Unlink of SHM id 8356095 failed

From: Andi Kleen (ak@suse.de)
Date: Mon Apr 10 2000 - 06:58:17 EST


Alexander Viro <viro@math.psu.edu> writes:

> On Sat, 8 Apr 2000, Alan Cox wrote:
>
> > > It seems to work OK though (uptime 12 minutes). Quake works. I've got a
> > > lot of memory.
> > >
> > > The fixes that I have used are
> > > 1. put dget around BOTH arguments to lookup_one in ipc/shm.c
> > > 2. revert move of task_lock in exit.c
> >
> > #2 unfortunately has broken tcp and nfs interaction with /proc. It shouldn't
> > be needed to fix your problem. If it is then we have to rethink a chunk of
> > the locking a little (not much fortunately) as we can't hold the task lock
> > while calling ->close().
>
> Hmm... OK, I see your point. The reason to have task_lock in that area is
> very simple and keeping it over the whole thing is an overkill. Race in
> question happens when one process does operations with
> /proc/<pid>/something and <pid> exits, removing ->{mm,fs,whatever} in the
> middle of operation. So
> 	down(&tsk->exit_sem);
> 	tsk->flags |= PF_EXITING;
> 	up(&tsk->exit_sem);
> 	...
> will be sufficient here, provided that task_lock() and stuff in
> fs/proc/base.c will check for PF_EXITING. Currently they are looking at
> ->p_pptr. Then I see no point in task_lock() down the road in do_exit().
> Ooops... Nope, there _is_ a point in it and ->mm may need the protection
> after all - check flush_old_exec() for another user.

task_lock in do_exit seems to cause nasty effects:
- When a program blocks in its release function (the serial driver
  sometimes does, for a few minutes, when the carrier is missing, which
  is POSIX breakage), it blocks all access to /proc for that long
  (every ps runs into it).
  I've seen that regularly, caused by an exiting mgetty serial console login.

I'm not sure how to fix it, but blocking semaphores are probably not
the correct way.
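
For illustration only, here is a minimal user-space sketch of the
PF_EXITING idea quoted above. It is not kernel code: a pthread mutex
stands in for the proposed exit_sem, and struct task_model, struct
mm_model and all function names are made up for the example. The
assumption is that the lock is held only while the flag is set or
checked, so a slow release during teardown does not stall readers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name taken from the thread; the value is arbitrary in this model. */
#define PF_EXITING 0x4u

/* Hypothetical stand-in for the per-task state that /proc looks at. */
struct mm_model {
	int rss;
};

/* Hypothetical stand-in for a task; exit_lock models the proposed exit_sem. */
struct task_model {
	pthread_mutex_t exit_lock;
	unsigned int flags;
	struct mm_model *mm;
};

void task_init(struct task_model *t, struct mm_model *mm)
{
	pthread_mutex_init(&t->exit_lock, NULL);
	t->flags = 0;
	t->mm = mm;
}

/* Exit path, step 1: mark the task. The lock covers only the flag update. */
void task_mark_exiting(struct task_model *t)
{
	pthread_mutex_lock(&t->exit_lock);
	t->flags |= PF_EXITING;
	pthread_mutex_unlock(&t->exit_lock);
}

/*
 * Exit path, step 2: teardown. It runs WITHOUT exit_lock, so a release
 * function that sleeps for minutes would not hold up any reader.
 */
void task_teardown(struct task_model *t)
{
	t->mm = NULL;
}

/*
 * /proc-style reader: takes the lock briefly, refuses to touch ->mm once
 * PF_EXITING is set. Returns true and fills *out only on success.
 */
bool proc_read_rss(struct task_model *t, int *out)
{
	bool ok = false;

	pthread_mutex_lock(&t->exit_lock);
	if (!(t->flags & PF_EXITING) && t->mm != NULL) {
		*out = t->mm->rss;
		ok = true;
	}
	pthread_mutex_unlock(&t->exit_lock);
	return ok;
}
```

In this model the exit path must call task_mark_exiting() before
task_teardown(). A reader that already holds the lock finishes before
the flag can be set; a reader that arrives later sees the flag and backs
off instead of sleeping behind the teardown.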

BTW, I made the same mistake shortly before 2.2 -- I had ``fixed'' /proc
to grab the mm semaphore when looking at the mm, which caused ps to block
under heavy swapping. Of course that was quickly reverted.

-Andi



