Re: (shm WARNING) shm still deadlocks

From: Manfred Spraul (manfreds@colorfullife.com)
Date: Mon Mar 20 2000 - 09:52:14 EST


From: "Christoph Rohland" <hans-christoph.rohland@sap.com>
>
> I tried to reproduce this on 2.3.99-pre3-2. I did run 60 recursive
> wget over 100MBIT against my apache and no lockup at all. This is on
> SMP and UP, under memory pressure and without.
>
> Do you use any special apache modules?

I don't have enough time for a full review of the recent changes, but at
least the comment in IPC_RMID is dangerous:

    up(&shm_ids.sem);
    /* The kernel lock prevents new attaches being happening. [...]
    */
    shm_remove_name(id)

Sounds very dangerous. The scheduler automatically releases the big kernel
lock, and if VFS sleeps on a semaphore, or if shm_unlink sleeps on
down(&shm_ids.sem), then we get a crash.
Unless VFS protects us, the following could happen:

CPU0:
    sys_shmctl(IPC_RMID);
    up(&shm_ids.sem);
    do_unlink();
CPU1:
    sys_shmget();
    down(&shm_ids.sem);
CPU0:
    shm_unlink() : calls down(&shm_ids.sem); <<<< sleeps. releases the
kernel lock.
CPU1:
    sys_shmget() returns with success.
    sys_shmat(). attaches.

and now the scheduler decides that the shm_unlink operation should continue.
crash.

And a cosmetic note: the big kernel lock is more important than the
shm_ids.sem semaphore. I would reorder

    lock_kernel();
    down(&shm_ids.sem);
to
    down(&shm_ids.sem);
    lock_kernel();

This was standard for the mmap semaphore and lock_kernel in 2.2.
Neither version will lock-up.

--
    Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:29 EST