Re: ipc,sem: sysv semaphore scalability

From: Rik van Riel
Date: Tue Mar 26 2013 - 14:17:31 EST


On 03/26/2013 02:07 PM, Sasha Levin wrote:
On 03/26/2013 01:51 PM, Davidlohr Bueso wrote:
On Tue, 2013-03-26 at 13:33 -0400, Sasha Levin wrote:
On 03/20/2013 03:55 PM, Rik van Riel wrote:
This series makes the sysv semaphore code more scalable,
by reducing the time the semaphore lock is held, and making
the locking more scalable for semaphore arrays with multiple
semaphores.

Hi Rik,

Another issue that came up is:

[ 96.347341] ================================================
[ 96.348085] [ BUG: lock held when returning to user space! ]
[ 96.348834] 3.9.0-rc4-next-20130326-sasha-00011-gbcb2313 #318 Tainted: G W
[ 96.360300] ------------------------------------------------
[ 96.361084] trinity-child9/7583 is leaving the kernel with locks still held!
[ 96.362019] 1 lock held by trinity-child9/7583:
[ 96.362610] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8192eafb>] SYSC_semtimedop+0x1fb/0xec0

It seems that we can leave semtimedop without releasing the rcu read lock.

I'm a bit confused by what's going on in semtimedop with regards to rcu read lock, it
seems that this behaviour is actually intentional?

rcu_read_lock();
sma = sem_obtain_object_check(ns, semid);
if (IS_ERR(sma)) {
if (un)
rcu_read_unlock();
error = PTR_ERR(sma);
goto out_free;
}

When I've looked at that it seems that not releasing the read lock was (very)
intentional.

This logic was from the original code, which I also found to be quite
confusing.

I wasn't getting this warning with the old code, so there was probably something
else that triggers this now.


After that, the only code path that would release the lock starts with:

if (un) {
...

So we won't release the lock at all if un is NULL?


Not necessarily, we do release everything at the end of the function:

out_unlock_free:
sem_unlock(sma, locknum);

Ow, there's a rcu_read_unlock() in sem_unlock()? This complicates things even
more I suspect. If un is non-NULL we'll be unlocking rcu lock twice?

It is uglier than you think...

On success, find_alloc_undo will call rcu_read_lock, so we have the
rcu_read_lock held twice :(

Some of the ipc code is quite ugly, but making too many large changes
at once is just asking for trouble. I suspect we're going to have to
untangle this one bit at a time...


--
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/