Re: ipc,sem: sysv semaphore scalability

From: Linus Torvalds
Date: Sun Mar 31 2013 - 13:11:22 EST

In reply to: Rik van Riel: "Re: ipc,sem: sysv semaphore scalability"
Next in thread: Emmanuel Benisty: "Re: ipc,sem: sysv semaphore scalability"

On Sun, Mar 31, 2013 at 6:45 AM, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> Should we use "semid" here, like Linus suggested, instead of "un->semid"?

As Davidlohr noted, in linux-next the rcu read-lock is held over the
whole thing, so no, un->semid should be stable once "un" has been
re-looked-up under the semaphore lock.

In mainline, the problem is that the "sem_lock_check()" is done with
"un->semid" *after* we've dropped the RCU read-lock, so "un" at that
point is not reliable (it could be freed at any time underneath us).
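A minimal userspace sketch of the mainline hazard, for illustration only. rcu_read_lock_model(), rcu_read_unlock_model(), struct sem_undo_model and sem_lock_check_model() are hypothetical stand-ins, not the kernel's API; the stubs only count nesting depth and give no real RCU protection. The point is that once the read-side section ends, nothing may dereference "un", so the id handed on comes from the caller's own semid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: they only count read-side nesting so the
 * discipline can be asserted; they provide no real RCU protection. */
static int rcu_depth;
static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Hypothetical stand-in for the undo structure found under RCU. */
struct sem_undo_model {
	int semid;
};

/* Hypothetical stand-in for sem_lock_check(): records the id it got. */
static int last_checked_id = -1;
static int sem_lock_check_model(int id)
{
	last_checked_id = id;
	return 0;
}

static int lock_after_lookup(int semid, struct sem_undo_model *un)
{
	int found;

	rcu_read_lock_model();
	/* "un" is only guaranteed to stay allocated inside this section. */
	found = (un != NULL && un->semid == semid);
	rcu_read_unlock_model();

	if (!found)
		return -1;
	/* Wrong: sem_lock_check_model(un->semid), since "un" may be freed now.
	 * Use the caller's own semid, which does not depend on "un". */
	return sem_lock_check_model(semid);
}
```

The same shape applies to any pointer found under RCU: copy out what you need (or reuse a value you already own) before the read lock is dropped, and treat the pointer as dead afterwards.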

That said, I really *really* hate what both mainline and linux-next do
with the RCU read lock, and linux-next is arguably worse.

The whole "take the RCU lock in one place, and release it in another"
is confusing and bug-prone as hell. And linux-next made it worse: now
sem_lock() no longer takes the read-lock (it expects the caller to
take it), but sem_unlock() still drops the read-lock. This is all just
f*cking crazy.
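The asymmetry can be modelled in a few lines of userspace C. Everything here is hypothetical: counters stand in for the RCU nesting depth and the semaphore spinlock, and sem_lock_asym()/sem_unlock_asym() only mimic the linux-next behaviour described above, not the actual code. A caller that naturally pairs its own read lock and unlock ends up unlocking twice when it uses the asymmetric pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counters stand in for the RCU read-side nesting
 * depth and the per-semaphore spinlock. */
static int rcu_depth;
static int rcu_underflows;      /* unlocks with no matching lock */
static bool sem_locked;

static void rcu_read_lock_model(void) { rcu_depth++; }
static void rcu_read_unlock_model(void)
{
	if (rcu_depth == 0)
		rcu_underflows++;
	else
		rcu_depth--;
}

/* Asymmetric pair: the lock side expects the caller to hold RCU,
 * but the unlock side drops it anyway. */
static void sem_lock_asym(void)   { sem_locked = true; }
static void sem_unlock_asym(void) { sem_locked = false; rcu_read_unlock_model(); }

/* Symmetric pair: each function touches only the semaphore lock. */
static void sem_lock_sym(void)   { sem_locked = true; }
static void sem_unlock_sym(void) { sem_locked = false; }

/* A caller that pairs its own RCU lock/unlock at the same level. */
static void natural_caller(bool symmetric)
{
	rcu_read_lock_model();
	if (symmetric) {
		sem_lock_sym();
		/* ... operate on the semaphore array ... */
		sem_unlock_sym();
	} else {
		sem_lock_asym();
		sem_unlock_asym();   /* silently drops the caller's RCU lock */
	}
	rcu_read_unlock_model(); /* underflows with the asymmetric pair */
}
```

With the symmetric pair, whoever calls the RCU lock is also the one who releases it, so the pairing can be checked by reading a single function.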

The rule should be that the rcu read-lock is always taken and released at
the same "level". For example, find_alloc_undo() should just be called
with (and unconditionally return with) the rcu read-lock held, and if
it needs to actually do an allocation, it can drop the rcu lock for
the duration of the allocation.
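A sketch of that rule, again as a hypothetical userspace model rather than the kernel function: find_alloc_undo_model(), struct undo_model and the single-slot undo_slot "list" are made up for illustration, and the RCU stubs only track nesting depth. The function is entered with the read lock held and returns with it held on every path, including allocation failure. The lock is dropped only around the allocation (in the kernel, a GFP_KERNEL allocation may sleep, and blocking inside an RCU read-side section is not allowed), and the lookup is repeated afterwards because anything seen before the drop may have changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: nesting counters only, no real RCU. */
static int rcu_depth;
static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Hypothetical undo record and a single-slot stand-in for the
 * per-process undo list (ignores concurrent inserters and never
 * frees an entry it overwrites). */
struct undo_model { int semid; };
static struct undo_model *undo_slot;
static int fail_alloc;   /* test knob: force the allocation to fail */

/* Called with the read lock held; always returns with it held. */
static struct undo_model *find_alloc_undo_model(int semid)
{
	struct undo_model *fresh;

	assert(rcu_depth == 1);
	if (undo_slot && undo_slot->semid == semid)
		return undo_slot;

	rcu_read_unlock_model();           /* drop only for the allocation */
	fresh = fail_alloc ? NULL : malloc(sizeof(*fresh));
	rcu_read_lock_model();

	if (!fresh)
		return NULL;               /* error path: lock still held */

	/* Anything seen before the drop is stale; look again. */
	if (undo_slot && undo_slot->semid == semid) {
		free(fresh);
		return undo_slot;
	}
	fresh->semid = semid;
	undo_slot = fresh;
	return fresh;
}
```

Because the lock state on return never depends on the outcome, callers need no "did it drop the lock?" branches on their error paths.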

This whole "conditional locking" depending on error returns and on
whether we have undos etc is bug-prone and confusing. And when you
have totally different locking rules for "sem_lock()" vs
"sem_unlock()", you know you're confused.

Linus
