Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier

From: Paul E. McKenney
Date: Fri Jan 08 2010 - 20:22:47 EST


On Fri, Jan 08, 2010 at 05:21:28PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 08, 2010 at 08:02:31PM -0500, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> > > On Fri, Jan 08, 2010 at 06:53:38PM -0500, Mathieu Desnoyers wrote:
> > > > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > > > > Well, if we just grab the task_rq(task)->lock here, then we should be
> > > > > OK? We would guarantee that curr is either the task we want or not.
> > > >
> > > > Hrm, I just tested it, and there seems to be a significant performance
> > > > penalty involved with taking these locks for each CPU, even with just 8
> > > > cores. So if we can do without the locks, that would be preferred.
> > >
> > > How significant? Factor of two? Two orders of magnitude?
> > >
> >
> > On an 8-core Intel Xeon (T is the number of threads receiving the IPIs):
> >
> > Without runqueue locks:
> >
> > T=1: 0m13.911s
> > T=2: 0m20.730s
> > T=3: 0m21.474s
> > T=4: 0m27.952s
> > T=5: 0m26.286s
> > T=6: 0m27.855s
> > T=7: 0m29.695s
> >
> > With runqueue locks:
> >
> > T=1: 0m15.802s
> > T=2: 0m22.484s
> > T=3: 0m24.751s
> > T=4: 0m29.134s
> > T=5: 0m30.094s
> > T=6: 0m33.090s
> > T=7: 0m33.897s
> >
> > So on 8 cores, taking spinlocks for each of the 8 runqueues adds about
> > 15% overhead when doing an IPI to 1 thread. Therefore, that won't be
> > pretty on 128+-core machines.
>
> But isn't the bulk of the overhead the IPIs rather than the runqueue
> locks?
>
>       W/out RQ   W/RQ    % degradation
> T=1:  13.91      15.8    1.14
> T=2:  20.73      22.48   1.08
> T=3:  21.47      24.75   1.15
> T=4:  27.95      29.13   1.04
> T=5:  26.29      30.09   1.14
> T=6:  27.86      33.09   1.19
> T=7:  29.7       33.9    1.14

Right... s/% degradation/Ratio/ :-/

Thanx, Paul
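
For anyone who wants to recheck the ratio column, here is a minimal
sketch that recomputes it from the timings Mathieu posted. The function
and variable names are made up for illustration, and the times are
treated as plain seconds with the "0m" prefix dropped.

```python
# Wall-clock times in seconds, copied from the figures quoted in this thread.
# Index 0 is T=1, index 6 is T=7.
without_rq = [13.911, 20.730, 21.474, 27.952, 26.286, 27.855, 29.695]
with_rq = [15.802, 22.484, 24.751, 29.134, 30.094, 33.090, 33.897]


def slowdown_ratios(base, locked):
    """Return locked/base for each thread count, rounded to two places."""
    return [round(l / b, 2) for b, l in zip(base, locked)]


ratios = slowdown_ratios(without_rq, with_rq)

if __name__ == "__main__":
    for t, r in enumerate(ratios, start=1):
        print(f"T={t}: ratio {r:.2f}")
```

A ratio of 1.14 means the run with runqueue locks took roughly 14%
longer, which is the sense in which the column is a ratio rather than
a percentage.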

> So if we had lots of CPUs, we might want to fan the IPIs out through
> intermediate CPUs in a tree fashion, but the runqueue locks are not
> causing excessive pain.
>
> How does this compare to use of POSIX signals? Never mind, POSIX
> signals are arbitrarily bad if you have way more threads than are
> actually running at the time...
>
> Thanx, Paul
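
To put rough numbers on the tree fan-out idea, here is a toy model, not
a description of how the kernel sends IPIs. It assumes (purely for
illustration) that a CPU can send one IPI per step, that every CPU that
has already been notified forwards to one new CPU in each step, and that
all sends cost the same. The function names are hypothetical.

```python
def flat_steps(n_cpus):
    """Steps when one CPU sends an IPI to each of the others in turn."""
    return max(n_cpus - 1, 0)


def tree_steps(n_cpus):
    """Steps when every already-notified CPU forwards one IPI per step.

    The notified set doubles each step, so this is ceil(log2(n_cpus)).
    """
    notified, steps = 1, 0
    while notified < n_cpus:
        notified *= 2
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (8, 128, 1024):
        print(f"{n} CPUs: flat={flat_steps(n)} steps, tree={tree_steps(n)} steps")
```

Under those assumptions the flat scheme grows linearly with the number
of CPUs while the tree grows logarithmically, which is why the fan-out
only starts to matter on large machines.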