Re: [PATCH RFC] rcu: Limit GP initialization to CPUs that have been online

From: Paul E. McKenney
Date: Fri Mar 16 2012 - 13:22:33 EST


On Fri, Mar 16, 2012 at 10:46:23AM -0500, Dimitri Sivanich wrote:
> On Thu, Mar 15, 2012 at 02:07:53PM -0700, Paul E. McKenney wrote:
> > On Thu, Mar 15, 2012 at 11:23:14AM -0700, Paul E. McKenney wrote:
> > > On Thu, Mar 15, 2012 at 12:58:57PM -0500, Dimitri Sivanich wrote:
> > > > On Wed, Mar 14, 2012 at 09:56:57AM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Mar 14, 2012 at 08:17:17AM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Mar 14, 2012 at 08:08:01AM -0500, Dimitri Sivanich wrote:
> > > > > > > On Wed, Mar 14, 2012 at 01:40:41PM +0100, Mike Galbraith wrote:
> > > > > > > > On Wed, 2012-03-14 at 10:24 +0100, Mike Galbraith wrote:
> > > > > > > > > On Tue, 2012-03-13 at 17:24 -0700, Paul E. McKenney wrote:
> > > > > > > > > > The following builds, but is only very lightly tested. Probably full
> > > > > > > > > > > of bugs, especially when exercising CPU hotplug.
> > > > > > > > >
> > > > > > > > > You didn't say RFT, but...
> > > > > > > > >
> > > > > > > > > To beat on this in a rotund 3.0 kernel, the equivalent patch would be
> > > > > > > > > the below? My box may well answer that before you can.. hope not ;-)
> > > > > > > >
> > > > > > > > (Darn, it did. Box says boot stall with virgin patch in tip too though.
> > > > > > > > Wedging it straight into 3.0 was perhaps a tad premature;)
> > > > > > >
> > > > > > > I saw the same thing with 3.3.0-rc7+ and virgin patch on UV. Boots fine without the patch.
> > > > > >
> > > > > > Right... Bozo here forgot to set the kernel parameters for large-system
> > > > > > emulation during testing. Apologies for the busted patch, will fix.
> > > > > >
> > > > > > And thank you both for the testing!!!
> > > > > >
> > > > > > Hey, at least I labeled it "RFC". ;-)
> > > > >
> > > > > Does the following work better? It does pass my fake-big-system tests
> > > > > (more testing in the works).
> > > >
> > > > This one stalls for me at the same place the other one did. Once again,
> > > > if I remove the patch and rebuild, it boots just fine.
> > > >
> > > > Is there some debug/trace information that you would like me to provide?
> > >
> > > Very strange.
> > >
> > > Could you please send your dmesg and .config?
> >
> > Hmmm... Memory ordering could be a problem, though in that case I would
> > have expected the hang during the onlining process. However, the memory
> > ordering does need to be cleaned up in any case, please see below.
> >
> After testing this on 3.3.0-rc7+ I can say that this very much improves the
> latency in the two rcu_for_each_node_breadth_first() loops.
>
> Without the patch, under moderate load and while running an interrupt latency
> test, I see the majority of loops taking 100-200 usec.
>
> With the patch there are a few that take 20-30 usec; the rest are below
> that.
>
> Not that everything is OK latency-wise in RCU land. There is still an
> interrupt holdoff in force_quiescent_state() that is taking > 100usec,
> with or without the patch. I'm having difficulty finding exactly where
> the other holdoff is happening because the kernel isn't accepting my nmi
> handler.

Please see my subsequent patch for force_quiescent_state(). ;-)

It is at https://lkml.org/lkml/2012/3/16/286 in case you missed it.

> That said, this fix is a nice improvement in those two loops.

Glad it helps! I have documented the improvement in the commit message.

Thank you both again for your testing efforts!

Thanx, Paul
