Re: [GIT PULL rcu/next] rcu commits for 2.6.40

From: Paul E. McKenney
Date: Sat May 14 2011 - 10:26:31 EST


On Fri, May 13, 2011 at 02:08:21PM -0700, Yinghai Lu wrote:
> On Thu, May 12, 2011 at 2:36 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> > On 05/12/2011 02:20 AM, Paul E. McKenney wrote:
> >> On Thu, May 12, 2011 at 12:42:50AM -0700, Yinghai Lu wrote:
> >>> On 05/12/2011 12:27 AM, Yinghai Lu wrote:
> >>>> On 05/11/2011 11:03 PM, Ingo Molnar wrote:
> >>>>>
> >>>>> * Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>> e59fb3120becfb36b22ddb8bd27d065d3cdca499 is the first bad commit
> >>>>>> commit e59fb3120becfb36b22ddb8bd27d065d3cdca499
> >>>>>> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >>>>>> Date:   Tue Sep 7 10:38:22 2010 -0700
> >>>>>>
> >>>>>>     rcu: Decrease memory-barrier usage based on semi-formal proof
> >>>>>
> >>>>> Find below an (untested!) attempt at reverting it for debugging purposes: could
> >>>>> you please try it, does your system now boot up fine?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>    Ingo
> >>>>>
> >>>>
> >>>> yes, manually reverting that commit fixes the problem.
> >>>
> >>> on system with 8 sockets westmere-ex
> >>>
> >>> it seems other commits after that commit contribute some delay too.
> >>>
> >>> [   32.240739] cpu_dev_init done
> >>> [   73.587288] memory_dev_init done
> >>
> >> I am testing a revert of e59fb3120becfb36b22ddb8bd27d065d3cdca499 and
> >> will chase down the delay.
> >>
> >
> > it seems we still need to revert the following one in addition to e59fb3120becfb36b22ddb8bd27d065d3cdca499.
> >
> > [root@mpk14-2404-239-158 linux-2.6]# git bisect good
> > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > Author: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> > Date:   Wed Jan 12 14:10:23 2011 -0800
> >
> >    rcu: move TREE_RCU from softirq to kthread
> >
> >    If RCU priority boosting is to be meaningful, callback invocation must
> >    be boosted in addition to preempted RCU readers.  Otherwise, in presence
> >    of CPU real-time threads, the grace period ends, but the callbacks don't
> >    get invoked.  If the callbacks don't get invoked, the associated memory
> >    doesn't get freed, so the system is still subject to OOM.
> >
> >    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
> >    moves the callback invocations to a kthread, which can be boosted easily.
> >
> >    Also add comments and properly synchronized all accesses to
> >    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
> >
> >    Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> >    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >    Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >
> > :040000 040000 e40306ac6405952c1d387325a98588442209abe8 efe9ea2f408c62daaccf49e6d1339dff3a74f049 M      Documentation
> > :040000 040000 8f9e7a8fa3a728d4ae58e2efb8ada7cf08aed00e 9b44deba45ba905c5d9b3cc314812f0ba3f7e639 M      include
> > :040000 040000 4b10b719a2d56ed4bc796a9f43775732bb5ff144 4db269277ccf607e1a6a7d7f4c2a7cf8d592d46a M      kernel
> > :040000 040000 881f102e6831381beed016ed240d690f6a2ccd5e 57d2fc6f84e47394c116bc617a9a0ef9b8b6dbd4 M      tools
>
> so reverting only e59fb3120becfb36b22ddb8bd27d065d3cdca499 is not enough.
>
> [ 315.248277] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [ 315.285642] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [ 427.405283] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 50, t=15002 jiffies)
> [ 427.408267] sending NMI to all CPUs:
> [ 427.419298] NMI backtrace for cpu 1
> [ 427.420616] CPU 1
>
> Paul, can you make one clean revert for
> | a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> | rcu: move TREE_RCU from softirq to kthread

I will be continuing to look into a few things over the weekend, but
if I cannot find the cause, then changing back to softirq might be the
thing to do. It won't be so much a revert in the "git revert" sense
due to later dependencies, but it could be shifted back from kthread
to softirq. This would certainly decrease dependence on the scheduler,
at least in the common case where ksoftirqd does not run.

Thanx, Paul