Re: [GIT PULL rcu/next] rcu commits for 2.6.40

From: Yinghai Lu
Date: Fri May 13 2011 - 17:08:28 EST


On Thu, May 12, 2011 at 2:36 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
> On 05/12/2011 02:20 AM, Paul E. McKenney wrote:
>> On Thu, May 12, 2011 at 12:42:50AM -0700, Yinghai Lu wrote:
>>> On 05/12/2011 12:27 AM, Yinghai Lu wrote:
>>>> On 05/11/2011 11:03 PM, Ingo Molnar wrote:
>>>>>
>>>>> * Yinghai Lu <yinghai@xxxxxxxxxx> wrote:
>>>>>
>>>>>> e59fb3120becfb36b22ddb8bd27d065d3cdca499 is the first bad commit
>>>>>> commit e59fb3120becfb36b22ddb8bd27d065d3cdca499
>>>>>> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>>>>>> Date:   Tue Sep 7 10:38:22 2010 -0700
>>>>>>
>>>>>>     rcu: Decrease memory-barrier usage based on semi-formal proof
>>>>>
>>>>> Find below an (untested!) attempt at reverting it for debugging purposes: could
>>>>> you please try it, does your system now boot up fine?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>    Ingo
>>>>>
>>>>
>>>> Yes, manually reverting that commit fixes the problem.
>>>
>>> On a system with 8 Westmere-EX sockets, it seems that other commits
>>> after that one contribute some delay too.
>>>
>>> [   32.240739] cpu_dev_init done
>>> [   73.587288] memory_dev_init done
>>
>> I am testing a revert of e59fb3120becfb36b22ddb8bd27d065d3cdca499 and
>> will chase down the delay.
>>
>
> It seems the following commit also needs to be reverted, in addition to
> e59fb3120becfb36b22ddb8bd27d065d3cdca499.
>
> [root@mpk14-2404-239-158 linux-2.6]# git bisect good
> a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> Author: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> Date:   Wed Jan 12 14:10:23 2011 -0800
>
>    rcu: move TREE_RCU from softirq to kthread
>
>    If RCU priority boosting is to be meaningful, callback invocation must
>    be boosted in addition to preempted RCU readers.  Otherwise, in presence
>    of CPU real-time threads, the grace period ends, but the callbacks don't
>    get invoked.  If the callbacks don't get invoked, the associated memory
>    doesn't get freed, so the system is still subject to OOM.
>
>    But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
>    moves the callback invocations to a kthread, which can be boosted easily.
>
>    Also add comments and properly synchronized all accesses to
>    rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
>
>    Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
>    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>    Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
>
> :040000 040000 e40306ac6405952c1d387325a98588442209abe8 efe9ea2f408c62daaccf49e6d1339dff3a74f049 M      Documentation
> :040000 040000 8f9e7a8fa3a728d4ae58e2efb8ada7cf08aed00e 9b44deba45ba905c5d9b3cc314812f0ba3f7e639 M      include
> :040000 040000 4b10b719a2d56ed4bc796a9f43775732bb5ff144 4db269277ccf607e1a6a7d7f4c2a7cf8d592d46a M      kernel
> :040000 040000 881f102e6831381beed016ed240d690f6a2ccd5e 57d2fc6f84e47394c116bc617a9a0ef9b8b6dbd4 M      tools

So reverting only e59fb3120becfb36b22ddb8bd27d065d3cdca499 is not enough.

[ 315.248277] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 315.285642] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 427.405283] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 50, t=15002 jiffies)
[ 427.408267] sending NMI to all CPUs:
[ 427.419298] NMI backtrace for cpu 1
[ 427.420616] CPU 1

Paul, can you make a clean revert of the following commit?
| a26ac2455ffcf3be5c6ef92bc6df7182700f2114
| rcu: move TREE_RCU from softirq to kthread

Thanks

Yinghai