Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist

From: Mel Gorman
Date: Fri Jan 06 2012 - 05:47:12 EST


On Fri, Jan 06, 2012 at 11:36:11AM +0530, Srivatsa S. Bhat wrote:
> On 01/06/2012 03:51 AM, Mel Gorman wrote:
>
> > (Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
> > suffering from recursive locking recently)
> >
> > On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> >> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> >>> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> >>>> Link please?
> >>>
> >>> Forwarded, as it's still in my mailbox.
> >>>
> >>>> I'm including a patch below under development that is
> >>>> intended to only cope with the page allocator case under heavy memory
> >>>> pressure. Currently it does not pass testing because eventually RCU
> >>>> gets stalled with the following trace
> >>>>
> >>>> [ 1817.176001] [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> >>>> [ 1817.176001] [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> >>>> [ 1817.176001] [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> >>>> [ 1817.176001] [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> >>>> [ 1817.176001] [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> >>>> [ 1817.176001] [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> >>>> [ 1817.176001] [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> >>>> [ 1817.176001] [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> >>>> [ 1817.176001] [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> >>>> [ 1817.176001] [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> >>>> [ 1817.176001] [<ffffffff8115c855>] path_init+0x2d5/0x370
> >>>> [ 1817.176001] [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> >>>> [ 1817.176001] [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> >>>> [ 1817.176001] [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> >>>> [ 1817.176001] [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> >>>> [ 1817.176001] [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> >>>> [ 1817.176001] [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> >>>>
> >>>> It might be a separate bug, don't know for sure.
> >>
> >
> > I rebased the patch on top of 3.2 and tested again with a bunch of
> > debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
> > hotplug is a lot more reliable and less likely to hang but eventually
> > gets into trouble.
> >
>
> I was running some CPU hotplug stress tests recently and found it to be
> problematic too. Mel, I have some logs from those tests which appear very
> relevant to the "IPI to offline CPU" issue that has been discussed in this
> thread.
>
> Kernel: 3.2-rc7
> Here is the log:
> (Unfortunately I couldn't capture the log intact, due to some annoying
> serial console issues, but I hope this log is good enough to analyze.)
>

Ok, it looks vaguely similar to what I'm seeing. I think I spotted
the sysfs problem as well and am testing a series. I'll add you to
the cc if it passes tests locally.

Thanks.

--
Mel Gorman
SUSE Labs