RE: rcu: performance regression

From: Shi, Alex
Date: Tue Jun 14 2011 - 09:08:25 EST




> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck@xxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, June 14, 2011 9:02 PM
> To: Shi, Alex
> Cc: Li, Shaohua; Ingo Molnar; lkml; Chen, Tim C; peterz@xxxxxxxxxxxxx;
> rostedt@xxxxxxxxxxx
> Subject: Re: rcu: performance regression
>
> On Tue, Jun 14, 2011 at 04:33:08PM +0800, Alex,Shi wrote:
> > On Tue, 2011-06-14 at 13:26 +0800, Li, Shaohua wrote:
> > > Commit a26ac2455ffcf3 (rcu: move TREE_RCU from softirq to kthread)
> > > introduced a performance regression. In our AIM7 test, this commit caused
> > > a regression of about 40%.
> > > The commit runs rcu callbacks in a kthread instead of softirq. We
> > > observed a high rate of context switches caused by this. Our test
> > > system has 64 CPUs and HZ is 1000, so we saw more than 64k context
> > > switches per second caused by the rcu thread.
> > > I also did a trace and found that when the rcu thread is woken up, most
> > > of the time it doesn't actually handle any callbacks; it just initializes
> > > a new gp, ends a gp, or similar.
> > > From my understanding, the purpose of running rcu in a kthread is to
> > > speed up rcu callback processing (with the help of rtmutex PI), not to
> > > end a gp and so on, which runs pretty fast and doesn't need boosting.
> > > To verify my findings, I applied the debug patch below. It still handles
> > > rcu callbacks in the kthread if there are any pending callbacks, but other
> > > things still run in softirq. This completely solved our
> > > regression. I think this can still boost callback processing, but I'm not
> > > an expert in the area, so please help.
> >
> > This commit also causes a performance drop in hackbench process mode, and
> > Shaohua's patch does recover it. But in hackbench testing, vmstat
> > shows somewhat fewer context switches, and the perf tool shows that
> > root_domain->cpupri->prio_to_cpu[]->lock is contended with the commit.
>
> Steven, Peter, would any of the recent fixes address this lock contention?

There is only one global root_domain.cpupri, and when wake_up_process() is
called for an RT process, cpupri_set() is called and takes a global lock.
It seems this contention was not triggered before.
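
A rough back-of-the-envelope model of why this may show up now, assuming
every scheduler tick on every CPU wakes the per-CPU rcu kthread and each
wakeup takes the same lock. The function names and the 500 ns hold time are
illustrative assumptions, not kernel APIs or measured values; only the
64 CPUs and HZ=1000 figures come from Shaohua's report.

```python
# Illustrative model only: these helpers are hypothetical, not kernel code.

NSEC_PER_SEC = 1_000_000_000


def wakeups_per_second(ncpus: int, hz: int) -> int:
    """Upper bound on kthread wakeups if every tick on every CPU wakes one."""
    return ncpus * hz


def lock_busy_fraction(ncpus: int, hz: int, hold_ns: int) -> float:
    """Fraction of each second a single shared lock is held, assuming every
    wakeup holds it for hold_ns nanoseconds (hold_ns is a guess)."""
    return wakeups_per_second(ncpus, hz) * hold_ns / NSEC_PER_SEC


if __name__ == "__main__":
    # 64 CPUs and HZ=1000 are the values from the report.
    print(wakeups_per_second(64, 1000))
    # 500 ns is an assumed hold time, chosen only for illustration.
    print(lock_busy_fraction(64, 1000, 500))
```

The model ignores cache-line bouncing and spinning on the lock, so the real
cost per acquisition on a 64-CPU box is likely higher than a fixed hold time
suggests.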
>
> Thanx, Paul
>
> > 11.53% hackbench [kernel] [k]
> > |
> > --- _raw_spin_lock_irqsave
> > cpupri_set
> > __enqueue_rt_entity
> > enqueue_rt_entity
> > enqueue_task_rt
> > enqueue_task
> > activate_task
> > ttwu_activate
> > ttwu_do_activate.clone.3
> > try_to_wake_up
> > wake_up_process
> > invoke_rcu_cpu_kthread
> > rcu_check_callbacks
> > update_process_times
> > tick_sched_timer
> > __run_hrtimer
> >