Re: REGRESSION: Performance regressions from switching anon_vma->lock to mutex

From: Linus Torvalds
Date: Wed Jun 15 2011 - 21:50:54 EST


On Wed, Jun 15, 2011 at 2:37 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
>
> http://programming.kicks-ass.net/sekrit/39-2.txt.bz2
> http://programming.kicks-ass.net/sekrit/tip-2.txt.bz2
>
> tip+sirq+linus is still slightly faster than .39 here,

Hmm. Your profile doesn't show the mutex slowpath at all, so there's a
big difference to the one Tim quoted parts of.

In fact, your profile looks fine. The load clearly spends tons of time
in page faulting and in timing things (that read_hpet thing is
disgusting), but with that in mind, the profile doesn't look scary.
Yes, the 2% spinlock time is bad, but you've clearly not hit the real
lock contention case. The mutex lock shows up, but _way_ below the
spinlock, and the slowpath never shows at all. You end up having
mutex_spin_on_owner at 0.09%, so it's not really visible.

Clearly going from your two-socket 12-core thing to Tim's four-socket
40-core case is a big jump. But maybe it really was about RCU, and
even the limited softirq patch that moves the grace period stuff etc
back to softirqs ends up helping.

Tim, have you tried running your bigger load with that patch? You
could try my patch on top too just to match Peter's tree, but I doubt
that's the big first-order issue.

Linus