Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu lookups

From: Vikas Shivappa
Date: Thu Feb 18 2016 - 18:11:48 EST




On Thu, 18 Feb 2016, Thomas Gleixner wrote:

On Wed, 17 Feb 2016, Thomas Gleixner wrote:
On Wed, 17 Feb 2016, Vikas Shivappa wrote:

Please stop top posting, finally!

But we have an extra static - static to avoid having it in the stack..

It's not about the cpu mask on the stack. The reason was that with cpumask off
stack cpumask_and_mask() requires an allocation, which then can't be used in
the starting/dying callbacks.

Darn, you are right to remind me.

Now, the proper solution for this stuff is to provide a library function as we
need that for several drivers. No point to duplicate that functionality. I'll
cook something up and repost the uncore/cqm set tomorrow.

Second thoughts on that.

cpumask_any_but() is fine as is, if we feed it topology_core_cpumask(cpu). The
worst case search is two bitmap_find_next() if the first search returned cpu.

Now cpumask_any_and() does a search as well, but the number of
bitmap_find_next() invocations is limited to the number of sockets if we feed
the cqm_cpu_mask as first argument. So for 4 or 8 sockets that's still a
reasonable limit. If the people with insane large machines care, we can
revisit that topic. It's still faster than for_each_online_cpu() :)

Agree. If we don't care about a large number of sockets, this would still be
far better than scanning each cpu. If we are too aggressive and remove 'all'
loops by using cpumask_and, there could be some branches we avoid (the 2nd
search is always a success if the 1st one fails in cpumask_any_but), but they
should not matter much in this case.
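
For illustration only, here is a minimal userspace sketch of the two lookups
discussed above. It is not the kernel implementation: it assumes a machine
with at most 64 cpus so that a single uint64_t can stand in for a cpumask,
and the names next_bit(), any_but() and any_and() are hypothetical stand-ins
for the bit search and for cpumask_any_but() / cpumask_any_and(). It only
shows why the first lookup needs at most two searches and why the second is
bounded by the number of bits set in the first mask (one per socket for
cqm_cpu_mask).

```c
#include <stdint.h>

/* Assumption: one 64-bit word models a cpumask; 64 means "no cpu found". */
#define NR_CPUS_SKETCH 64

/* Index of the first set bit at or after 'start', or NR_CPUS_SKETCH if none. */
static unsigned int next_bit(uint64_t mask, unsigned int start)
{
    for (unsigned int i = start; i < NR_CPUS_SKETCH; i++)
        if (mask & (UINT64_C(1) << i))
            return i;
    return NR_CPUS_SKETCH;
}

/*
 * Model of the "any but" lookup: first cpu in 'mask' that is not 'cpu'.
 * A second search happens only when the first one returned 'cpu' itself.
 */
static unsigned int any_but(uint64_t mask, unsigned int cpu)
{
    unsigned int i = next_bit(mask, 0);

    if (i == cpu)
        i = next_bit(mask, cpu + 1);
    return i;
}

/*
 * Model of the "any and" lookup: walk the set bits of 'first' and return
 * the first one that is also set in 'second'. The number of searches is
 * bounded by the number of bits set in 'first'.
 */
static unsigned int any_and(uint64_t first, uint64_t second)
{
    unsigned int i = next_bit(first, 0);

    while (i < NR_CPUS_SKETCH && !(second & (UINT64_C(1) << i)))
        i = next_bit(first, i + 1);
    return i;
}
```

With two sockets of four cpus each, a reader mask of 0x11 (cpus 0 and 4) and
a package mask of 0xF0 (cpus 4-7), any_and(0x11, 0xF0) walks two bits of the
reader mask, and any_but(0xF0, 4) needs the second search because the first
one lands on cpu 4.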

Will send rapl patch separately.

Thanks,
Vikas


Thanks,

tglx