Re: [PATCH 10/13] x86: mm: Enable deferred struct page initialisation on x86-64

From: Waiman Long
Date: Fri Apr 24 2015 - 15:04:38 EST


On 04/24/2015 11:20 AM, Mel Gorman wrote:
> On Fri, Apr 24, 2015 at 10:35:49AM -0400, Waiman Long wrote:
> > On 04/23/2015 05:23 AM, Mel Gorman wrote:
> > > On Wed, Apr 22, 2015 at 04:45:00PM -0700, Andrew Morton wrote:
> > > > On Wed, 22 Apr 2015 18:07:50 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:

> > > > > --- a/arch/x86/Kconfig
> > > > > +++ b/arch/x86/Kconfig
> > > > > @@ -32,6 +32,7 @@ config X86
> > > > >  	select HAVE_UNSTABLE_SCHED_CLOCK
> > > > >  	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
> > > > >  	select ARCH_SUPPORTS_INT128 if X86_64
> > > > > +	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if X86_64 && NUMA
> > > >
> > > > Put this in the "config X86_64" section and skip the "X86_64 &&"?

> > > Done.
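(Presumably the result is something like the following in arch/x86/Kconfig. This is one reading of Andrew's suggestion, not the final patch, and the "if NUMA" condition is itself questioned just below:)

	config X86_64
		def_bool y
		...
		select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT if NUMA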

> > > > Can we omit the whole defer_meminit= thing and permanently enable the
> > > > feature? That's simpler, provides better test coverage and is, we
> > > > hope, faster.

> > > Yes. The intent was to have a workaround if there were any failures like
> > > Waiman's vmalloc failures in an earlier version but they are bugs that
> > > should be fixed.

> > > > And can this be used on non-NUMA? Presumably that won't speed things
> > > > up any if we're bandwidth limited but again it's simpler and provides
> > > > better coverage.
> > >
> > > Nothing prevents it. There is less opportunity for parallelism but
> > > improving coverage is desirable.

> > Memory access latency can be more than double for local vs. remote
> > node memory. Bandwidth can also be much lower depending on what kind
> > of interconnect is between the 2 nodes. So it is better to do it in
> > a NUMA-aware way.
>
> I do not believe that is what he was asking. He was asking if we could
> defer memory initialisation even when there is only one node. It does not
> gain much in terms of boot times but it improves testing coverage.

Thanks for the clarification.
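(For readers of the thread, the node-local scheme under discussion looks roughly like the sketch below. This is a paraphrase for illustration, not the exact code in Mel's series: one kthread per node with memory, each restricting itself to its own node's CPUs before touching that node's struct pages.)

	static int deferred_init_memmap(void *data)
	{
		pg_data_t *pgdat = data;
		const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);

		/* Stay node-local to avoid the remote-access latency
		 * penalty described above. */
		if (!cpumask_empty(cpumask))
			set_cpus_allowed_ptr(current, cpumask);

		/* ... initialise this node's deferred struct pages ... */
		return 0;
	}

	void __init page_alloc_init_late(void)
	{
		int nid;

		/* One initialiser thread per node: parallelism across
		 * nodes, but none within a node. */
		for_each_node_state(nid, N_MEMORY)
			kthread_run(deferred_init_memmap, NODE_DATA(nid),
				    "pgdatinit%d", nid);
	}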

> > Within a NUMA node, however, we can split the
> > memory initialization to 2 or more local CPUs if the memory size is
> > big enough.
>
> I considered it but discarded the idea. It'd be more complex to set up and
> the two CPUs could simply end up contending on the same memory bus as
> well as contending on zone->lock.


I don't think we need that now. However, we may have to consider it one day, when even a single node can have TBs of memory, unless we move to a page size larger than 4k.
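(Purely for illustration, splitting one node's pfn range across several local workers might look like the hypothetical sketch below. Nothing like this is in the series, and as Mel notes above, the workers may simply end up contending on the memory bus and on zone->lock:)

	/* Hypothetical: carve one node's pfn range into per-worker chunks. */
	static void split_node_init(unsigned long start_pfn,
				    unsigned long end_pfn, int nr_workers)
	{
		unsigned long chunk = DIV_ROUND_UP(end_pfn - start_pfn,
						   nr_workers);
		int i;

		for (i = 0; i < nr_workers; i++) {
			unsigned long s = start_pfn + i * chunk;
			unsigned long e = min(s + chunk, end_pfn);

			if (s >= e)
				break;
			/* Hand [s, e) to a worker pinned to this node.
			 * All workers would still serialise on zone->lock
			 * when freeing their pages to the buddy allocator. */
		}
	}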

Cheers,
Longman