Re: Widespread crashes in -next, bisected to 'mm: drop HASH_ADAPT'

From: Guenter Roeck
Date: Mon May 22 2017 - 05:03:30 EST


On 05/22/2017 01:45 AM, Michal Hocko wrote:
On Sat 20-05-17 09:26:34, Michal Hocko wrote:
On Fri 19-05-17 09:46:23, Guenter Roeck wrote:
Hi,

my qemu tests of next-20170519 show the following results:
total: 122 pass: 30 fail: 92

I won't bother listing all of the failures; they are available at
http://kerneltests.org/builders. I bisected one (openrisc, because
it gives me some console output before dying). It points to
'mm: drop HASH_ADAPT' as the culprit. Bisect log is attached.

A quick glance suggests that 64-bit kernels pass and 32-bit kernels fail.
32-bit x86 images fail and should provide an easy test case.

Hmm, this is quite unexpected as the patch is not supposed to change
things much. It just removes the flag and performs the new hash scaling
automatically for all requests which do not have any high limit.
Some of those didn't have HASH_ADAPT before but that shouldn't change
the picture much. The only thing that I can imagine is that what
formerly failed for early memblock allocations is now succeeding and that
depletes the early memory. Do you have any serial console from the boot?

OK, I guess I know what is going on here. Adaptive hash scaling is not
really suited for 32b. ADAPT_SCALE_BASE is just too large for the word
size and so we end up in an endless loop. So the issue was
introduced by the original "mm: adaptive hash table scaling" but my
patch made it more visible because the [di]cache hash tables, which
didn't have HASH_ADAPT, most probably succeeded in the early
initialization.
The following should fix the hang. I am not yet sure about the maximum
size for the scaling and something even smaller would make sense to me
because kernel address space is just too small for such large hash
tables.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a26e19c3e1ff..70c5fc1fb89a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7174,11 +7174,15 @@ static unsigned long __init arch_reserved_kernel_pages(void)
 /*
  * Adaptive scale is meant to reduce sizes of hash tables on large memory
  * machines. As memory size is increased the scale is also increased but at
- * slower pace. Starting from ADAPT_SCALE_BASE (64G), every time memory
- * quadruples the scale is increased by one, which means the size of hash table
- * only doubles, instead of quadrupling as well.
+ * slower pace. Starting from ADAPT_SCALE_BASE (64G on 64b systems and 32M
+ * on 32b), every time memory quadruples the scale is increased by one, which
+ * means the size of hash table only doubles, instead of quadrupling as well.
  */
+#if __BITS_PER_LONG == 64
 #define ADAPT_SCALE_BASE (64ul << 30)
+#else
+#define ADAPT_SCALE_BASE (32ul << 20)
+#endif
 #define ADAPT_SCALE_SHIFT 2
 #define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT)

I have seen another patch making it 64ull. Not sure if adaptive scaling
on 32-bit systems really makes sense; unless there is a clear need I'd rather
leave it alone.

Guenter
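
For illustration only, the hang discussed in this thread can be modelled
in user space. This is a rough sketch under stated assumptions, not the
kernel code: it assumes 4 KiB pages and assumes the scaling loop has the
shape "for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries;
adapt <<= ADAPT_SCALE_SHIFT) scale++;". The helper names, the MAX_STEPS
guard and the use of uint32_t/uint64_t as stand-ins for a 32-bit and a
64-bit unsigned long are hypothetical. On a 32-bit word, 64 << 30 wraps
to 0, so the loop variable starts at 0 and never grows.

```c
#include <stdint.h>

#define PAGE_SHIFT        12  /* assumption: 4 KiB pages */
#define ADAPT_SCALE_SHIFT 2   /* same value as in the patch */
#define MAX_STEPS         64  /* hypothetical guard so the sketch cannot hang */

/*
 * Model of the scaling loop with a 32-bit word.
 * Returns the number of scale increments, or -1 if the loop
 * would not terminate within MAX_STEPS iterations.
 */
static int adapt_steps_u32(uint32_t base_bytes, uint32_t numentries)
{
	uint32_t adapt = base_bytes >> PAGE_SHIFT;
	int steps = 0;

	for (; adapt < numentries; adapt <<= ADAPT_SCALE_SHIFT)
		if (++steps > MAX_STEPS)
			return -1;	/* adapt is stuck, e.g. at 0 */
	return steps;
}

/* Same loop with a 64-bit word, as a 64ul (64-bit) or 64ull base would give. */
static int adapt_steps_u64(uint64_t base_bytes, uint64_t numentries)
{
	uint64_t adapt = base_bytes >> PAGE_SHIFT;
	int steps = 0;

	for (; adapt < numentries; adapt <<= ADAPT_SCALE_SHIFT)
		if (++steps > MAX_STEPS)
			return -1;
	return steps;
}
```

With numentries = 1 << 20 (4G worth of 4 KiB pages),
adapt_steps_u32((uint32_t)64 << 30, 1u << 20) returns -1 because the base
has wrapped to 0, adapt_steps_u32((uint32_t)32 << 20, 1u << 20) returns 4,
and adapt_steps_u64((uint64_t)64 << 30, 1u << 20) returns 0 because the
64G threshold is never reached.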