[PATCH] mm, page_alloc: Reset zonelist iterator after resetting fair zone allocation policy

From: Mel Gorman
Date: Tue May 31 2016 - 06:08:58 EST


Geert Uytterhoeven reported the following problem that bisected to commit
c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a
zonelist twice") on m68k/ARAnyM

BUG: scheduling while atomic: cron/668/0x10c9a0c0
Modules linked in:
CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
Stack from 10c9a074:
10c9a074 003763ca 0003d7d0 00361a58 00bcf834 0000029c 10c9a0c0 10c9a0c0
002f0f42 00bcf5e0 00000000 00000082 0048e018 00000000 00000000 002f0c30
000410de 00000000 00000000 10c9a0e0 002f112c 00000000 7fffffff 10c9a180
003b1490 00bcf60c 10c9a1f0 10c9a118 002f2d30 00000000 10c9a174 10c9a180
0003ef56 003b1490 00bcf60c 003b1490 00bcf60c 0003eff6 003b1490 00bcf60c
003b1490 10c9a128 002f118e 7fffffff 00000082 002f1612 002f1624 7fffffff
Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54
[<002f0f42>] __schedule+0x312/0x388
[<002f0c30>] __schedule+0x0/0x388
[<000410de>] prepare_to_wait+0x0/0x52
[<002f112c>] schedule+0x64/0x82
[<002f2d30>] schedule_timeout+0xda/0x104
[<0003ef56>] set_next_entity+0x18/0x40
[<0003eff6>] pick_next_task_fair+0x78/0xda
[<002f118e>] io_schedule_timeout+0x36/0x4a
[<002f1612>] bit_wait_io+0x0/0x40
[<002f1624>] bit_wait_io+0x12/0x40
[<002f12c4>] __wait_on_bit+0x46/0x76
[<0006a06a>] wait_on_page_bit_killable+0x64/0x6c
[<002f1612>] bit_wait_io+0x0/0x40
[<000411fe>] wake_bit_function+0x0/0x4e
[<0006a1b8>] __lock_page_or_retry+0xde/0x124
[<00217000>] do_scan_async+0x114/0x17c
[<00098856>] lookup_swap_cache+0x24/0x4e
[<0008b7c8>] handle_mm_fault+0x626/0x7de
[<0008ef46>] find_vma+0x0/0x66
[<002f2612>] down_read+0x0/0xe
[<0006a001>] wait_on_page_bit_killable_timeout+0x77/0x7c
[<0008ef5c>] find_vma+0x16/0x66
[<00006b44>] do_page_fault+0xe6/0x23a
[<0000c350>] res_func+0xa3c/0x141a
[<00005bb8>] buserr_c+0x190/0x6d4
[<0000c350>] res_func+0xa3c/0x141a
[<000028ec>] buserr+0x20/0x28
[<0000c350>] res_func+0xa3c/0x141a
[<000028ec>] buserr+0x20/0x28

The relationship is not obvious but it's due to a failure to rescan the full
zonelist after the fair zone allocation policy exhausts the batch count. While
this is a functional problem, it's also a performance issue. A page allocator
microbenchmark showed the following

4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
Min alloc-odr0-1 327.00 ( 0.00%) 326.00 ( 0.31%)
Min alloc-odr0-2 235.00 ( 0.00%) 235.00 ( 0.00%)
Min alloc-odr0-4 198.00 ( 0.00%) 198.00 ( 0.00%)
Min alloc-odr0-8 170.00 ( 0.00%) 170.00 ( 0.00%)
Min alloc-odr0-16 156.00 ( 0.00%) 156.00 ( 0.00%)
Min alloc-odr0-32 150.00 ( 0.00%) 150.00 ( 0.00%)
Min alloc-odr0-64 146.00 ( 0.00%) 146.00 ( 0.00%)
Min alloc-odr0-128 145.00 ( 0.00%) 145.00 ( 0.00%)
Min alloc-odr0-256 155.00 ( 0.00%) 155.00 ( 0.00%)
Min alloc-odr0-512 168.00 ( 0.00%) 165.00 ( 1.79%)
Min alloc-odr0-1024 175.00 ( 0.00%) 174.00 ( 0.57%)
Min alloc-odr0-2048 180.00 ( 0.00%) 180.00 ( 0.00%)
Min alloc-odr0-4096 187.00 ( 0.00%) 186.00 ( 0.53%)
Min alloc-odr0-8192 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-16384 191.00 ( 0.00%) 191.00 ( 0.00%)
Min alloc-odr1-1 736.00 ( 0.00%) 445.00 ( 39.54%)
Min alloc-odr1-2 343.00 ( 0.00%) 335.00 ( 2.33%)
Min alloc-odr1-4 277.00 ( 0.00%) 270.00 ( 2.53%)
Min alloc-odr1-8 238.00 ( 0.00%) 233.00 ( 2.10%)
Min alloc-odr1-16 224.00 ( 0.00%) 218.00 ( 2.68%)
Min alloc-odr1-32 210.00 ( 0.00%) 208.00 ( 0.95%)
Min alloc-odr1-64 207.00 ( 0.00%) 203.00 ( 1.93%)
Min alloc-odr1-128 276.00 ( 0.00%) 202.00 ( 26.81%)
Min alloc-odr1-256 206.00 ( 0.00%) 202.00 ( 1.94%)
Min alloc-odr1-512 207.00 ( 0.00%) 202.00 ( 2.42%)
Min alloc-odr1-1024 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr1-2048 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr1-4096 218.00 ( 0.00%) 216.00 ( 0.92%)
Min alloc-odr1-8192 341.00 ( 0.00%) 219.00 ( 35.78%)
.
Note that order-0 allocations are unaffected but higher orders get
a small boost from this patch and a large reduction in system CPU
usage overall as can be seen here

4.7.0-rc1 4.7.0-rc1
vanilla reset-v1r2
User 85.32 86.31
System 2221.39 2053.36
Elapsed 2368.89 2202.47

Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Reported-and-tested-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb320cde4d6d..557549c81083 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
apply_fair = false;
fair_skipped = false;
reset_alloc_batches(ac->preferred_zoneref->zone);
+ z = ac->preferred_zoneref;
goto zonelist_scan;
}