Under very heavy multi-process workloads, like AIM7, the VM can
get into trouble in a variety of ways. The trouble starts when
there are hundreds, or even thousands, of processes active in the
page reclaim code.

Not only can the system suffer enormous slowdowns because of
lock contention (and conditional reschedules) between thousands
of processes in the page reclaim code, but each process will try
to free up to SWAP_CLUSTER_MAX pages, even when the system already
has lots of memory free.

It should be possible to avoid both of those issues at once, by
simply limiting how many processes are active in the page reclaim
code simultaneously.

If too many processes are active doing page reclaim in one zone,
simply go to sleep in shrink_zone().

On wakeup, check whether enough memory has been freed already
before jumping into the page reclaim code ourselves. We want
to use the same threshold here that is used in the page allocator
for deciding whether or not to call the page reclaim code in the
first place; otherwise some unlucky processes could end up freeing
memory for the rest of the system.
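The sleep-then-recheck idea described above can be sketched in userspace.
This is only an illustrative model, not the kernel code from this patch:
struct zone_model, reclaim_enter(), reclaim_exit(), max_reclaimers and
watermark are hypothetical names, and a pthread mutex plus condition
variable stand in for the zone's atomic counter and wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-zone state. */
struct zone_model {
	pthread_mutex_t lock;
	pthread_cond_t reclaim_wait;	/* models zone->reclaim_wait */
	int concurrent_reclaimers;	/* models zone->concurrent_reclaimers */
	int max_reclaimers;		/* assumed limit on concurrent reclaimers */
	long free_pages;		/* assumed free page count */
	long watermark;			/* assumed allocator threshold */
};

static void zone_model_init(struct zone_model *z, int max_reclaimers,
			    long free_pages, long watermark)
{
	pthread_mutex_init(&z->lock, NULL);
	pthread_cond_init(&z->reclaim_wait, NULL);
	z->concurrent_reclaimers = 0;
	z->max_reclaimers = max_reclaimers;
	z->free_pages = free_pages;
	z->watermark = watermark;
}

/*
 * Returns true if the caller took a reclaim slot and should reclaim.
 * Returns false if, after sleeping, enough memory was already free.
 */
static bool reclaim_enter(struct zone_model *z)
{
	pthread_mutex_lock(&z->lock);
	while (z->concurrent_reclaimers >= z->max_reclaimers) {
		/* Too many reclaimers in this zone: go to sleep. */
		pthread_cond_wait(&z->reclaim_wait, &z->lock);
		/* On wakeup, recheck against the allocator's threshold. */
		if (z->free_pages >= z->watermark) {
			pthread_mutex_unlock(&z->lock);
			return false;
		}
	}
	z->concurrent_reclaimers++;
	pthread_mutex_unlock(&z->lock);
	return true;
}

/* Called by a reclaimer once it has freed some pages. */
static void reclaim_exit(struct zone_model *z, long freed)
{
	pthread_mutex_lock(&z->lock);
	assert(z->concurrent_reclaimers > 0);
	z->free_pages += freed;
	z->concurrent_reclaimers--;
	/* Wake all sleepers so each can recheck the threshold. */
	pthread_cond_broadcast(&z->reclaim_wait);
	pthread_mutex_unlock(&z->lock);
}
```

A caller would only run its reclaim work when reclaim_enter() returns
true, and would pair that with reclaim_exit(). Waking every sleeper is a
choice made for this sketch; it lets each waiter see the new free page
count and skip reclaim when the threshold is already met.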
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 30fe668..ed614b8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -345,6 +345,10 @@ struct zone {
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
 
+	/* Number of processes running page reclaim code on this zone. */
+	atomic_t		concurrent_reclaimers;
+	wait_queue_head_t	reclaim_wait;
+