[PATCH] list_lru: Prefetch neighboring list entries before acquiring lock

From: Waiman Long
Date: Wed Nov 29 2017 - 09:17:56 EST


The list_lru_del() function removes the given item from the LRU list.
The operation looks simple, but it involves writing into the cachelines
of the two neighboring list entries in order to get the deletion done.
That can take a while if the cachelines aren't there yet, thus
prolonging the lock hold time.

To reduce the lock hold time, the cachelines of the two neighboring
list entries are now prefetched before acquiring the list_lru_node's
lock.

Using a multi-threaded test program that created a large number
of dentries and then killed them, the execution time was reduced
from 38.5s to 36.6s after applying the patch on a 2-socket 36-core
72-thread x86-64 system.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
mm/list_lru.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index f141f0c..65aae44 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -132,8 +132,16 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
struct list_lru_node *nlru = &lru->node[nid];
struct list_lru_one *l;

+ /*
+ * Prefetch the neighboring list entries to reduce lock hold time.
+ */
+ if (unlikely(list_empty(item)))
+ return false;
+ prefetchw(item->prev);
+ prefetchw(item->next);
+
spin_lock(&nlru->lock);
- if (!list_empty(item)) {
+ if (likely(!list_empty(item))) {
l = list_lru_from_kmem(nlru, item);
list_del_init(item);
l->nr_items--;
--
1.8.3.1