Re: [PATCH 3/3] RAS/CEC: immediate soft-offline page when count_threshold == 1

From: Borislav Petkov
Date: Sat Apr 20 2019 - 07:57:27 EST


On Thu, Apr 18, 2019 at 11:41:15AM +0800, WANG Chao wrote:
> count_threshold == 1 isn't working as expected. CEC only does soft
> offline the second time the same pfn is hit by a correctable error.

So this?

---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index b3c377ddf340..750a427e1a73 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -333,6 +333,7 @@ int cec_add_elem(u64 pfn)

 	mutex_lock(&ce_mutex);

+	/* Array full, free the LRU slot. */
 	if (ca->n == MAX_ELEMS)
 		WARN_ON(!del_lru_elem_unlocked(ca));

@@ -346,14 +347,9 @@ int cec_add_elem(u64 pfn)
 			(void *)&ca->array[to],
 			(ca->n - to) * sizeof(u64));

-		ca->array[to] = (pfn << PAGE_SHIFT) |
-				(DECAY_MASK << COUNT_BITS) | 1;
+		ca->array[to] = (pfn << PAGE_SHIFT) | 1;

 		ca->n++;
-
-		ret = 0;
-
-		goto decay;
 	}

 	count = COUNT(ca->array[to]);
@@ -386,7 +382,6 @@ int cec_add_elem(u64 pfn)
 		goto unlock;
 	}

-decay:
 	ca->decay_count++;

 	if (ca->decay_count >= CLEAN_ELEMS)
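
To illustrate the control-flow change, here is a minimal userspace sketch, not
the kernel code. It models a single tracked pfn and compares the old insert
path (which jumped to "decay" and never looked at the threshold) with the new
one (which falls through to the count check). struct slot, add_elem_old(),
add_elem_new() and hits_until_offline() are hypothetical names, and the
"count < threshold" comparison is an assumption, since that part of
cec_add_elem() is not shown in the hunks above.

```c
#include <stdbool.h>

/* Hypothetical model of one tracked pfn; not the cec.c array encoding. */
struct slot {
	bool present;
	unsigned int count;
};

/*
 * Old flow: a newly inserted element gets count 1 and leaves early
 * (the "goto decay"), so the threshold is not consulted on the first hit.
 * Returns true when the page would be soft-offlined.
 */
static bool add_elem_old(struct slot *s, unsigned int threshold)
{
	if (!s->present) {
		s->present = true;
		s->count = 1;
		return false;		/* models "goto decay" */
	}

	/* Assumed check: below threshold, just bump the count. */
	if (s->count < threshold) {
		s->count++;
		return false;
	}

	s->present = false;		/* models removing the element */
	return true;
}

/* New flow: same insert, but no early exit, so the check always runs. */
static bool add_elem_new(struct slot *s, unsigned int threshold)
{
	if (!s->present) {
		s->present = true;
		s->count = 1;
		/* fall through to the threshold check */
	}

	if (s->count < threshold) {
		s->count++;
		return false;
	}

	s->present = false;
	return true;
}

/* Count the hits on the same pfn needed until the model offlines it. */
static unsigned int hits_until_offline(bool (*add)(struct slot *, unsigned int),
				       unsigned int threshold)
{
	struct slot s = { false, 0 };
	unsigned int hits = 1;

	while (!add(&s, threshold))
		hits++;

	return hits;
}
```

With a threshold of 1, hits_until_offline(add_elem_old, 1) needs two hits in
this model, which matches the reported behavior, while
hits_until_offline(add_elem_new, 1) offlines on the first hit.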

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.