Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

From: Borislav Petkov
Date: Mon Apr 22 2019 - 13:16:04 EST


On Mon, Apr 22, 2019 at 03:59:16PM +0000, Luck, Tony wrote:
> > Err, this all sounds to me like the storm detection code should
> > *automatically* disable the CEC in such cases, I'd say.
>
> Sounds good. But we should distinguish storms that have many different
> addresses from storms that just ping a few addresses. CEC will see counts
> hit the threshold in the latter case, but it might not be able to take the pages
> offline (because they are locked, or in use by the kernel).
>
> So I think the change might be to switch the return value from NOTIFY_STOP to NOTIFY_DONE
> ... but only if we are in the middle of a storm AND the CEC array is full.

Well, regardless of this specific use case, isn't that a generic enough
action that we should always take? I mean, falling back to logging to
an external agent.
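
A minimal user-space sketch of the return-value choice discussed above,
assuming the NOTIFY_* values from include/linux/notifier.h (redefined
locally so it builds outside the kernel). cec_verdict() and
cec_verdict_generic() are hypothetical helpers, not functions in
drivers/ras/cec.c: the first encodes the "storm AND array full"
condition, the second the unconditional fallback asked about here.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as include/linux/notifier.h, redefined for user space. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/*
 * Hypothetical: what the CEC notifier returns for one corrected error.
 * NOTIFY_STOP means the CEC consumed the error and later notifiers on
 * the chain do not see it; NOTIFY_DONE lets it continue, e.g. to the
 * notifier that logs to an external agent.
 */
static int cec_verdict(bool in_storm, bool cec_full)
{
	/* Fall back only while a storm is running AND the array is full. */
	if (in_storm && cec_full)
		return NOTIFY_DONE;
	return NOTIFY_STOP;
}

/* Hypothetical generic variant: fall back whenever the array is full. */
static int cec_verdict_generic(bool cec_full)
{
	return cec_full ? NOTIFY_DONE : NOTIFY_STOP;
}
```

The generic variant simply drops the storm check, which is the
difference between the two positions in this thread.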

However, currently we don't signal that the CEC is full - we simply
remove the LRU element in cec_add_elem() before we insert the new one.

We can either return a specific retval saying that the CEC is full and
we had to delete an element, or we can add a cec_is_full() accessor...
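
A simplified model of both options, assuming a plain array with a
logical LRU clock rather than the real data layout in drivers/ras/cec.c.
CEC_SKETCH_SIZE, enum cec_add_result, struct cec_sketch and
cec_add_elem_sketch() are hypothetical names; cec_is_full() is the
accessor proposed above, not an existing function. The add path reports
CEC_EVICTED when it had to drop the LRU element, so a caller could use
either the return value or the accessor to decide on the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CEC_SKETCH_SIZE 4	/* hypothetical; tiny so eviction is easy to see */

enum cec_add_result {
	CEC_ADDED   = 0,	/* new PFN stored in a free slot */
	CEC_UPDATED = 1,	/* PFN already tracked, count bumped */
	CEC_EVICTED = 2,	/* array full, LRU element dropped to make room */
};

struct cec_elem {
	uint64_t pfn;
	unsigned int count;
	uint64_t last_seen;	/* logical clock value of the last hit */
};

struct cec_sketch {
	struct cec_elem elems[CEC_SKETCH_SIZE];
	size_t n;		/* number of slots in use */
	uint64_t clock;		/* bumped on every add */
};

/* Option 2: let callers ask whether the array is full. */
static bool cec_is_full(const struct cec_sketch *c)
{
	return c->n == CEC_SKETCH_SIZE;
}

/* Index of the least recently seen element; requires c->n > 0. */
static size_t find_lru(const struct cec_sketch *c)
{
	size_t lru = 0;

	for (size_t i = 1; i < c->n; i++)
		if (c->elems[i].last_seen < c->elems[lru].last_seen)
			lru = i;
	return lru;
}

/* Option 1: the add path itself says whether it had to evict. */
static enum cec_add_result cec_add_elem_sketch(struct cec_sketch *c, uint64_t pfn)
{
	enum cec_add_result ret = CEC_ADDED;
	size_t slot;

	c->clock++;

	for (size_t i = 0; i < c->n; i++) {
		if (c->elems[i].pfn == pfn) {
			c->elems[i].count++;
			c->elems[i].last_seen = c->clock;
			return CEC_UPDATED;
		}
	}

	if (cec_is_full(c)) {
		slot = find_lru(c);	/* overwrite the LRU element in place */
		ret = CEC_EVICTED;
	} else {
		slot = c->n++;
	}

	assert(slot < CEC_SKETCH_SIZE);
	c->elems[slot] = (struct cec_elem){ .pfn = pfn, .count = 1,
					    .last_seen = c->clock };
	return ret;
}
```

Adding four distinct PFNs fills the array; touching the first one again
and then adding a fifth PFN evicts the second one, which is now the
least recently seen, and returns CEC_EVICTED.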

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.