Re: [PATCH] RAS/CEC: Add debugfs switch to disable at run time

From: Cong Wang
Date: Sat Apr 20 2019 - 15:08:24 EST


On Sat, Apr 20, 2019 at 11:47 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
> IOW, when you have the CEC enabled, you don't need to log memory errors
> with a userspace agent. The CEC collects them and discards them if they
> don't repeat.

So, you mean breaking mcelog is intentional? If so, why not break it
loudly?

That is, for example, preventing mcelog from starting by disabling
CONFIG_X86_MCELOG_LEGACY in Kconfig _automatically_ when
CONFIG_RAS is enabled? (Like what I showed in my PoC change.)

Or, for another example, print a kernel warning and let users know this
behavior is intentional?


>
> If they do repeat, then it offlines the page.
>
> Without user intervention and interference.
>
> Now, if you still want to know how many errors and where they happened
> and when they happened and yadda yadda, you *disable* the CEC.

Well, I believe rasdaemon has the counters too; it is not hard to count
the trace events at all. I don't worry about this at all. What I worry about
is how we treat mcelog when CONFIG_RAS=y.

>
> I hope this makes more sense now.

Yes, thanks for the information. It is kinda what I expected. As I keep saying,
I believe we can improve this situation to avoid users' confusion, rather than
just saying CONFIG_RAS=n is the answer.

Thanks.