RE: [PATCH] EDAC, dmc520:: add DMC520 EDAC driver

From: Rui Zhao
Date: Wed Jan 23 2019 - 17:08:55 EST


Hi James,

On Wednesday, January 23, 2019 10:36 AM, James Morse wrote:
> When the time comes, could you post a dt-binding as the first patch? These add the documentation under Documentation/device-tree/bindings, and need to be CC'd to the DT folks.

Sure, will do.

>> Will change scrub_mode to HW.

> Do we know its enabled? This is something firmware has to set up, someone else's platform may do it differently.
> I think should read one of the scrub control registers to find out if its turned on.
> But, I can't find what uses this value ...

Looks like if scrub_mode is set to SCRUB_SW_SRC, the MC core will do arch specific sw scrub on the page with ce error.
We can config the mode base on register settings.

>>>> + /* Enable interrupts */
>>>> + dmc520_write_reg(edac,
>>>> + DRAM_ECC_INT_CE_MASK | DRAM_ECC_INT_UE_MASK,
>>>> + REG_OFFSET_INTERRUPT_CONTROL);

>>> What if they were enabled before? (e.g. enabled by firmware, the
>>> bootloader or kdump). If they're already enabled, can we race with
>>> the interrupt handler on another CPU and get double reporting of the error counter?

>> We don't expect interrupts to be enabled by firmware or bootloader if
>> this driver is enabled.

> What about a previous instance of this driver? Linux supports kexec, kdump and hibernate, all of which cause us to inherit a slightly used platform.

Agreed. We shouldn't make assumption on the register value when driver starts.

>> If firmware enables it, they're suppose to handle the interrupt.

> Ah, so you still have resident firmware!
> How come your firmware trusts linux not to turn off the memory controller?!
> These things are usually protected by trust zone so the OS can't pull the memory from under firmware's feet.

We have firmware to config the memory controller and want to have an EDAC driver to report ECC status.
Could you please elaborate a bit on the security concern on this approach? Like some malicious app/driver can access
memory controller registers can cause issue?

What's the recommend approach if Linux won't be able to access memory controller registers? Have firmware do the ECC
status monitoring and some sort of driver to query ECC status from firmware?

Appreciate your insights on this.

Thanks,
Rui