RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware

From: Ghannam, Yazen
Date: Thu May 23 2019 - 16:03:38 EST


> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Friday, May 17, 2019 3:02 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: Luck, Tony <tony.luck@xxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Fri, May 17, 2019 at 07:49:10PM +0000, Ghannam, Yazen wrote:
> > > @@ -1569,7 +1575,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> > >
> > > if (!b->init)
> > > continue;
> > > +
> > > + /* Check if any bits are implemented in h/w */
> > > wrmsrl(msr_ops.ctl(i), b->ctl);
> > > + rdmsrl(msr_ops.ctl(i), msrval);
> > > +
> > > + b->init = !!msrval;
> > > +
> > Just a minor nit, but can we group the comment, RDMSR, and check
> > together? The WRMSR is part of normal operation and isn't tied to the
> > check.
>
> Of course it is - that's the "throw all 1s at it" part :)
>

I did some more testing and noticed that writing "0" to a bank disables it, with no way to re-enable it.

For example:
1) Read bank10.
   a) Succeeds; returns "fffffffffffffff".
2) Write "0" to bank10.
   a) Succeeds; the hardware register is set to "0".
   b) The hardware register is read back, and b->init=0.
3) Read bank10.
   a) Fails, because b->init=0.
4) Write a non-zero value to bank10 to re-enable it.
   a) Fails, because b->init=0.
5) A reboot is needed to reset the bank.

Is that okay?

Thanks,
Yazen