Re: k10temp: ZEN3 readings are broken

From: Gabriel C
Date: Mon Dec 21 2020 - 23:35:09 EST


Am Di., 22. Dez. 2020 um 04:58 Uhr schrieb Guenter Roeck <linux@xxxxxxxxxxxx>:
>
> Hi,
>
> On 12/21/20 5:45 PM, Gabriel C wrote:
> > Hello Guenter,
> >
> > while trying to add ZEN3 support for zenpower out of tree modules, I find out
> > the in-kernel k10temp driver is broken with ZEN3 ( and partially ZEN2 even ).
> >
> > commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:
> >
> > case 0x0 ... 0x1: /* Zen3 */
> >
> > however, this is wrong, we look for a model which is 0x21 for ZEN3,
> > these seem to
> > be steppings?
> >
> > Also, PLANE0/1 are wrong too, Icore has zero readouts even when fixing
> > the model.
> >
> > Looking at these ( there is something missing for 0x71 ZEN2 Ryzens
> > also ) that should be:
> >
> > PLANE0 (ZEN_SVI_BASE + 0x10)
> > PLANE1 (ZEN_SVI_BASE + 0xc)
> >
> > Which is the same as for ZEN2 >= 0x71. Since this is not really
> > documented and I have some
> > confirmations of these numbers from *somewhere* :-) I created a demo patch only.
> >
> > I would like AMD people to really have a look at the driver and
> > confirm the changes, since
> > getting information from *somewhere*, dosen't mean they are 100%
> > correct. However, the driver
> > is working with these changes.
> >
> > In any way the model needs changing to 0x21 even if we let the other
> > readings broken.
> >
> > There is my demo patch:
> >
> > https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch
> >
> > Also, there is some discuss and testing for both drivers:
> >
> > https://github.com/ocerman/zenpower/issues/39
> >
>
> Thanks for the information. However, since I do not have time to actively maintain
> the driver, since each chip variant seems to use different addresses and scales,
> and since the information about voltages and currents is unpublished by AMD,
> I'll remove support for voltage/current readings from the upstream driver.
> I plan to send the patch doing that to Linus shortly after the commit window
> closes (or even before that).

Yes I saw that commit, and it is a shame how AMD is unwilling to
support 'sensors'
in their CPUs in 2020. I can understand why you can't maintain that
mess, but I don't
understand AMD.

However, it is not only about the Voltage, ZEN3 Ryzen Desktop CPUs,
have a model ID of 0x21, meaning while only 0x0 & 0x1 is here now we
only hit the else code and shows some weird temps, and no info about CCD's.

See:

smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor (family: 0x19,
model: 0x21, stepping: 0x0)
smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor (family: 0x19,
model: 0x21, stepping: 0x0)

etc...

So we need at least:

...
case 0x0 ... 0x1: /* ZEN3 SP3 ?!? */
case 0x21: /* ZEN3 Ryzen Desktop */
...

I believe 0x0 & 0x1 are NOT yet released EPYC/TR CPUs based on ZEN3.
At least is what the weird amd_energy driver added and since is only supporting
fam 17h model 0x31 which is TR 3000 & SP3 Rome, I guess fam 19h 0x1 is
TR/SP3 ZEN3.

( BTW off-topic this amd_energ driver should be removed or depend on BROKEN,
since is working as root only and breaks the sensors command output )

If that is the case, even if you remove the code, I think I
understand how the PLANEX registers are working
and can at least help the out of tree driver with these.

Maybe one day AMD is getting serious, who knows.

>
> Thanks,
> Guenter

Best Regards,

Gabriel C.