Re: [PATCH] tpm: return false from tpm_amd_is_rng_defective on non-x86 platforms

From: Jerry Snitselaar
Date: Fri Jul 07 2023 - 16:18:59 EST


On Fri, Jul 07, 2023 at 06:07:49PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> So what's the way forward now? It sounded like Jarkko wanted to apply
> the patch from this thread days ago, but that didn't happen afaics. Then
> the message below showed up, but Mario's patch also wasn't applied.
>
> Is this intentional, or did something somewhere fall through the cracks?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

I haven't seen any update to Jarkko's repo.

My patch resolves the immediate issue seen on the ppc system. It was
mostly me asking why we even go through this AMD-specific code on
non-x86 systems.

The vio bus shutdown code only makes the remove call when kexec is in
progress. The pnp and platform bus type shutdown calls do not do
anything similar, so maybe the check in Mario's patch isn't needed,
but I don't think it would hurt to have it in there.
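
To make the kind of check being discussed concrete, here is a minimal
user-space sketch. It is not the kernel code and not Mario's actual
patch: struct model_chip, struct model_ops, get_version and the
version threshold are all hypothetical stand-ins. It only models a
helper on the unregister path that returns early once the ops pointer
has been cleared by an earlier shutdown step.

```c
/*
 * User-space model only. These types are simplified, hypothetical
 * stand-ins and NOT the kernel's struct tpm_chip / struct tpm_class_ops.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ops {
	int (*get_version)(void);	/* hypothetical callback */
};

struct model_chip {
	const struct model_ops *ops;	/* NULL once the class shutdown step ran */
};

/* Models the class shutdown_pre step, which clears chip->ops. */
static void model_class_shutdown(struct model_chip *chip)
{
	chip->ops = NULL;
}

/* Models a helper on the unregister path that checks ops before use. */
static bool model_is_rng_defective(const struct model_chip *chip)
{
	assert(chip != NULL);

	if (!chip->ops)		/* already shut down: nothing left to query */
		return false;

	/* Arbitrary threshold, chosen purely for illustration. */
	return chip->ops->get_version() < 6;
}

/* A fake "old firmware" device used to exercise the model. */
static int model_old_fw(void)
{
	return 3;
}

static const struct model_ops model_old_ops = {
	.get_version = model_old_fw,
};
```

With a chip whose ops points at model_old_ops, the helper reports the
device as defective. After model_class_shutdown() it returns false
without dereferencing ops, which is the ordering that the shutdown
path on the ppc system ran into.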

Regards,
Jerry

> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
> On 05.07.23 19:04, Jerry Snitselaar wrote:
> > On Fri, Jun 30, 2023 at 01:07:00PM +0300, Jarkko Sakkinen wrote:
> >> On Thu Jun 29, 2023 at 11:41 PM EEST, Jerry Snitselaar wrote:
> >>> tpm_amd_is_rng_defective is for dealing with an issue related to the
> >>> AMD firmware TPM, so on non-x86 architectures just have it inline and
> >>> return false.
> >>>
> >>> Cc: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> >>> Cc: "Jason A. Donenfeld" <Jason@xxxxxxxxx>
> >>> Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> >>> Cc: Peter Huewe <peterhuewe@xxxxxx>
> >>> Cc: stable@xxxxxxxxxxxxxxx
> >>> Cc: Linux regressions mailing list <regressions@xxxxxxxxxxxxxxx>
> >>> Cc: Mario Limonciello <mario.limonciello@xxxxxxx>
> >>> Reported-by: Aneesh Kumar K. V <aneesh.kumar@xxxxxxxxxxxxx>
> >>> Reported-by: Sachin Sant <sachinp@xxxxxxxxxxxxx>
> >>> Closes: https://lore.kernel.org/lkml/99B81401-DB46-49B9-B321-CF832B50CAC3@xxxxxxxxxxxxx/
> >>> Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs")
> >>> Signed-off-by: Jerry Snitselaar <jsnitsel@xxxxxxxxxx>
> >>> ---
> >>> drivers/char/tpm/tpm-chip.c | 7 +++++++
> >>> 1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> >>> index cd48033b804a..cf5499e51999 100644
> >>> --- a/drivers/char/tpm/tpm-chip.c
> >>> +++ b/drivers/char/tpm/tpm-chip.c
> >>> @@ -518,6 +518,7 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip)
> >>> * 6.x.y.z series: 6.0.18.6 +
> >>> * 3.x.y.z series: 3.57.y.5 +
> >>> */
> >>> +#ifdef CONFIG_X86
> >>> static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
> >>> {
> >>> u32 val1, val2;
> >>> @@ -566,6 +567,12 @@ static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
> >>>
> >>> return true;
> >>> }
> >>> +#else
> >>> +static inline bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
> >>> +{
> >>> + return false;
> >>> +}
> >>> +#endif /* CONFIG_X86 */
> >>>
> >>> static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
> >>> {
> >>> --
> >>> 2.38.1
> >>
> >> Sanity check, this was the right patch, right?
> >>
> >> I'll apply it.
> >>
> >> BR, Jarkko
> >
> > Sorry, I've been dealing with a family health issue the past week. It wasn't clear
> > to me why chip->ops was null when I first took a look, but I think I understand
> > now looking at it again this morning. The stack trace shows it in the device_shutdown() path:
> >
> > [ 34.381674] NIP [c0000000009db1e4] tpm_amd_is_rng_defective+0x74/0x240
> > [ 34.381681] LR [c0000000009db928] tpm_chip_unregister+0x138/0x160
> > [ 34.381685] Call Trace:
> > [ 34.381686] [c00000009742faa0] [c0000000009db928] tpm_chip_unregister+0x138/0x160
> > [ 34.381690] [c00000009742fae0] [c0000000009eab94] tpm_ibmvtpm_remove+0x34/0x130
> > [ 34.381695] [c00000009742fb50] [c000000000115738] vio_bus_remove+0x58/0xd0
> > [ 34.381701] [c00000009742fb90] [c000000000a01ecc] device_shutdown+0x21c/0x39c
> > [ 34.381705] [c00000009742fc20] [c0000000001a2684] kernel_restart_prepare+0x54/0x70
> > [ 34.381710] [c00000009742fc40] [c000000000292c48] kernel_kexec+0xa8/0x100
> > [ 34.381714] [c00000009742fcb0] [c0000000001a2cd4] __do_sys_reboot+0x214/0x2c0
> > [ 34.381718] [c00000009742fe10] [c000000000034adc] system_call_exception+0x13c/0x340
> > [ 34.381723] [c00000009742fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> >
> > So I think what happened is:
> >
> > device_shutdown -> dev->class->shutdown_pre (tpm_class_shutdown) // clears chip->ops
> > -> dev->bus->shutdown (vio_bus_shutdown) -> vio_bus_remove -> viodrv->remove (tpm_ibmvtpm_remove) -> tpm_chip_unregister -> tpm_amd_is_rng_defective -> oops!
> >
> >
> > I guess anything that gets called in the tpm_chip_unregister path
> > should be doing a check of chip->ops prior to using it. So I think
> > Mario's patch would still be a good thing to have.
> >
> > Regards,
> > Jerry
> >
> >
> >