REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems

From: Todd Brandt
Date: Thu Aug 17 2023 - 17:10:19 EST


While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing a
crash and reboot situation when S3 suspend is initiated. To reproduce
it, this call is all that's required "sudo sleepgraph -m mem -rtcwake
15".

I’ve created a Bugzilla to track this issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=217804

I've bisected the issue to this patch:

commit 554b841d470338a3b1d6335b14ee1cd0c8f5d754
Author: Mario Limonciello <mario.limonciello@xxxxxxx>
Date: Wed Aug 2 07:25:33 2023 -0500

tpm: Disable RNG for all AMD fTPMs

The TPM RNG functionality is not necessary for entropy when the CPU
already supports the RDRAND instruction. The TPM RNG functionality
was previously disabled on a subset of AMD fTPM series, but reports
continue to show problems on some systems causing stutter root
caused
to TPM RNG functionality.

Expand disabling TPM RNG use for all AMD fTPMs whether they have
versions
that claim to have fixed or not. To accomplish this, move the
detection
into part of the TPM CRB registration and add a flag indicating
that
the TPM should opt-out of registration to hwrng.

By reverting this patch in 6.5.0-rc6 the problem goes away, so it's
pretty clear that this commit is at fault. I've done further debugging
and I've found that if I simply comment out these lines in 6.5.0-rc6
the problem goes away. So the "crb_check_flags" call is the root cause.

diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a1859012..20ce8102e6bd 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -826,9 +826,9 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;

- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+// rc = crb_check_flags(chip);
+// if (rc)
+// goto out;

rc = tpm_chip_register(chip);