Re: [REGRESSION] suspend to ram fails in 6.2-rc1 due to tpm errors

From: Jarkko Sakkinen
Date: Tue Apr 25 2023 - 19:34:37 EST


On Sun Apr 23, 2023 at 6:34 PM EEST, Jarkko Sakkinen wrote:
> On Fri Apr 21, 2023 at 9:27 PM EEST, Jason A. Donenfeld wrote:
> > Did you use the patch I sent you and suspend and resume according to
> > the instructions I gave you? If not, I don't have much to add.
>
> Finally, I got it reproduced at my side with TPM 1.2:
>
> [ 0.379677] tpm_tis 00:00: 1.2 TPM (device-id 0x1, rev-id 1)
> [ 32.453447] tpm tpm0: tpm_transmit: tpm_recv: error -5
> [ 33.450601] tpm tpm0: Unable to read header
> [ 33.450607] tpm tpm0: tpm_transmit: tpm_recv: error -62
>
> I'll look at this further after I've sent v6.3 PR.

OK, so this gives the exact tpm_transmit call where it fails:

$ sudo bpftrace -e 'kprobe:tpm_transmit { @[kstack] = count(); }'
[sudo] password for jarkko:
Attaching 1 probe...
^C

@[
tpm_transmit+1
tpm1_pcr_read+177
tpm1_do_selftest+287
tpm_tis_resume+443
pnp_bus_resume+102
dpm_run_callback+81
device_resume+173
dpm_resume+238
dpm_resume_end+17
suspend_devices_and_enter+473
enter_state+563
pm_suspend+68
state_store+43
kobj_attr_store+15
sysfs_kf_write+59
kernfs_fop_write_iter+304
vfs_write+590
ksys_write+115
__x64_sys_write+25
do_syscall_64+88
entry_SYSCALL_64_after_hwframe+114
]: 1
@[
tpm_transmit+1
tpm1_do_selftest+179
tpm_tis_resume+443
pnp_bus_resume+102
dpm_run_callback+81
device_resume+173
dpm_resume+238
dpm_resume_end+17
suspend_devices_and_enter+473
enter_state+563
pm_suspend+68
state_store+43
kobj_attr_store+15
sysfs_kf_write+59
kernfs_fop_write_iter+304
vfs_write+590
ksys_write+115
__x64_sys_write+25
do_syscall_64+88
entry_SYSCALL_64_after_hwframe+114
]: 1
@[
tpm_transmit+1
tpm1_pm_suspend+203
tpm_pm_suspend+131
__pnp_bus_suspend+65
pnp_bus_suspend+19
dpm_run_callback+81
__device_suspend+329
dpm_suspend+432
dpm_suspend_start+155
suspend_devices_and_enter+370
enter_state+563
pm_suspend+68
state_store+43
kobj_attr_store+15
sysfs_kf_write+59
kernfs_fop_write_iter+304
vfs_write+590
ksys_write+115
__x64_sys_write+25
do_syscall_64+88
entry_SYSCALL_64_after_hwframe+114
]: 1
@[
tpm_transmit+1
tpm1_get_random+206
tpm_get_random+70
tpm_hwrng_read+21
hwrng_fillfn+234
kthread+230
ret_from_fork+41
]: 75897

So it is the very first PCR read in tpm1_do_selftest.

There is a bug in plain sight in tpm_tis_resume(): it wraps the
call to tpm1_do_selftest() only in a locality request and
relinquish. This is not sufficient: it should also disable the
clkrun protocol.

tpm1_do_selftest() is also called during driver initialization, where
it succeeds. The difference is that the clkrun protocol is disabled
at that point.

I'm now compiling a kernel with a test fix that calls tpm_chip_start()
and tpm_chip_stop() in place of the request/relinquish locality pair.
These should be used anyway instead of ad-hoc code.
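
To make the difference concrete, here is a toy userspace model in C. It
is not kernel code: every model_* name is hypothetical, and the rule
that a transmit fails while clkrun is still enabled is an assumption
taken from the diagnosis above, not verified hardware behaviour. It only
contrasts the ad-hoc locality pair with a start/stop pair that also
handles clkrun.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for chip state; not the kernel's struct tpm_chip. */
struct model_chip {
	bool locality_held;
	bool clkrun_enabled;	/* true: CLKRUN protocol is active */
};

static int model_request_locality(struct model_chip *c)
{
	c->locality_held = true;
	return 0;
}

static void model_relinquish_locality(struct model_chip *c)
{
	c->locality_held = false;
}

/* Models the idea behind tpm_chip_start(): disable clkrun, then take locality. */
static int model_chip_start(struct model_chip *c)
{
	c->clkrun_enabled = false;
	return model_request_locality(c);
}

/* Models the idea behind tpm_chip_stop(): drop locality, restore clkrun. */
static void model_chip_stop(struct model_chip *c)
{
	model_relinquish_locality(c);
	c->clkrun_enabled = true;
}

/* Assumption: a command fails with -EIO unless locality is held and clkrun is off. */
static int model_transmit(const struct model_chip *c)
{
	if (!c->locality_held || c->clkrun_enabled)
		return -EIO;
	return 0;
}

/* Resume path as diagnosed: locality only, clkrun left untouched. */
static int model_resume_adhoc(struct model_chip *c)
{
	int ret = model_request_locality(c);

	if (ret)
		return ret;
	ret = model_transmit(c);	/* stands in for the selftest */
	model_relinquish_locality(c);
	return ret;
}

/* Resume path of the test fix: start/stop around the selftest. */
static int model_resume_fixed(struct model_chip *c)
{
	int ret = model_chip_start(c);

	if (ret)
		return ret;
	ret = model_transmit(c);	/* stands in for the selftest */
	model_chip_stop(c);
	return ret;
}
```

Starting from a chip with clkrun enabled and no locality held, the
ad-hoc path returns -EIO in this model while the start/stop path
returns 0 and leaves the chip state as it found it.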

BR, Jarkko
