Re: [PATCH] tpm_tis_spi: fix:release chip select when flow control fails

From: shaopeijie
Date: Mon Apr 24 2023 - 02:39:24 EST


On Sat, 22 Apr 2023 00:46:40 +0300, jarkko@xxxxxxxxxx wrote:
> On Fri, Mar 31, 2023 at 02:56:25PM +0800, shaopeijie@xxxxxxxx wrote:
>> From: Peijie Shao <shaopeijie@xxxxxxxx>
>>
>> The TPM's chip select will leave active after spi_bus_unlock when
>> flow control timeout, and may interfere other chips sharing the same
>> spi bus, or may damage them dule to level conflict on MISO pin.
>>
>> So the patch deactives the chip select by sending an empty message
>> with cs_change=0 if flow control fails.
>>
>> The reason why flow control timeout for me is unfortunately I got a
>> damaged TPM chip. It always pull MISO low during cs active(this can
>> be easily emulated by wire MISO to the ground), not responding anything,
>> and dmesg shows below:
>> ...
>> [ 311.150725] tpm_tis_spi: probe of spi0.0 failed with error -110
>> ...
>
> We don't really cease to support damaged hardware but it is true
> that the *software* failure paths should probably be robust enough
> to deativate chip select.
>
> I would rewrite this as
>
> "The failure paths in tpm_tis_spi_transfer() do not deactivate
> chip select. Send an empty message (cs_select == 0) to overcome
> this."
>
> That's all there needs to be. We do not care about broken hardware.
>

I agree. The patch is to robust *software* path, but not support
broken hardware.

>>
>> Signed-off-by: Peijie Shao <shaopeijie@xxxxxxxx>
>> ---
>> drivers/char/tpm/tpm_tis_spi_main.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_spi_main.c b/drivers/char/tpm/tpm_tis_spi_main.c
>> index a0963a3e92bd..5c8ff343761f 100644
>> --- a/drivers/char/tpm/tpm_tis_spi_main.c
>> +++ b/drivers/char/tpm/tpm_tis_spi_main.c
>> @@ -105,8 +105,19 @@ int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>> /* Flow control transfers are receive only */
>> spi_xfer.tx_buf = NULL;
>> ret = phy->flow_control(phy, &spi_xfer);
>> - if (ret < 0)
>> + if (ret < 0) {
>> + /*
>> + * Release cs pin if the device is not responding, regardless of the reason.
>> + * Notice cs may alreadly been released if the failure was caused inside
>> + * spi_sync_locked called by flow_control, in this situation, a pluse may be
>> + * generated on cs.
>> + */
>
> Please replace above comment with:
>
> /* Deactivate chip select: */
>

I agree.

>> + memset(&spi_xfer, 0, sizeof(spi_xfer));
>> + spi_message_init(&m);
>> + spi_message_add_tail(&spi_xfer, &m);
>> + spi_sync_locked(phy->spi_device, &m);
>> goto exit;
>> + }
>>
>> spi_xfer.cs_change = 0;
>> spi_xfer.len = transfer_len;
>> --
>> 2.39.1
>>
>>
>>
>
> There's three call sites, why are you taking care of only one
> of them?
>
> I'd consider instead:
>
> return 0;
>
> exit:
> memset(&spi_xfer, 0, sizeof(spi_xfer));
> spi_message_init(&m);
> spi_message_add_tail(&spi_xfer, &m);
> spi_sync_locked(phy->spi_device, &m);
> spi_bus_unlock(phy->spi_device->master);
> return ret;
> }

I found that spi_sync_locked() will deactivate cs if any failure
is generated inside it.

spi_sync_locked()
...
spi_transfer_one_message()
...
if (ret != 0 || !keep_cs)
spi_set_cs(msg->spi, false, false);

spi_transfer_one_message() is the default transfer method, I think
other spi controllers who implements .transfer_one_message should
have the same behaviour.

Sending an empty message when cs is already deactivated may have a small
side effect: cs will go activate and deactivate in a very short time,
means a pulse will be generated on cs pin. This may also happen when
failure is generated by spi_sync_locked() which called inside ->flow_control().

So to reduce this, I prefer deactivating cs only when phy->flow_control()
fails. If the side effect is totaly acceptable, your advice maybe better.

Waiting for your suggestions.

>
> The the rollback would apply to all call sites.
>
> BR, Jarkko

Thanks!

Peijie, Shao