Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for BlueField-3 SoC

From: Adrian Hunter
Date: Thu Jan 04 2024 - 04:25:16 EST


On 19/12/23 23:18, Liming Sun wrote:
>
>
>> -----Original Message-----
>> From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>> Sent: Monday, December 11, 2023 6:39 AM
>> To: Liming Sun <limings@xxxxxxxxxx>; Christian Loehle
>> <christian.loehle@xxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; David
>> Thompson <davthompson@xxxxxxxxxx>
>> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>> BlueField-3 SoC
>>
>> On 30/11/23 15:19, Liming Sun wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Christian Loehle <christian.loehle@xxxxxxx>
>>>> Sent: Monday, November 27, 2023 8:36 AM
>>>> To: Liming Sun <limings@xxxxxxxxxx>; Adrian Hunter
>>>> <adrian.hunter@xxxxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; David
>>>> Thompson <davthompson@xxxxxxxxxx>
>>>> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>>>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>>>> BlueField-3 SoC
>>>>
>>>> On 18/11/2023 13:46, Liming Sun wrote:
>>>>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
>>>>> intermittent eMMC timeout issue reported on some cards under eMMC
>>>>> stress test.
>>>>>
>>>>> Reported error message:
>>>>> dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
>>>>>
>>>>> Signed-off-by: Liming Sun <limings@xxxxxxxxxx>
>>>>> ---
>>>>> drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> index 3a3bae6948a8..3c8fe8aec558 100644
>>>>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>>>>> #ifdef CONFIG_ACPI
>>>>> static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>>>>> .ops = &sdhci_dwcmshc_ops,
>>>>> - .quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
>>>>> + .quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
>>>>> + SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>>>>> .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>>>>> SDHCI_QUIRK2_ACMD23_BROKEN,
>>>>> };
>>>>
>>>> __mmc_blk_ioctl_cmd: data error ?
>>>> What stresstest are you running that issues ioctl commands?
>>>> On which commands does the timeout occur?
>>>> Anyway you should be able to increase the timeout in ioctl structure
>>>> directly, i.e. in userspace, or does that not work?
>>>
>>> It's running stress test with tool like
>>> "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1
>>> --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1
>>> --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128
>>> --rw=randrw --overwrite=1 --runtime=36000
>>> --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3
>>> --filename=/dev/mmcblk0"
>>> The tool(application) is owned by user or with some standard tool.
>>
>> fio does not send mmc ioctls, so I am also a bit confused about
>> how you get "__mmc_blk_ioctl_cmd: data error -110" ?
>
> There are other activities or background task going on. I assume it's other
> MMC access which are affected by the stress FIO and got timeout. Would it make sense?
>

It depends on whether the IOCTL is overriding the timeout. In
struct mmc_ioc_cmd there is data_timeout_ns which overrides the
mmc core data timeout calculated by mmc_set_data_timeout(). There
is also cmd_timeout_ms for commands. You need to check whether
"__mmc_blk_ioctl_cmd: data error -110" is because data_timeout_ns
was set too low (but non-zero) by the caller of the IOCTL.
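
As a rough illustration of the check described above, here is a minimal
userspace-style sketch in C. It is not the kernel code: struct ioc_timeouts
is a hypothetical stand-in for just the two timeout fields of
struct mmc_ioc_cmd, both helper names are made up for this example, and
core_timeout_ns stands for whatever value mmc_set_data_timeout() would have
produced for the request. It assumes, as noted above, that a zero
data_timeout_ns means "no override".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two timeout fields of struct mmc_ioc_cmd
 * (the real definition is in <linux/mmc/ioctl.h>).
 */
struct ioc_timeouts {
	uint32_t data_timeout_ns;	/* 0 = no override from the caller */
	uint32_t cmd_timeout_ms;	/* command timeout, not used below */
};

/*
 * Data timeout that would apply to the request: a non-zero
 * data_timeout_ns from the caller replaces the core-calculated value.
 */
static uint32_t effective_data_timeout_ns(const struct ioc_timeouts *ic,
					  uint32_t core_timeout_ns)
{
	assert(ic != NULL);
	return ic->data_timeout_ns ? ic->data_timeout_ns : core_timeout_ns;
}

/* True if the caller set a non-zero override shorter than the core value. */
static int data_timeout_set_too_low(const struct ioc_timeouts *ic,
				    uint32_t core_timeout_ns)
{
	assert(ic != NULL);
	return ic->data_timeout_ns != 0 &&
	       ic->data_timeout_ns < core_timeout_ns;
}
```

If data_timeout_set_too_low() would be true for the values the IOCTL caller
passes, the -110 is more likely coming from the caller's own override than
from the host controller's timeout handling.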