Re: [PATCH V2] MMC: core: cap MMC card timeouts at 2 seconds.

From: Torne (Richard Coles)
Date: Fri Jun 01 2012 - 06:20:09 EST


On 1 June 2012 11:09, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> On 01/06/12 12:32, Torne (Richard Coles) wrote:
>> On 1 June 2012 10:31, Torne (Richard Coles) <torne@xxxxxxxxxx> wrote:
>>> On 1 June 2012 09:35, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>>>> On 29/05/12 05:32, Ben Hutchings wrote:
>>>>> On Mon, 2012-05-28 at 18:31 +0100, Torne (Richard Coles) wrote:
>>>>>> From: "Torne (Richard Coles)" <torne@xxxxxxxxxx>
>>>>>>
>>>>>> MMC CSD info can specify very large, ridiculous timeouts, big enough to
>>>>>> overflow timeout_ns on 32-bit machines. This can result in the card
>>>>>> timing out on every operation because the wrapped timeout value is far
>>>>>> too small.
>>>>>>
>>>>>> Fix the overflow by capping the result at 2 seconds.  Cards specifying
>>>>>> longer timeouts are almost certainly insane, and host controllers
>>>>>> generally cannot support timeouts that long in any case.
>>>>>>
>>>>>> 2 seconds should be plenty of time for any card to actually function;
>>>>>> the timeout calculation code is already using 1 second as a "worst case"
>>>>>> timeout for cards running in SPI mode.
>>>>>
>>>>> Needs a 'Signed-off-by'.
>>>>>
>>>>>> ---
>>>>>>  drivers/mmc/core/core.c |   11 ++++++++++-
>>>>>>  1 files changed, 10 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>>>>>> index 0b6141d..3b4a9fc 100644
>>>>>> --- a/drivers/mmc/core/core.c
>>>>>> +++ b/drivers/mmc/core/core.c
>>>>>> @@ -512,7 +512,16 @@ void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card)
>>>>>>      if (data->flags & MMC_DATA_WRITE)
>>>>>>              mult <<= card->csd.r2w_factor;
>>>>>>
>>>>>> -    data->timeout_ns = card->csd.tacc_ns * mult;
>>>>>> +    /*
>>>>>> +     * The timeout in nanoseconds may overflow with some cards. Cap it at
>>>>>> +     * two seconds both to avoid the overflow and also because host
>>>>>> +     * controllers cannot generally generate timeouts that long anyway.
>>>>>> +     */
>>>>>> +    if (card->csd.tacc_ns <= (2 * NSEC_PER_SEC) / mult)
>>>>>> +            data->timeout_ns = card->csd.tacc_ns * mult;
>>>>>> +    else
>>>>>> +            data->timeout_ns = 2 * NSEC_PER_SEC;
>>>>>
>>>>> We clearly need to guard against overflow here, and this is the correct
>>>>> way to clamp the multiplication.  I can't speak as to whether 2 seconds
>>>>> is the right limit.
>>>>
>>>> The host controllers I have looked at have a limit of around 2.5 seconds.
>>>>
>>>> But why not just use the size of the type as the limit? e.g.
>>>>
>>>>        if (card->csd.tacc_ns <= UINT_MAX / mult)
>>>>                data->timeout_ns = card->csd.tacc_ns * mult;
>>>>        else
>>>>                data->timeout_ns = UINT_MAX;
>>>
>>> The host controller drivers don't seem to all do a very good job of
>>> preventing further overflows or handling large values correctly
>>> (though some do). sdhci takes the especially annoying additional step
>>> of printk'ing a warning for *every single MMC command* where
>>> data->timeout_ns is larger than the controller can accommodate.
>>> Capping it to a value with a sensible order of magnitude seems to make
>>> it more likely that cards with obviously bogus CSD parameters will
>>> actually work. I don't object to using a larger number for the limit,
>>> but UINT_MAX on a 64-bit system obviously doesn't limit this at all
>>> and will leave you with timeouts up to 17 minutes, which seems
>>> ridiculous :)
>>
>> Er, not 17 minutes; 102.4 seconds as I used later in my mail. SD cards
>> have their timeouts capped already, so their larger 100x multiplier is
>> not a problem; 102.4 seconds is the longest for an MMC card.
>>
>
> Linux is LP64. i.e. "int" is always 32-bit in the kernel

Oh, sorry; didn't think that through. So, yeah, that'd be 4.29
seconds, which is still too long for many hosts :)

>>> My original motivation for this patch is that I have a device with an
>>> eMMC that specifies a 25.5 second timeout, attached to a sdhci host
>>> whose maximum timeout is 2.8 seconds. Originally I proposed a patch to
>>> just remove the warning in sdhci, but nobody replied, and when I
>>> realised there was actually an overflow happening I opted to fix that
>>> instead.
>>>
>>> So, yeah, we could use UINT_MAX, but then at minimum I also need to
>>> kill the warning in sdhci to make my device work, and probably all the
>>> host controller drivers need to be checked to make sure they don't use
>>> timeout_ns in a way that can overflow.
>>>
>>> I've also just noticed that struct mmc_data's comment for timeout_ns
>>> says /* data timeout (in ns, max 80ms) */ which is not true (the max
>>> is 102.4 seconds if my math is correct), which may have contributed to
>>> the host drivers not being too careful :)
>>>
>>> What do you think?
>
> If you can identify the card, the you could make a new quirk in a fashion
> similar to mmc_card_long_read_time().
>
> Alternatively you could make use of SDHCI_QUIRK_BROKEN_TIMEOUT_VAL or
> introduce your own sdhci quirk to suppress the warning.

Those would work, but it seems silly to me to suppress the warning
only for some cards, or to cap the timeout only for some cards. The
best way to identify a card that has a "broken" timeout value is.. if
the timeout value is a really big number, no? I am very skeptical that
there is a card out there anywhere that will actually take more than
two seconds (or 4.29 seconds, if you prefer) to successfully complete
a command.

The warning itself seems to have extremely limited use; there's
nothing you can do about it other than suppress it (the driver is
already capping the timeout for you), and because timeouts are
calculated per-command the warning is absurdly noisy (in fact, the fun
part on my system was the warning being logged to klogd, being written
to logfiles on the eMMC, causing more warnings, causing more log
messages, etc) :) sdhci is the only host driver that complains about
this; it seems logical for the warning to apply to all hosts, or to
none of them...

>>>
>>>>>
>>>>> Ben.
>>>>>
>>>>>>      data->timeout_clks = card->csd.tacc_clks * mult;
>>>>>>
>>>>>>      /*
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Torne (Richard Coles)
>>> torne@xxxxxxxxxx
>>
>>
>>
>



--
Torne (Richard Coles)
torne@xxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/