Re: [PATCHv2] mmc: block: Differentiate busy and PROG state

From: Ulf Hansson
Date: Wed Jul 07 2021 - 07:57:52 EST


On Wed, 7 Jul 2021 at 10:27, Christian Löhle <CLoehle@xxxxxxxxxxxxxx> wrote:
>
> Prevent race condition with ioctl commands
>
> To fully prevent a race condition where the next
> issued command will be rejected as the card is no
> longer signalling busy but not yet back in TRAN state.
> The card may be in PROG state without signalling busy,
> for some of the commands that are only R1, but also
> for R1b commands, the card will signal non-busy as soon
> as receive buffers are free again, but the card has
> not finished handling the command and may therefore be
> in PROG.

Can you please point me to the corresponding information in the spec
that states that the above behavior is correct?

In principle what you are saying is that busy signalling on DAT0 is
*entirely* broken, at least for some cards and some commands.

> Since the next command is not known at the time of
> completion we must assume that it may be one that can
> only be accepted in TRAN state.
> Therefore we only consider a PROG command completed
> when we have polled for TRAN.

Right. See more comments about this further below.

>
> Signed-off-by: Christian Loehle <cloehle@xxxxxxxxxxxxxx>
> ---
> drivers/mmc/core/block.c | 86 ++++++++++++++++++++++++++++++++++----
> drivers/mmc/core/mmc_ops.c | 2 +-
> include/linux/mmc/mmc.h | 10 +++--
> include/linux/mmc/sd.h | 3 ++
> 4 files changed, 87 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 88f4c215caa6..cb78690647bf 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -411,7 +411,34 @@ static int mmc_blk_ioctl_copy_to_user(struct mmc_ioc_cmd __user *ic_ptr,
> return 0;
> }
>
> -static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> +static int is_prog_cmd(struct mmc_command *cmd)
> +{
> + /*
> + * Cards will move to programming state (PROG) after these commands.
> + * So we must not consider the command as completed until the card
> + * has actually returned back to TRAN state.
> + */
> + switch (cmd->opcode) {
> + case MMC_STOP_TRANSMISSION:

This has an R1B response, hence we already do the proper polling that is needed.

In other words, we don't need to explicitly check for this command
here, as we are already checking the response type (R1B) in
__mmc_blk_ioctl_cmd().

> + case MMC_WRITE_DAT_UNTIL_STOP:

What's this used for? It's obsolete, at least in the eMMC spec. Please drop it.

> + case MMC_WRITE_BLOCK:
> + case MMC_WRITE_MULTIPLE_BLOCK:

These are already supported via the generic block interface, please
drop the checks.

> + case MMC_PROGRAM_CID:
> + case MMC_PROGRAM_CSD:

Let's discuss these, since they have R1 responses.

Although, according to the eMMC spec, the card moves to rcv state, not
the prg state as you refer to in the commit message. Normally, we
don't need to poll for busy/tran completion of these commands.

Have you observed through proper tests that this is actually needed?

> + case MMC_SET_WRITE_PROT:
> + case MMC_CLR_WRITE_PROT:
> + case MMC_ERASE:

The three above have R1B, please drop them from here as they are
already supported correctly.

> + case MMC_LOCK_UNLOCK:

Again, this has an R1 response and the card moves to rcv state.
Normally we shouldn't need to poll, but I have to admit that the eMMC
spec isn't really clear on what will happen when using the "forced
erase" argument. The spec mentions a 3 minute timeout....

> + case MMC_SET_TIME: /* Also covers SD_WRITE_EXTR_SINGLE */
> + case MMC_GEN_CMD:
> + case SD_WRITE_EXTR_MULTI:

Are these actually being used? If not, please drop them from being
supported. I don't want to encourage crazy operations being issued
from userspace.

> + return true;
> + default:
> + return false;
> + }
> +}

Overall, it looks like we need to add a check for MMC_LOCK_UNLOCK to
poll for busy, but that's it, I think.

> +
> +static int card_poll_until_tran(struct mmc_card *card, unsigned int timeout_ms,
> u32 *resp_errs)
> {
> unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> @@ -433,8 +460,7 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> *resp_errs |= status;
>
> /*
> - * Timeout if the device never becomes ready for data and never
> - * leaves the program state.
> + * Timeout if the device never returns to TRAN state.
> */
> if (done) {
> dev_err(mmc_dev(card->host),
> @@ -442,6 +468,41 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> __func__, status);
> return -ETIMEDOUT;
> }
> + } while (R1_CURRENT_STATE(status) != R1_STATE_TRAN);
> +
> + return err;
> +}
> +
> +static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> + u32 *resp_errs)
> +{
> + unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> + int err = 0;
> + u32 status;
> +
> + do {
> + bool done = time_after(jiffies, timeout);
> +
> + err = __mmc_send_status(card, &status, 5);
> + if (err) {
> + dev_err(mmc_dev(card->host),
> + "error %d requesting status\n", err);
> + return err;
> + }
> +
> + /* Accumulate any response error bits seen */
> + if (resp_errs)
> + *resp_errs |= status;
> +
> + /*
> + * Timeout if the device never becomes ready for data.
> + */
> + if (done) {
> + dev_err(mmc_dev(card->host),
> + "Card remained busy! %s status: %#x\n",
> + __func__, status);
> + return -ETIMEDOUT;
> + }
> } while (!mmc_ready_for_data(status));

I don't quite understand what we accomplish with polling for TRAN
state in one case and in the other case, both TRAN and READY_FOR_DATA.
Why can't we always poll for TRAN and READY_FOR_DATA? It should work
for all cases, no?

>
> return err;
> @@ -596,12 +657,19 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
>
> if (idata->rpmb || (cmd.flags & MMC_RSP_R1B) == MMC_RSP_R1B) {
> /*
> - * Ensure RPMB/R1B command has completed by polling CMD13
> - * "Send Status".
> + * Ensure card is no longer signalling busy by polling CMD13.
> */
> err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, NULL);
> }
>
> + if (is_prog_cmd(&cmd)) {
> + /*
> + * Ensure card has returned back to TRAN state
> + * and is ready to accept a new command.
> + */
> + err = card_poll_until_tran(card, MMC_BLK_TIMEOUT_MS, NULL);
> + }
> +
> return err;
> }
>
> @@ -1630,7 +1698,7 @@ static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
>
> mmc_blk_send_stop(card, timeout);
>
> - err = card_busy_detect(card, timeout, NULL);
> + err = card_poll_until_tran(card, timeout, NULL);
>
> mmc_retune_release(card->host);
>
> @@ -1662,7 +1730,7 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
> goto error_exit;
>
> if (!mmc_host_is_spi(host) &&
> - !mmc_ready_for_data(status)) {
> + !mmc_tran_and_ready_for_data(status)) {
> err = mmc_blk_fix_state(card, req);
> if (err)
> goto error_exit;
> @@ -1784,7 +1852,7 @@ static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
>
> /* Try to get back to "tran" state */
> if (!mmc_host_is_spi(mq->card->host) &&
> - (err || !mmc_ready_for_data(status)))
> + (err || !mmc_tran_and_ready_for_data(status)))
> err = mmc_blk_fix_state(mq->card, req);
>
> /*
> @@ -1854,7 +1922,7 @@ static int mmc_blk_card_busy(struct mmc_card *card, struct request *req)
> if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ)
> return 0;
>
> - err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, &status);
> + err = card_poll_until_tran(card, MMC_BLK_TIMEOUT_MS, &status);
>
> /*
> * Do not assume data transferred correctly if there are any error bits
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index 973756ed4016..a0be45118a93 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -465,7 +465,7 @@ static int mmc_busy_cb(void *cb_data, bool *busy)
> if (err)
> return err;
>
> - *busy = !mmc_ready_for_data(status);
> + *busy = !mmc_tran_and_ready_for_data(status);
> return 0;
> }

Kind regards
Uffe