RE: [PATCH v1] scsi: ufs: clear doorbell for hibern8 errors when using ah8

From: Kiwoong Kim
Date: Thu Oct 07 2021 - 22:03:14 EST


> On 10/7/21 12:40 AM, Kiwoong Kim wrote:
> > When an scsi command is dispatched right after host complete all the
> > pending requests goes to idle and ufs driver tries to ring a doorbell,
> > host might be still during entering into hibern8. If an error occurrs
> > during that period, the doorbell might not be zero. In this case,
> > clearing it should be needed to requeue its command.
> > Currently, ufshcd_err_handler goes directly to reset when the driver's
> > link state is broken. This patch is to make it clear doorbells in the
> > case. In the situation, communication would be disabled, so TM isn't
> > necessary or we can say it's not available.
>
> The above text is too hard to comprehend. Please make it more clear. As an
> example, in the first sentence, what does "goes to idle" apply to?
> Does it apply to "SCSI command", "pending requests" or something else?

My 'goes to idle' means a state of no pended UTP requests.
I write 'scsi command' because the symptom that I saw is related with scsi command.

>
> What mechanism does "If an error occurs" refer to? A SCSI command
> processing error, a link error or another type of error?

Hibern8 errors written in the title.

>
> > Here's an actual symptom that I've faced. At the time, tag #17 is
> > still pended even after host reset. And then the block timer is
> > expired.
>
> pended -> pending.

Got it.
>
> > exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8 Enter
> > failed - status: 0x00000040, upmcrs: 0x00000001 ..
> > host_regs: 00000050: b8671000 00000008 00020000 00000000 ..
> > exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17
>
> The relationship between the example and the description above the example
> is not clear. If a command is pending before the error handler starts,
> aborting that command fails and the host is not reset then the command
> will still be pending after the error handler has finished.
>
> > @@ -6138,7 +6139,12 @@ static void ufshcd_err_handler(struct Scsi_Host
> *host)
> > (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
> > UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
> > needs_reset = true;
> > - goto do_reset;
> > + spin_lock_irqsave(hba->host->host_lock, flags);
> > + if (!hba->ahit && ufshcd_is_link_broken(hba))
> > + link_broken_in_ah8 = true;
> > + spin_unlock_irqrestore(hba->host->host_lock, flags);
> > + if (!link_broken_in_ah8)
> > + goto do_reset;
> > }
> >
>
> My understanding is that hba->ahit == 0 means that auto-hibernation is
> disabled. Hence, the above code sets 'link_broken_in_ah8' only if auto-
> hibernation is disabled. So what does the name of the variable
> 'link_broken_in_ah8' mean?

Mistake. And while I'm commenting, I get a better idea and will revise this patch.

>
> > @@ -6168,6 +6174,9 @@ static void ufshcd_err_handler(struct Scsi_Host
> *host)
> > }
> > }
> >
> > + if (link_broken_in_ah8)
> > + goto lock_skip_pending_xfer_clear;
> > +
> > /* Clear pending task management requests */
> > for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
> > if (ufshcd_clear_tm_cmd(hba, tag)) {
>
> Why is skipping the ufshcd_clear_tm_cmd() calls useful in this case?
>
> Thanks,
>
> Bart.
>