Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume

From: Can Guo
Date: Tue Jun 29 2021 - 17:50:51 EST


On 2021-06-30 02:01, Bart Van Assche wrote:
On 6/28/21 11:23 PM, Can Guo wrote:
On 2021-06-29 01:31, Bart Van Assche wrote:
On 6/28/21 1:17 AM, Can Guo wrote:
On 2021-06-25 01:11, Bart Van Assche wrote:
On 6/23/21 11:31 PM, Can Guo wrote:
Using back host_sem in suspend_prepare()/resume_complete()
won't have this problem of deadlock, right?

Although that would solve the deadlock discussed in this email
thread, it wouldn't solve the issue of potential adverse
interactions of the UFS error handler and the SCSI error
handler running concurrently.

I think I've explained it before, paste it here -

ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and
flushes it, so SCSI error handler and UFS error handler can
safely run together.

That code path is the exception. Do you agree that the following
three functions all invoke the ufshcd_err_handler() function
asynchronously? * ufshcd_uic_pwr_ctrl() * ufshcd_check_errors() *
ufshcd_abort()

I agree, but I don't see what's wrong with that. Any context can
invoke ufs error handler asynchronously and ufs error handler prepare
makes sure error handler can work safely, i.e., stopping PM
ops/gating/scaling in error handler prepare makes sure no one shall
call ufshcd_uic_pwr_ctrl() ever again. And ufshcd_check_errors() and
ufshcd_abort() are OK to run concurrently with UFS error handler.

The current UFS error handling approach requires the following code in
ufshcd_queuecommand():

if (hba->pm_op_in_progress) {
hba->force_reset = true;
set_host_byte(cmd, DID_BAD_TARGET);
cmd->scsi_done(cmd);
goto out;
}

Removing that code is not possible with the current error handling
approach. My patch makes it possible to remove that code.

Sorry that I missed the change of scsi_transport_template() in your
previous message. I can understand that you want to invoke UFS error
hander by invoking SCSI error handler, but I didn't go that far
because I saw you changed pm_runtime_get_sync() to
pm_runtime_get_noresume() in ufs error handler prepare. How can that
change make sure that the device is not suspending or resuming while
error handler is running?

UFS power state transitions happen by submitting a SCSI command to a
WLUN. The SCSI error handler is only activated after all outstanding
SCSI commands for a SCSI host have failed or completed. I think this
guarantees for the UFS driver that eh_strategy_handler is not invoked
while a command submitted to a WLUN is changing the power state of the
UFS device. The following code from scsi_error.c only wakes up the error
handler if (shost->host_failed || shost->host_eh_scheduled) &&
shost->host_failed == scsi_host_busy(shost):

if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0)
|| shost->host_failed != scsi_host_busy(shost)) {
schedule();
continue;
}
/* Handle SCSI errors */


It is not completely right - wl_suspend/resume() are much more twisted.

wl_suspend() may or may NOT send a SCSI cmd to WLUN, i.e., SSU cmd may
be skipped if spm/rpm_lvl is 0/1 and/or if bkops/wb is on-going (even when
rpm_lvl is not 0/1), while link can still be put to hibern8/off, then
power/clks can still be shutdown to save power.

wl_resume(), in case of rpm/spm_lvl == 5, does a full reset to UFS device,
without sending a SSU cmd to WLU to complete the power state transition.

So above checks (in scsi_error_handler()) cannot gaurantee that actual
power state transistions in UFS driver has ceased before start UFS error
handling.

Thanks,

Can Guo.

Thanks,

Bart.