RE: [PATCH v4 1/4] Tools: hv: Reopen the devices if read() or write() returns errors

From: Michael Kelley
Date: Sun Jan 26 2020 - 01:03:04 EST


From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Saturday, January 25, 2020 9:50 PM
>
> The state machine in the hv_utils driver can run out of order in some
> corner cases. For example, if the kvp daemon doesn't call write() fast
> enough for some reason, kvp_timeout_func() can run first and move the
> state to HVUTIL_READY; next, when kvp_on_msg() is called, it returns
> -EINVAL since kvp_transaction.state is smaller than HVUTIL_USERSPACE_REQ;
> later, the daemon's write() fails with -EINVAL, and the daemon exits.
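To make the ordering problem concrete, here is a toy C model of the transaction state. The enum and the model_* functions are hypothetical stand-ins that only mimic the sequence described above (a timeout resetting the state to READY before the daemon's reply arrives); they are not the hv_utils code, and the enum ordering is assumed.

```c
#include <errno.h>

/* Hypothetical, simplified transaction states; only their order matters. */
enum model_state {
	MODEL_DEVICE_INIT,
	MODEL_READY,
	MODEL_HOSTMSG_RECEIVED,
	MODEL_USERSPACE_REQ,
	MODEL_USERSPACE_RECV,
};

static enum model_state model_txn_state = MODEL_DEVICE_INIT;

/* A host request was handed to the daemon; wait for its reply. */
static void model_send_to_daemon(void)
{
	model_txn_state = MODEL_USERSPACE_REQ;
}

/* The timeout fired before the daemon replied; abandon the transaction. */
static void model_timeout(void)
{
	model_txn_state = MODEL_READY;
}

/* The daemon's write() arrives: reject it unless a request is outstanding. */
static int model_on_msg(void)
{
	if (model_txn_state < MODEL_USERSPACE_REQ)
		return -EINVAL;
	model_txn_state = MODEL_USERSPACE_RECV;
	return 0;
}
```

Calling model_send_to_daemon(), then model_timeout(), then model_on_msg() yields -EINVAL, which corresponds to the error the daemon's write() sees.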
>
> We can reproduce the issue by sending a SIGSTOP signal to the daemon,
> waiting for 1 minute, and sending a SIGCONT signal to the daemon: the
> daemon then quickly calls exit().
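The reproduction steps can be scripted from C roughly as follows. stop_wait_continue() is a hypothetical helper, the daemon's PID is assumed to be known to the caller, and the wait is a parameter so the 1-minute delay from the description (stop_wait_continue(pid, 60)) can be shortened for experiments.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: stop the daemon, wait long enough for the driver's
 * timeout to fire, then resume it. Returns 0 on success, -1 if either
 * kill() call fails (e.g. the PID does not exist).
 */
static int stop_wait_continue(pid_t daemon_pid, unsigned int seconds)
{
	if (kill(daemon_pid, SIGSTOP) != 0)
		return -1;
	sleep(seconds);	/* the stopped daemon cannot call write() meanwhile */
	if (kill(daemon_pid, SIGCONT) != 0)
		return -1;
	return 0;
}
```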
>
> We can fix the issue by forcing a reset of the device (which means the
> daemon can close() and open() the device again) and doing extra necessary
> clean-up.
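A minimal sketch of the reopen-on-error pattern, assuming a device that exchanges fixed-size messages. serve(), the 64-byte message buffer, and the reopen_fd label are illustrative only; the real daemons use their own message structs and device paths. Per the description above, closing and reopening the device is what resets it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: serve up to `rounds` request/response round trips
 * on `path`, reopening the device whenever read() or write() does not
 * transfer a full message. Returns the number of reopens, or -1 if
 * open() fails.
 */
static int serve(const char *path, int rounds)
{
	char msg[64];		/* stand-in for the real message struct */
	int fd = -1;
	int reopens = 0;

reopen_fd:
	if (fd != -1) {
		close(fd);	/* drop the old descriptor before reopening */
		reopens++;
	}
	fd = open(path, O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return -1;

	while (rounds-- > 0) {
		ssize_t len = read(fd, msg, sizeof(msg));
		if (len != (ssize_t)sizeof(msg))
			goto reopen_fd;	/* error or short read */

		/* ... handle the request in msg ... */

		len = write(fd, msg, sizeof(msg));
		if (len != (ssize_t)sizeof(msg))
			goto reopen_fd;	/* e.g. -EINVAL after a timeout */
	}
	close(fd);
	return reopens;
}
```

For example, serve("/dev/zero", 3) completes without reopening, while serve("/dev/null", 3) reopens on every round because read() returns 0.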
>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
>
> ---
> Changes in v2:
> This is actually a new patch that makes the daemons more robust.
>
> Changes in v3 (I addressed Michael's comments):
> Don't reset target_fd, since that's unnecessary.
> Reset target_fname by: target_fname[0] = '\0';
> Added the missing "fs_frozen = true;" in vss_operate().
> Just after reopen_vss_fd: if vss_operate(VSS_OP_THAW) cannot clear
> fs_frozen due to an error, we just exit().
> Added comments.
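The fs_frozen bookkeeping described in the v3 notes can be modeled as below. struct vss_model and its helpers are hypothetical; the sketch assumes a freeze or thaw changes the flag only when it succeeds (return code 0). After reopening, the daemon would attempt a thaw if the flag is set, record the result, and exit() if vss_model_must_exit() still reports true.

```c
#include <stdbool.h>

/* Hypothetical model of the daemon's freeze bookkeeping. */
struct vss_model {
	bool fs_frozen;	/* true while filesystems are believed frozen */
};

/*
 * Record the result of a freeze (is_freeze == true) or thaw attempt.
 * Assumption: only a successful operation (rc == 0) changes the flag;
 * a successful freeze sets it, a successful thaw clears it.
 */
static void vss_model_record(struct vss_model *m, bool is_freeze, int rc)
{
	if (rc == 0)
		m->fs_frozen = is_freeze;
}

/*
 * Checked after the device is reopened and any needed thaw was attempted:
 * if the flag is still set, the thaw failed and the daemon should exit().
 */
static bool vss_model_must_exit(const struct vss_model *m)
{
	return m->fs_frozen;
}
```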
>
> Changes in v4 (Thanks to Michael!):
> Added the omitted "int fcopy_fd = -1" and
> "
> 	if (fcopy_fd != -1)
> 		close(fcopy_fd);
> "
>
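A sketch of the fcopy clean-up from the v3/v4 notes: the descriptor starts at -1 so it is only closed if it was actually opened, and target_fname is reset in place while target_fd is deliberately left alone. The globals, fcopy_reset_before_reopen(), and the PATH_MAX sizing are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical globals mirroring the names in the changelog. */
static int fcopy_fd = -1;		/* -1 means "device not open" */
static char target_fname[PATH_MAX];

/* Clean-up before reopening the fcopy device (target_fd is not touched). */
static void fcopy_reset_before_reopen(void)
{
	if (fcopy_fd != -1) {
		close(fcopy_fd);
		fcopy_fd = -1;
	}
	target_fname[0] = '\0';	/* forget the partially copied file name */
}
```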
> tools/hv/hv_fcopy_daemon.c | 37 ++++++++++++++++++++++++----
> tools/hv/hv_kvp_daemon.c | 36 ++++++++++++++++------------
> tools/hv/hv_vss_daemon.c | 49 +++++++++++++++++++++++++++++---------
> 3 files changed, 91 insertions(+), 31 deletions(-)
>

Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>