[tip: core/rcu] torture: Make kvm-remote.sh account for network failure in pathname checks

From: tip-bot2 for Paul E. McKenney
Date: Wed Jun 30 2021 - 09:49:34 EST


The following commit has been merged into the core/rcu branch of tip:

Commit-ID: c43d3b0083b4f2e9b14174a5857ab06cbca986df
Gitweb: https://git.kernel.org/tip/c43d3b0083b4f2e9b14174a5857ab06cbca986df
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
AuthorDate: Tue, 27 Apr 2021 09:56:42 -07:00
Committer: Paul E. McKenney <paulmck@xxxxxxxxxx>
CommitterDate: Mon, 10 May 2021 16:05:07 -07:00

torture: Make kvm-remote.sh account for network failure in pathname checks

In a long-duration kvm-remote.sh run, almost all of the remote accesses will
be simple file-existence checks. These are thus the most likely to be caught
out by network failures, which do happen from time to time.

This commit therefore takes a first step towards tolerating temporary
network outages by making the file-existence checks repeat in the face of
such an outage. They also print a message every minute during an outage,
allowing the user to take appropriate action.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
---
tools/testing/selftests/rcutorture/bin/kvm-remote.sh | 26 ++++++++++-
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-remote.sh b/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
index f08d415..20e848d 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-remote.sh
@@ -159,6 +159,28 @@ do
fi
done

+# Function to check for presence of a file on the specified system.
+# Complain if the system cannot be reached, and retry after a wait.
+# Currently just waits forever if a machine disappears.
+#
+# Usage: checkremotefile system pathname
+checkremotefile () {
+	local ret
+	local sleeptime=60
+
+	while :
+	do
+		ssh $1 "test -f \"$2\""
+		ret=$?
+		if test "$ret" -ne 255
+		then
+			return $ret
+		fi
+		echo " ---" ssh failure to $1 checking for file $2, retry after $sleeptime seconds. `date`
+		sleep $sleeptime
+	done
+}
+
# Function to start batches on idle remote $systems
#
# Usage: startbatches curbatch nbatches
@@ -178,7 +200,7 @@ startbatches () {
echo $((nbatches + 1))
return 0
fi
- if ssh "$i" "test -f \"$resdir/$ds/remote.run\"" 1>&2
+ if checkremotefile "$i" "$resdir/$ds/remote.run" 1>&2
then
continue # System still running last test, skip.
fi
@@ -216,7 +238,7 @@ echo All batches started. `date`
# Wait for all remaining scenarios to complete and collect results.
for i in $systems
do
- while ssh "$i" "test -f \"$resdir/$ds/remote.run\""
+ while checkremotefile "$i" "$resdir/$ds/remote.run"
do
sleep 30
done
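
The retry loop in checkremotefile relies on ssh exiting with status 255 when
ssh itself fails, and otherwise passing through the exit status of the remote
command (0 or 1 for "test -f"). The following is a minimal standalone sketch
of that loop shape, not part of the patch. The names fake_ssh and
check_with_retry are hypothetical: fake_ssh stands in for ssh so that the
sketch runs without a network, and the sleep time is passed as an argument so
that the example finishes immediately.

```shell
#!/bin/sh
# Hypothetical stand-in for ssh: simulates two transport failures
# (exit status 255), then reports "file absent" (exit status 1), as a
# remote "test -f" would for a missing file.
attempts=0
fake_ssh () {
	attempts=$((attempts + 1))
	if test "$attempts" -le 2
	then
		return 255	# Simulated ssh-level failure.
	fi
	return 1		# Simulated result of the remote "test -f".
}

# Same loop shape as checkremotefile, but with the sleep time and the
# command to run passed in as arguments (an assumption made for this sketch).
#
# Usage: check_with_retry sleeptime command [args...]
check_with_retry () {
	sleeptime=$1
	shift
	while :
	do
		"$@"
		ret=$?
		if test "$ret" -ne 255
		then
			return $ret	# Real answer from the remote test.
		fi
		echo " --- transport failure, retry after $sleeptime seconds. $(date)" 1>&2
		sleep "$sleeptime"
	done
}

check_with_retry 0 fake_ssh
status=$?
echo "status=$status attempts=$attempts"
```

With the simulated failures above, the loop retries twice and then returns
the "file absent" status, so the final line reports status=1 and attempts=3.
As in the patch, any status other than 255 is treated as a definitive answer,
which means that a remote command that itself exits with 255 would be
indistinguishable from a network failure.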