Re: [PATCH v3 08/12] selftests/nolibc: allow quit qemu-system when poweroff fails

From: Thomas Weißschuh
Date: Sat Jul 29 2023 - 04:00:02 EST


On 2023-07-28 04:30:31+0800, Zhangjin Wu wrote:
> The kernel of some architectures can not poweroff qemu-system normally,
> especially for tinyconfig.
>
> Some architectures may have no kernel poweroff support, the others may
> require more kernel config options and therefore slow down the
> tinyconfig build and test. and also, it's very hard (and some even not
> possible) to find out the exact poweroff related kernel config options
> for every architecture.
>
> Since the low-level poweroff support is heavily kernel & qemu dependent,
> it is not that critical to both nolibc and nolibc-test, let's simply
> ignore the poweroff required kernel config options for tinyconfig (and
> even for defconfig) and quit qemu-system after a specified timeout or
> with an expected system halt or poweroff string (these strings mean our
> reboot() library routine is perfectly ok).
>
> QEMU_TIMEOUT can be configured for every architecture based on their
> time cost requirement of bios boot + kernel boot + test + poweroff.
>
> By default, 10 seconds timeout is configured, this is enough for most of
> the architectures, otherwise, customize one by architecture.
>
> To tell users the test running progress in time, some critical running
> status are also printed and detected.
>
> Suggested-by: Willy Tarreau <w@xxxxxx>
> Link: https://lore.kernel.org/lkml/20230722130248.GK17311@xxxxxx/
> Signed-off-by: Zhangjin Wu <falcon@xxxxxxxxxxx>
> ---
> tools/testing/selftests/nolibc/Makefile | 30 +++++++++++++++++++++++--
> 1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
> index a214745e0f3e..9a57de3b283c 100644
> --- a/tools/testing/selftests/nolibc/Makefile
> +++ b/tools/testing/selftests/nolibc/Makefile
> @@ -105,6 +105,9 @@ QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1
> QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
> QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_EXTRA)
>
> +# QEMU_TIMEOUT: some architectures can not poweroff normally, especially for tinyconfig
> +QEMU_TIMEOUT = $(or $(QEMU_TIMEOUT_$(XARCH)),10)
> +
> # OUTPUT is only set when run from the main makefile, otherwise
> # it defaults to this nolibc directory.
> OUTPUT ?= $(CURDIR)/
> @@ -229,16 +232,39 @@ kernel: $(KERNEL_CONFIG)
> # common macros for qemu run/rerun targets
> QEMU_SYSTEM_RUN = qemu-system-$(QEMU_ARCH) -display none -no-reboot -kernel "$(KERNEL_IMAGE)" -serial stdio $(QEMU_ARGS)
>
> +TIMEOUT_CMD = t=$(QEMU_TIMEOUT); past=0; \
> + bios_timeout=$$(expr $$t - 7); kernel_timeout=$$(expr $$t - 5); init_timeout=$$(expr $$t - 3); test_timeout=$$(expr $$t - 1); \
> + err=""; bios=0; kernel=0; init=0; test=0; poweredoff=0; panic=0; \
> + echo "Running $(KERNEL_IMAGE) on qemu-system-$(QEMU_ARCH)"; \
> + while [ $$t -gt 0 ]; do \
> + sleep 2; t=$$(expr $$t - 2); past=$$(expr $$past + 2); \
> + if [ $$bios -eq 0 ] && grep -E "Linux version|Kernel command line|printk: console" "$(RUN_OUT)"; then bios=1; fi; \
> + if [ $$bios -eq 1 -a $$kernel -eq 0 ] && grep -E "Run .* as init process" "$(RUN_OUT)"; then kernel=1; fi; \
> + if [ $$kernel -eq 1 -a $$init -eq 0 ] && grep -E "Running test" "$(RUN_OUT)"; then init=1; fi; \
> + if [ $$init -eq 1 -a $$test -eq 0 ] && grep -E "Leaving init with final status|Exiting with status" "$(RUN_OUT)"; then test=1; fi; \
> + if [ $$init -eq 1 ] && grep -E "Kernel panic - not syncing: Attempted to kill init" "$(RUN_OUT)"; then err="test"; sleep 1; break; fi; \
> + if [ $$test -eq 1 ] && grep -E "reboot: System halted|reboot: Power down" "$(RUN_OUT)"; then poweredoff=1; sleep 1; break; fi; \
> + if [ $$past -gt $$bios_timeout -a $$bios -eq 0 ]; then err="bios"; break; fi; \
> + if [ $$past -gt $$kernel_timeout -a $$kernel -eq 0 ]; then err="kernel"; break; fi; \
> + if [ $$past -gt $$init_timeout -a $$init -eq 0 ]; then err="init"; break; fi; \
> + if [ $$past -gt $$test_timeout -a $$test -eq 0 ]; then err="test"; break; fi; \
> + done; \
> + if [ -z "$$err" -a $$poweredoff -eq 0 -a $$panic -eq 0 ]; then err="qemu-system-$(QEMU_ARCH)"; fi; \
> + if [ -n "$$err" ]; then echo "$$err may timeout, test failed"; tail -10 $(RUN_OUT); else echo "powered off, test finish"; fi; \
> + pkill -15 qemu-system-$(QEMU_ARCH) || true
> +
> +TIMEOUT_QEMU_RUN = ($(QEMU_SYSTEM_RUN) > "$(RUN_OUT)" &); $(TIMEOUT_CMD)
> +

This feels fairly hacky.

Earlier we complicated nolibc-test to handle the no-procfs case, just to
save a few seconds building the kernel, and now we have fairly big
timeouts, plus a state machine that relies on the specific strings
emitted by the testsuite.

I would like to get back to something more deterministic and obvious,
even at the cost of some time spent compiling the test kernels.
(saying this as somebody developing on a 2016 ultrabook)

"Since the low-level poweroff support is heavily kernel & qemu dependent"

The kernel we can control.

How common are qemus that are missing poweroff support?
As this worked before I guess the only architecture where this could
pose a problem would be ppc.


An alternative I would like to put up for discussion:

qemu could provide a watchdog device that is pinged by nolibc-test for
each testcase.
If nolibc-test is done but did not power off properly, the watchdog will
trigger and power off the machine (-watchdog-action poweroff).

The disadvantages are that we would need to add watchdog drivers to the
kernels and figure out the correct watchdog devices and drivers for each arch.
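
To make the idea a bit more concrete, here is a rough sketch of what the
pinging side could look like. It assumes the kernel exposes the usual
Linux watchdog character device (for example /dev/watchdog), where any
write counts as a keepalive. The helper names watchdog_open() and
watchdog_ping() are made up for illustration and are not part of
nolibc-test, and the sketch uses plain libc headers rather than nolibc.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open the watchdog device, -1 if there is none. */
static int watchdog_open(const char *path)
{
	return open(path, O_WRONLY);
}

/*
 * Hypothetical helper: ping the watchdog once.
 * With the Linux watchdog character device, any write is a keepalive.
 * Returns 0 on success or when no watchdog is present, -1 on error.
 */
static int watchdog_ping(int fd)
{
	if (fd < 0)
		return 0;
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```

The test loop would call watchdog_ping() before each testcase and simply
stop pinging at the end, without disarming the device, so that a failed
poweroff is caught when the watchdog expires.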

It seems virtio-watchdog is not yet usable.

> # run the tests after building the kernel
> PHONY += $(KERNEL_IMAGE)
> $(KERNEL_IMAGE): kernel
> run: $(KERNEL_IMAGE)
> - $(Q)$(QEMU_SYSTEM_RUN) > "$(RUN_OUT)"
> + $(Q)$(TIMEOUT_QEMU_RUN)
> $(Q)$(REPORT) "$(RUN_OUT)"
>
> # re-run the tests from an existing kernel
> rerun:
> - $(Q)$(QEMU_SYSTEM_RUN) > "$(RUN_OUT)"
> + $(Q)$(TIMEOUT_QEMU_RUN)
> $(Q)$(REPORT) "$(RUN_OUT)"
>
> # report with existing test log
> --
> 2.25.1
>