Re: [PATCH] kunit: tool: yield output from run_kernel in real time

From: David Gow
Date: Mon Oct 04 2021 - 21:08:55 EST


On Tue, Oct 5, 2021 at 8:36 AM Daniel Latypov <dlatypov@xxxxxxxxxx> wrote:
>
> On Mon, Oct 4, 2021 at 4:58 PM David Gow <davidgow@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 5, 2021 at 7:46 AM Daniel Latypov <dlatypov@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 4, 2021 at 4:34 PM 'David Gow' via KUnit Development
> > > <kunit-dev@xxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Oct 5, 2021 at 1:01 AM Daniel Latypov <dlatypov@xxxxxxxxxx> wrote:
> > > > >
> > > > > Currently, `run_kernel()` dumps all the kernel output to a file
> > > > > (.kunit/test.log) and then opens the file and yields it to callers.
> > > > > This made it easier to respect the requested timeout, if any.
> > > > >
> > > > > But it means that we can't yield the results in real time, either to the
> > > > > parser or to stdout (if --raw_output is set).
> > > > >
> > > > > This change spins up a background thread to enforce the timeout, which
> > > > > allows us to yield the kernel output in real time, while also copying it
> > > > > to the .kunit/test.log file.
> > > > > It's also careful to ensure that the .kunit/test.log file is complete,
> > > > > even in the kunit_parser throws an exception/otherwise doesn't consume
> > > > > every line, see the new `finally` block and unit test.
> > > > >
> > > > > For example:
> > > > >
> > > > > $ ./tools/testing/kunit/kunit.py run --arch=x86_64 --raw_output
> > > > > <configure + build steps>
> > > > > ...
> > > > > <can now see output from QEMU in real time>
> > > > >
> > > > > This does not currently have a visible effect when --raw_output is not
> > > > > passed, as kunit_parser.py currently only outputs everything at the end.
> > > > > But that could change, and this patch is a necessary step towards
> > > > > showing parsed test results in real time.
> > > > >
> > > > > Signed-off-by: Daniel Latypov <dlatypov@xxxxxxxxxx>
> > > > > ---
> > > > > tools/testing/kunit/kunit_kernel.py | 73 +++++++++++++++-----------
> > > > > tools/testing/kunit/kunit_tool_test.py | 17 ++++++
> > > > > 2 files changed, 60 insertions(+), 30 deletions(-)
> > > > >
> > > > > diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
> > > > > index 2c6f916ccbaf..b8cba8123aa3 100644
> > > > > --- a/tools/testing/kunit/kunit_kernel.py
> > > > > +++ b/tools/testing/kunit/kunit_kernel.py
> > > > > @@ -12,7 +12,8 @@ import subprocess
> > > > > import os
> > > > > import shutil
> > > > > import signal
> > > > > -from typing import Iterator, Optional, Tuple
> > > > > +import threading
> > > > > +from typing import Iterator, List, Optional, Tuple
> > > > >
> > > > > from contextlib import ExitStack
> > > > >
> > > > > @@ -103,8 +104,8 @@ class LinuxSourceTreeOperations(object):
> > > > > if stderr: # likely only due to build warnings
> > > > > print(stderr.decode())
> > > > >
> > > > > - def run(self, params, timeout, build_dir, outfile) -> None:
> > > > > - pass
> > > > > + def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
> > > > > + raise RuntimeError('not implemented!')
> > > > >
> > > > >
> > > > > class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> > > > > @@ -123,7 +124,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> > > > > kconfig.parse_from_string(self._kconfig)
> > > > > base_kunitconfig.merge_in_entries(kconfig)
> > > > >
> > > > > - def run(self, params, timeout, build_dir, outfile):
> > > > > + def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
> > > > > kernel_path = os.path.join(build_dir, self._kernel_path)
> > > > > qemu_command = ['qemu-system-' + self._qemu_arch,
> > > > > '-nodefaults',
> > > > > @@ -134,18 +135,10 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
> > > > > '-nographic',
> > > > > '-serial stdio'] + self._extra_qemu_params
> > > > > print('Running tests with:\n$', ' '.join(qemu_command))
> > > > > - with open(outfile, 'w') as output:
> > > > > - process = subprocess.Popen(' '.join(qemu_command),
> > > > > - stdin=subprocess.PIPE,
> > > > > - stdout=output,
> > > > > - stderr=subprocess.STDOUT,
> > > > > - text=True, shell=True)
> > > > > - try:
> > > > > - process.wait(timeout=timeout)
> > > > > - except Exception as e:
> > > > > - print(e)
> > > > > - process.terminate()
> > > > > - return process
> > > > > + return subprocess.Popen(' '.join(qemu_command),
> > > > > + stdout=subprocess.PIPE,
> > > > > + stderr=subprocess.STDOUT,
> > > > > + text=True, shell=True)
> > > > >
> > > > > class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > > > > """An abstraction over command line operations performed on a source tree."""
> > > > > @@ -175,17 +168,13 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
> > > > > kunit_parser.print_with_timestamp(
> > > > > 'Starting Kernel with all configs takes a few minutes...')
> > > > >
> > > > > - def run(self, params, timeout, build_dir, outfile):
> > > > > + def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
> > > > > """Runs the Linux UML binary. Must be named 'linux'."""
> > > > > linux_bin = get_file_path(build_dir, 'linux')
> > > > > - outfile = get_outfile_path(build_dir)
> > > > > - with open(outfile, 'w') as output:
> > > > > - process = subprocess.Popen([linux_bin] + params,
> > > > > - stdin=subprocess.PIPE,
> > > >
> > > > This breaks --raw_output under UML for me. Including the
> > > > stdin=subprocess.PIPE again seems to fix it.
> > >
> > > Can you give an example of what it does?
> > >
> > > I don't see any issues with --raw_output under UML with the patch as-is.
> > > I was mainly testing this with UML, and I must have ran it some 10s of
> > > times, so I'm a bit surprised.
> > >
> > > On an earlier version, I saw some mangling of --raw_output (\n was
> > > missing), but that went away after some revisions.
> > >
> >
> > Yeah, that's the sort of thing I'm seeing: \n being treated as just a
> > new line (without the carriage return).
> > It happens pretty consistently, though sometimes the text wraps and
> > sometimes (well, once) everything gets forced into the last column of
> > the terminal. I've not been able to get it to work at all without
> > having stdin be subprocess.PIPE.
>
> I can't repro in the shell I was using while writing this.
> Starting up a new session, I get:
> * good
> * bad x3
> * good
> * bad x3
> * good
> * bad x3
> * bad, but not as bad (phew, I thought this was cursed)
>
> .kunit/test.log contains the output with proper \n each time I checked.
>
> Doesn't seem to repro when redirected into a file:
>
> $ ./tools/testing/kunit/kunit.py exec --raw_output > /tmp/out && diff
> /tmp/out .kunit/test.log | grep -m1 '^>'
>
> Piping it through cat also seems 100% fine:
> $ ./tools/testing/kunit/kunit.py exec --raw_output | cat
>
> So having stdin be subprocess.PIPE sorta makes sense in some twisted way.
> I can also now repro that setting stdin=subprocess.PIPE seems to fix it.
>
> I don't want to put back subprocess.PIPE, but I'm not confident in
> being able to find a better solution...

So it turns out that there are a few issues with UML assuming stdin ==
stdout (or at least that stdin and stdout are related), as seen in
this ongoing thread on linux-um:
http://lists.infradead.org/pipermail/linux-um/2021-October/001805.html

Some of the coments in UML's chan_user.c seem pretty related, too, in
that termios is used to set this weird
"mostly-raw-but-\n-includes-carriage-returns" mode:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/um/drivers/chan_user.c#n90

My theory is that UML sets "raw" mode based on what stdin is, then
only sets OPOST if stdout is a tty, so there's a mismatch here. So,
probably this is fundamentally a UML bug, but worth us working around
in it kunit_tool for the time being. That being said, I've not been
able to find anything obviously stdin-y setting raw mode thus far, so
I could be wrong.

My preference is that we put stdin=subprocess.PIPE back for now, since
we don't want a regression, and if a cause in UML is found and fixed,
we can remove it after the fix has propagated comfortably everywhere.

-- David

>
> >
> > It occurs both under tmux and not, and under Konsole and xterm, so it
> > doesn't appear to be specific to any given terminal implementation.
> > Still occurs even after running 'reset', and with a clean build.
> > QEMU-based --raw_output works fine.
> >
> >