RE: RFC - kernel selftest result documentation (KTAP)

From: Bird, Tim
Date: Mon Jun 15 2020 - 13:45:35 EST




> -----Original Message-----
> From: Kees Cook
> Sent: Sunday, June 14, 2020 12:18 PM
> To: David Gow <davidgow@xxxxxxxxxx>
> Cc: Vitor Massaru Iha <vitor@xxxxxxxxxxx>; KUnit Development <kunit-dev@xxxxxxxxxxxxxxxx>; Shuah Khan
> <skhan@xxxxxxxxxxxxxxxxxxx>; open list:KERNEL SELFTEST FRAMEWORK <linux-kselftest@xxxxxxxxxxxxxxx>; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; Brendan Higgins <brendanhiggins@xxxxxxxxxx>; linux-kernel-mentees@xxxxxxxxxxxxxxxxxxxxxxxxx;
> linux@xxxxxxxxxxxxxxxxxx
> Subject: Re: RFC - kernel selftest result documentation (KTAP)
>
> On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
> > On Sat, Jun 13, 2020 at 6:36 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > > Regarding output:
> > >
> > > [ 36.611358] TAP version 14
> > > [ 36.611953] # Subtest: overflow
> > > [ 36.611954] 1..3
> > > ...
> > > [ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
> > > [ 36.624020] ok 1 - overflow_calculation_test
> > > ...
> > > [ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [ 36.731840] ok 2 - overflow_shift_test
> > > ...
> > > [ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
> > > [ 36.807763] ok 3 - overflow_allocation_test
> > > [ 36.807765] ok 1 - overflow
> > >
> > > A few things here....
> >
> > Tim Bird has just sent out an RFC for a "KTAP" specification, which
> > we'll hope to support in KUnit:
>
> Ah-ha! Thanks for the heads-up. :)
>
> > https://lore.kernel.org/linux-
> kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#u
>
> *thread split/merge*
>
> This is coming from:
> https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X509c4YP_toKeBVBg@xxxxxxxxxxxxxx/
> But I'm attempting a thread jump... ;)
>
> > That's probably where we'll end up trying to hash out exactly what
> > this format should be. Fortunately, I don't think any of these will
> > require any per-test work, just changes to the KUnit implementation.
>
> Yup, good.
>
> > > - On the outer test report, there is no "plan" line (I was expecting
> > > "1..1"). Technically it's optional, but it seems like the information
> > > is available. :)
> >
> > There's work underway to support this, but it's hit a few minor snags:
> > https://lkml.org/lkml/2020/2/27/2155
>
> Okay, cool. It's not critical, I don't think.
>
> > > - The subtest should have its own "TAP version 14" line, and it should
> > > be using the diagnostic line prefix for the top-level test (this is
> > > what kselftest is doing).
> >
> > Alas, TAP itself hasn't standardised subtests. Personally, I think it
> > doesn't fundamentally matter which way we do this (I actually prefer
> > the way we're doing it currently slightly), but converging with what
> > kselftest does would be ideal.
>
> I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually
> already handles subtests. I strongly feel that line-start formatting is
> the correct way to deal with this, with each subtest having it's own
> self-contained KTAP. This allows for several important features:
>
> - the subtest, run on its own, needs no knowledge about its execution
> environment: it simply emits its standard KTAP output.
>
> - subtest output can be externally parsed separably, without any
> knowledge or parsing of the enclosing test.

I agree with both of these, but will save additional comments for the
thread on my KTAP doc RFC.

>
> For example, with my recent series[1], "make -C seccomp run_tests"
> produces:
>
> TAP version 13
> 1..2
> # selftests: seccomp: seccomp_bpf
> # TAP version 13
> # 1..77
> # # Starting 77 tests from 6 test cases.
> # # RUN global.mode_strict_support ...
> # # OK global.mode_strict_support
> # ok 1 global.mode_strict_support
> ...
> # ok 77 TSYNC.two_siblings_not_under_filter
> # # FAILED: 73 / 77 tests passed.
> # # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
> not ok 1 selftests: seccomp: seccomp_bpf # exit=1
> # selftests: seccomp: seccomp_benchmark
> #
> not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT
>
> > > - There is no way to distinguish top-level TAP output from kernel log
> > > lines. I think we should stick with the existing marker, which is
> > > "# ", so that kernel output has no way to be interpreted as TAP
> > > details -- unless it intentionally starts adding "#"s. ;)
> >
> > At the moment, we're doing this in KUnit tool by stripping anything
> > before "TAP version 14" (e.g., the timestamp), and then only incuding
> > lines which parse correctly (are a test plan, result, or a diagnostic
> > line beginning with '#').
> > This has worked pretty well thus far, and we do have the ability to
> > get results from debugfs instead of the kernel log, which won't have
> > the same problems.
> >
> > It's worth considering, though, particularly since our parser should
> > handle this anyway without any changes.
>
> For the KTAP parsing, actually think it's very important to include
> kernel log lines in the test output (as diagnostic lines), since they
> are "unexpected" (they fail to have the correct indentation) and may
> provide additional context for test failures when reading log files.
> (As in the "vmalloc: allocation failure" line in the quoted section
> above, to be included as a diagnostic line for test #3.)
>
> > > - There is no summary line (to help humans). For example, the kselftest
> > > API produces a final pass/fail report.
> >
> > Currently, we're relying on the kunit.py script to produce this, but
> > it shouldn't be impossible to add to the kernel, particularly once the
> > "KUnit Executor" changes mentioned above land.
>
> Cool. Yeah, it's not required, but I think there are two use cases:
> humans running a single test (where is a summary is valuable, especially
> for long tests that scroll off the screen), and automation (where it can
> ignore the summary, as it will produce its own in a regularized fashion).
>
> > > Taken together, I was expecting the output to be:
> > >
> > > [ 36.611358] # TAP version 14
> > > [ 36.611953] # 1..1
> > > [ 36.611958] # # TAP version 14
> > > [ 36.611954] # # 1..3
> > > ...
> > > [ 36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
> > > [ 36.624020] # # ok 1 - overflow_calculation_test
> > > ...
> > > [ 36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [ 36.731840] # # ok 2 - overflow_shift_test
> > > ...
> > > [ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [ 36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
> > > [ 36.807763] # # ok 3 - overflow_allocation_test
> > > [ 36.807763] # # # overflow: PASS
> > > [ 36.807765] # ok 1 - overflow
> > > [ 36.807767] # # kunit: PASS
> > >
> > > But I assume there are threads on this that I've not read... :)
> >
> > These discussions all seem to be coming to a head now, so this is
> > probably just the kick we need to prioritise this a bit more. The KTAP
> > thread hasn't covered all of these (particularly the subtest stuff)
> > yet, so I confess I hadn't realised the extent of the divergence
> > between KUnit and kselftest here.
>
> Cool. Yeah, I'd like to keep things as close as possible. In looking at
> this again, I wonder if perhaps it would be better to change the
> "indent" from "# " to " ", which might make things more readable for
> both dmesg and terminal output:
>
> [ 36.611358] TAP version 14
> [ 36.611953] 1..1
> [ 36.611958] TAP version 14
> [ 36.611954] 1..3
> ...
> [ 36.622914] # overflow_calculation_test: s64: 21 arithmetic tests
> [ 36.624020] ok 1 - overflow_calculation_test
> ...
> [ 36.731096] # overflow_shift_test: ok: (s64)(0 << 63) == 0
> [ 36.731840] ok 2 - overflow_shift_test
> ...
> [ 36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> ...
> [ 36.805350] # overflow_allocation_test: devm_kzalloc detected saturation
> [ 36.807763] ok 3 - overflow_allocation_test
> [ 36.807763] # overflow: PASS
> [ 36.807765] ok 1 - overflow
> [ 36.807767] # kunit: PASS
>
> As it happens, subtests are a bit rare in kselftests right now, but
> for kunit, the "common" output (IIUC) will fundamentally be a top-level
> test running all the subtests, so we should get it right. :)

Personally, as a human I find the space prefix a bit easier to read.
However, I think that in "normal" kernel log output it is unusual for
a line to be prefixed with a hash (#), so this might be easier to
both visually distinguish and for automated parsing.
So I'm torn. I don't have a strong opinion on space vs. hash prefix
for indicating sub-test. I think the KUnit convention of matching
whatever was the prefix of the "TAP version 14" line is clever, but
it would be nice to just pick a prefix and be done with it.

IMHO.
-- Tim


> -Kees
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selftests/harness/tap
>
> --
> Kees Cook