Re: what trees/branches to test on syzbot

From: Dmitry Vyukov
Date: Sun Jun 10 2018 - 02:11:46 EST


On Sun, Jun 10, 2018 at 3:51 AM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> On Sat, Jun 09, 2018 at 03:17:21PM -0700, Linus Torvalds wrote:
>> I think it would be lovely to get linux-next back eventually, but it
>> sounds like it's just too noisy right now, and yes, we should have a
>> baseline for the standard tree first.
>>
>> But once there's a "this is known for the baseline", I think adding
>> linux-next back in and then maybe even have linux-next simply just
>> kick out trees that cause problems would be a good idea.
>>
>> Right now linux-next only kicks things out based on build issues (or
>> extreme merge issues), afaik. But it *would* be good to also have
>> things like syzbot do quality control on linux-next.
>
> Syzbot is always getting improved to find new classes of problems. So
> the only way to get a baseline would be to use an older version of
> syzbot for linux-next, and to have it suppress sending e-mails about
> failures that are duplicates that were already found via the mainline
> tree.
>
> Then periodically, once version N has run for M weeks, and has spewed
> some large number of new failures to LKML, then you could promote
> version N to be run against linux-next, and so hopefully the only
> thing it would report against linux-next are regressions, and not
> duplicates of new bugs also being found via the latest and greatest
> version of syzbot being run against the mainline kernel.

The set of trees where a crash happened is visible on the dashboard, so
one can see whether it is only linux-next or the whole set of trees.
Potentially syzbot can act differently depending on this predicate, but
I don't see what the difference should be. However, this does not fully
protect against falsely assessing bugs as linux-next-only just because
they have happened only a few times and only on linux-next so far. But
using an older syzkaller revision won't fully protect against this
either, because (1) some bugs take a long time to find, and (2) a bug
can be hidden by another known bug, so when the second bug is fixed the
first one suddenly pops up, but it's not a new bug (and chances are
that the second one will be fixed on linux-next first, so the first bug
will look like linux-next-only).

Regarding removing commits from linux-next, I think one of the main
signals can be whether there were recent changes related to the bug.
Looking at new bugs being reported, it is frequently quite obvious
(e.g. "use-after-free in foo" and a recent "make foo faster").

But in general, if we go with linux-next, maintainers and developers
need to agree to deal with this additional aspect during bug triage.
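
To make the "linux-next-only" predicate concrete, here is a rough
Python sketch. It is not how syzbot or the dashboard works: the Crash
record, the tree names and the min_crashes threshold are all
assumptions made up for illustration. It only shows why a small number
of crashes makes the verdict unreliable.

```python
from dataclasses import dataclass


# Hypothetical record of one crash occurrence; syzbot's real data
# model is different.
@dataclass(frozen=True)
class Crash:
    title: str  # e.g. "KASAN: use-after-free in foo"
    tree: str   # e.g. "upstream", "linux-next", "net-next"


def classify(crashes, title, min_crashes=10):
    """Classify a bug by the set of trees on which it has crashed so far.

    min_crashes is an arbitrary, assumed threshold: below it there is
    too little evidence to call a bug linux-next-only.
    """
    trees = [c.tree for c in crashes if c.title == title]
    if not trees:
        return "unseen"
    if set(trees) != {"linux-next"}:
        # Seen on at least one other tree, so not specific to linux-next.
        return "not-linux-next-only"
    if len(trees) < min_crashes:
        # Only on linux-next, but it happened too few times to be sure.
        return "maybe-linux-next-only"
    return "likely-linux-next-only"
```

Even the "likely" result is only a hint: a bug that was hidden behind
another bug fixed first on linux-next would still be classified this
way.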

There is also a problem with linux-next being rebased: reported commit
hashes stop making sense, and we can forget about bisection.

On a related note, Greg recently suggested onboarding more subsystem
-next trees (currently we test only net-next and bpf-next), so I tried
to formulate requirements for these trees:

https://github.com/google/syzkaller/issues/592
- not rebased (commit hashes work, bisection works)
- maintained in a reasonably good shape (no tons of assorted crashes)
- reasonably active (makes sense to test)
- merge upstream periodically (bugs are getting fixed)
- with maintainers who are willing to cooperate and fix bugs
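
As a rough illustration, the checklist above could be written down as
a small Python check. This is only a sketch: the TreeInfo fields and
every numeric threshold are assumptions invented here, and the issue
does not define how to measure any of these properties.

```python
from dataclasses import dataclass


# Hypothetical description of a candidate tree; all fields are assumed.
@dataclass
class TreeInfo:
    name: str
    rebased: bool
    open_crashes: int
    commits_last_month: int
    days_since_upstream_merge: int
    maintainers_cooperate: bool


def unmet_requirements(tree, max_open_crashes=20, min_commits=10,
                       max_merge_age_days=30):
    """Return the requirements from the list that the tree fails.

    All thresholds are arbitrary placeholders, not agreed values.
    """
    unmet = []
    if tree.rebased:
        unmet.append("not rebased")
    if tree.open_crashes > max_open_crashes:
        unmet.append("maintained in a reasonably good shape")
    if tree.commits_last_month < min_commits:
        unmet.append("reasonably active")
    if tree.days_since_upstream_merge > max_merge_age_days:
        unmet.append("merge upstream periodically")
    if not tree.maintainers_cooperate:
        unmet.append("maintainers willing to cooperate")
    return unmet
```

An empty result means the tree passes all five checks under these
assumed thresholds.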

Any volunteers?

Thanks