Re: Formal description of system call interface

From: Carlos O'Donell
Date: Fri Apr 21 2017 - 13:38:55 EST


On 11/06/2016 05:39 PM, Dmitry Vyukov wrote:
> Hello,
>
> These are notes from the discussion we had at Linux Plumbers this week
> regarding providing a formal description of system calls (user API).
>
> The idea came up in the context of syzkaller, a syscall fuzzer, which
> has descriptions for 1000+ syscalls, mostly concentrating on the types of
> arguments and return values. However, the problem is that a small group
> of people can't write descriptions for all syscalls, can't keep them
> up to date, and doesn't have the necessary domain expertise to write
> correct descriptions in some cases.
>
> We identified a surprisingly large number of potential users for such
> descriptions:
> - fuzzers (syzkaller, trinity, iknowthis)
> - strace/syscall tracepoints (capturing indirect arguments and
> printing human-readable info)
> - generation of entry points for C libraries (glibc, liblinux
> (raw syscalls), Go runtime, clang/gcc sanitizers)

To add another:

Auto-generation of the SYS_* macros (sys/syscall.h) in glibc, which are
required for syscall().

It would mean we could copy the list directly from the most recently
released kernel instead of relying on the distro's kernel UAPI headers
package.

We need this information in the released kernel.
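
As a rough illustration of what such generation could look like, here is
a minimal Python sketch. It assumes a whitespace-separated table with
number, ABI, name and entry point columns; the sample rows are
hand-written and the function names are hypothetical. It emits guarded
SYS_* definitions in terms of __NR_*, and is not meant to reflect
glibc's actual build machinery.

```python
# Sketch only: the table text below is a hand-written sample in the style of
# a kernel syscall table, not data read from a real kernel tree.
SAMPLE_TABLE = """\
# <number> <abi> <name> <entry point>
0	common	read	sys_read
1	common	write	sys_write
2	common	open	sys_open
3	common	close	sys_close
"""


def parse_syscall_names(table_text):
    """Return syscall names in table order, skipping blanks and comments."""
    names = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed row; a real tool should probably report it
        names.append(fields[2])
    return names


def emit_sys_macros(names):
    """Emit one guarded SYS_* definition per syscall name."""
    out = []
    for name in names:
        out.append(f"#ifdef __NR_{name}")
        out.append(f"# define SYS_{name} __NR_{name}")
        out.append("#endif")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(emit_sys_macros(parse_syscall_names(SAMPLE_TABLE)), end="")
```

The #ifdef guard is a design choice for the sketch: it keeps the
generated header usable when the installed kernel headers are older
than the list it was generated from.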

> - valgrind/sanitizers checking of input/output values of syscalls
> - seccomp filters (minijail, libseccomp) need to know interfaces
> to generate wrappers
> - safety certification (requires syscall specifications)
> - man pages (could provide the actual syscall interface rather than
> the glibc wrapper interface; it was noted that possible errno values
> are an important part here)
> - generation of syscall argument validation in kernel (fast version
> is enabled all the time, extended is optional)
>
> It's worth noting that a number of these users already have some
> descriptions that suffer from the same problems of being
> incomplete/outdated. See also linux-api mailing list description
> which lists an overlapping set of cases:
> https://www.kernel.org/doc/man-pages/linux-api-ml.html
>
> We discussed several implementation approaches:
> - Extracting the interface from kernel code, either by parsing
> sources or using DWARF. However, the current source doesn't have
> enough info: fds are specified as int, while we need to know the exact
> fd type (e.g. fd_epoll_t); it is not possible to extract the flag set
> for 'int flags'; and we don't know what a 'char*' is.
> - Making the formal description the master copy and generating
> kernel code from it (structs, flags, syscall entry points).
> This is quite invasive, but otherwise should work.
> - Doing what syzkaller currently does: providing the description
> on the side. Verifying that the description and implementation match
> is an important piece here. We can do dynamic checking in syscall
> entry points (print warnings on anything that does not match
> descriptions); or static checking (but again kernel code doesn't
> have enough info for checking).
>
> We decided to pursue the last option as the least invasive for now.
> Several locations for the descriptions were proposed: with source code,
> include/uapi, Documentation.
>
> Action points:
> - polish DSL for description (must be extensible)
> - write a parser for DSL
> - provide definition for mm syscalls (mm is reasonably simple
> and self-contained)
> - see if we can do validation of mm arguments

Have we made any progress on these points?
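
To make the parser and mm-validation items concrete, here is a minimal
Python sketch. Everything in it is an assumption for illustration: the
one-line description syntax is hypothetical (it is not syzkaller's
actual grammar), the description of mprotect is deliberately incomplete,
4 KiB pages are assumed, and the function names are made up. It parses
a description and reports mismatches as warnings, in the spirit of the
dynamic checking mentioned above.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages
FLAG_VALUES = {"PROT_READ": 0x1, "PROT_WRITE": 0x2, "PROT_EXEC": 0x4}

# Hypothetical one-line description; not syzkaller's actual grammar, and
# intentionally incomplete (the real mprotect accepts more prot bits).
MPROTECT_DESC = "mprotect(addr vma, len len, prot flags[PROT_READ|PROT_WRITE|PROT_EXEC])"

DESC_RE = re.compile(r"^(\w+)\((.*)\)$")


def parse_description(text):
    """Parse 'name(arg type, ...)' into (name, [(arg_name, type_spec), ...])."""
    match = DESC_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a syscall description: {text!r}")
    name, arg_text = match.groups()
    args = []
    for part in arg_text.split(","):
        part = part.strip()
        if not part:
            continue
        arg_name, type_spec = part.split(None, 1)
        args.append((arg_name, type_spec))
    return name, args


def check_arg(type_spec, value):
    """Return None if value fits type_spec, else a short mismatch message."""
    if type_spec == "vma":
        if value % PAGE_SIZE != 0:
            return "address is not page-aligned"
    elif type_spec.startswith("flags[") and type_spec.endswith("]"):
        allowed = 0
        for flag in type_spec[len("flags["):-1].split("|"):
            allowed |= FLAG_VALUES[flag]
        if value & ~allowed:
            return f"unknown flag bits {value & ~allowed:#x}"
    return None  # 'len' and unrecognized types are accepted as-is here


def validate_call(desc, values):
    """Return a list of warnings for one call; an empty list means no mismatch."""
    name, args = parse_description(desc)
    if len(values) != len(args):
        return [f"{name}: expected {len(args)} arguments, got {len(values)}"]
    warnings = []
    for (arg_name, type_spec), value in zip(args, values):
        problem = check_arg(type_spec, value)
        if problem:
            warnings.append(f"{name}: {arg_name}: {problem}")
    return warnings


if __name__ == "__main__":
    for warning in validate_call(MPROTECT_DESC, [0x1001, 4096, 0x8]):
        print(warning)
```

Returning warnings instead of rejecting the call matches the "print
warnings on anything that does not match" approach: a mismatch may mean
the description is wrong, not the caller.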

> It was acknowledged that whatever we do now will probably
> significantly change and evolve over time as we better understand
> what we need and what works.
>
> For reference, the current syzkaller descriptions are in txt files here:
> https://github.com/google/syzkaller/tree/master/sys
> The most generic syscalls are here:
> https://github.com/google/syzkaller/blob/master/sys/sys.txt
> Specific subsystems are described in separate files, e.g.:
> https://github.com/google/syzkaller/blob/master/sys/bpf.txt
> https://github.com/google/syzkaller/blob/master/sys/tty.txt
> https://github.com/google/syzkaller/blob/master/sys/sndseq.txt
> The descriptions should be self-explanatory, but just in case there
> is also a semi-formal DSL specification here:
> https://github.com/google/syzkaller/blob/master/sys/README.md
>
> Taking the opportunity, if you see that something is missing/wrong
> in the descriptions of the subsystem you care about, or if it is not
> described at all, fixes are welcome.
>
> Thanks

--
Cheers,
Carlos.