Re: [PATCH] seccomp: allow BPF_MOD ALU instructions

From: Kees Cook
Date: Wed Mar 18 2020 - 00:06:14 EST


On Tue, Mar 17, 2020 at 09:11:57PM -0400, Anton Protopopov wrote:
> On Tue, Mar 17, 2020 at 16:21, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > > and in every case to walk only the corresponding factor-list. In my case
> > > I had a list of ~40 syscall numbers, and after this change the filter
> > > executed 17.25 instructions on average per syscall vs. 45
> > > instructions for the linear filter (a saving of about 30
> > > instructions on every syscall). To replace "mod #4" I
> > > actually used "and #3", but this obviously doesn't work for
> > > non-power-of-two divisors. If I used "mod 5", it would give
> > > me about 15.5 instructions on average.
> >
> > Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> > would mean a process couldn't run on older kernels without some tricks
> > on the seccomp side.
>
> Yes, understood. Could you tell me what exactly you would do if there
> were a real need for a new instruction?

I'd likely need to introduce some kind of way to query (and declare) the
"language version" of seccomp filters. New programs would need to
declare the language level (EINVAL would mean the program must support
the original "v1", ENOTSUPP would mean "kernel doesn't support that
level"), and the program would have to build a filter based on the
supported language features. The kernel would assume all undeclared
seccomp users were "v1" and would need to reject BPF_MOD. All programs
declaring "v2" would be allowed to use BPF_MOD.

It's a lot of machinery for something that isn't really needed. :)

> > Since the syscall list is static for a given filter, why not arrange it
> > as a binary search? That should give an even better average instruction
> > count: O(log n) instead of O(n).
>
> Right, thanks! This saves about 4 more instructions in my case and
> runs 1-2 ns faster.

Excellent!

> > Though frankly I've also been considering an ABI version bump for adding
> > a syscall bitmap feature: the vast majority of seccomp filters are just
> > binary yes/no across a list of syscalls. Only the special cases need
> > special handling (arg inspection, fd notification, etc). Then these
> > kinds of filters could run in O(1).

*This* feature wouldn't need my crazy language version idea, but it
_would_ still need to be detectable, much like how RET_USER_NOTIF was
added.

--
Kees Cook