Re: Linux needs flexible security

Jesse Pollard (
Sun, 21 Nov 1999 09:43:39 -0600 (CST)

From: Pjotr Kourzanoff <>
>On 20 Nov 1999, Mike Coleman wrote:
>> write 2, "this is data", 12
>> (or perhaps a packed version for efficiency). The tracer could then react by
>> writing a line like
>> skip-fail EIO
>> or
>> skip-ok 12
>> to make the call "fail" or "succeed" (returning 12) without even really being
>> seen by the kernel. Alternatively, the tracer could write
>> write 4, "this is data", 12
> I think this becomes very tricky very soon. Here you need to access
> the same request queue from both kernel and the tracer. And if you
> allow for multiple tracers you may get infinite recursion! The
> way to eliviate this is to have something like syscall batch
> queue: all level 2 (safe:untrusted) sys_ entries just append the requests in
> there, tracers then read and put replies into another queue which is
> served by level 1 (unsafe:trusted) sys_ entries which do the real work.
> This redirection can obviously done from syscall table even at runtime,
> if necessary.

You must also number the events. The illustration does not have enough
information to make a decision -
what process issued the write
which user is this
what is the users current security permissions
what are the current permissions on the file being written to
what is the parent process (log information)
when was the write started (log information)
when was it approved/disapproved (log information)
which event is being approved/disapproved

For some security models you must also have access to information on
what other files may be open.

>> [*]
>> You would also want to be able to step on memory before returning (e.g., to
>> alter the result of gettimeofday). It would be nice if the protocol would
> Do you have some programs that require this?
>> provide commands like
>> set 1 "some data"
>> set 2 "other data"
>> but this might be problematic. Just using /proc/<n>/mem is also a
>> possibility.
> Now, do you want to change the tracee's memory?
>> Similar commands would be provided for intercepting and handling signals.
> Some would even say that it can capture an interrupt.
>> The beauty of an approach like this is that it could be relatively generic.
>> Right now, in order to do this sort of thing with ptrace, you have to know
>> endless ugly details about exactly how to parse the stack, etc. This device,
>> though, could have a generic protocol.

Generic maybe - but not complete. This is a technique more suitable to debugging
not security.

> It actually has one :-) If you mean the format of the data, then I
> would disagree. You do design this for a specific purpose - flexible
> security, why not optimise for that?
>> It's also fairly powerful. I think you could (portably) implement tools like
>> strace or MEC's trace-and-replay debugger or the security tool we're
>> discussing using just this interface. And by eliminating ptrace, you
>> eliminate its problems (altered parent/child semantics).
> Indeed. read/write rules doesn't it ?

Nope: ioctls are also used to read/write data (see the CD writer/reader for

>> This device could also be reentrant, able to be opened by more than one
>> tracer. Then each call would possibly be screened/altered by several
>> processes, in a nested fashion. This would be difficult or impossible with
>> ptrace.
> Yeah, sure. Tracer just reads from the queue, does its job and
> writes to another queue. This gives a chance to other tracers to do
> their job while this one does his, since reads can be concurrent.
> Often, tracer's actions would be to syscall (trusted one), meaning
> that actions of a group of tracers can be interleaved with the
> tracees. On SMP, this gives speedup.

nope: with multiple tracers you must syncronize the approvals. This requires
scanning the queue of not-yet-processed events to locate the event to approve
or disapprove.

>> As far as performance, though, it's not clear at all that this would be an
>> improvement over ptrace. In the SP case, you still end up suspending the
>> calling process, waking up the tracer, which is presumably doing a select on
>> the device, and then switching back to the tracee. I'm not very knowledgeable
>> about MP, but I don't see why this would be a win in the MP case either.
> OK. Two reasons:
> 1. syscall authentication requests (or whatever you want your
> "tracer" to do :-) can be batched. So, while the calling
> process is suspended, others (other than the "tracer")
> can be resumed, once they are authenticated, all by the same
> tracer and provided that the tracer is somehow faster than the
> tracees.

batched yes - but not in a timely manner. There are too many events that have
some time coordination required - see audio recording/playback, CD writing,
RPC calls.
> 2. 1. + MP = definite speedup.

speedup yes - but you end up having to dedicate a CPU to handling the tracer.

>> Another problem you run into is dealing with system call arguments. Suppose
>> the call is a write. Are the contents of the buffer to be written also to be
>> passed through this syscall device, so that the tracer can examine them? That
>> would be nice, but how to you do it efficiently?
> Voila. the same read()! for the sake simpliness: take readv(), make
> a special iovec with two elements: pointer to the syscall meta-data,
> and pointer to the syscall arguments that points to wherever you what your
> trustee's data to appear inside tracer. Pass this to the kernel, and it can
> fill syscall meta-data, and do an mmap of the arguments if that's
> necessary, or just copy_page() it. What you get is just one copy of
> the meta-data (less than 1k, right) and (as you like) one or 0 copy
> of the argument values. That's not much, isn't it?

Now you are duplicating the entire system. Consider what happens when a user
copies his directory tree to tape using tar... You copy the read... then you
copy the write..., and with the users actual request you have 3 times the
system load.

>> A naive way to do it would be to simply copy the arguments out of user space.
>> But that's really expensive if you're doing 10MB writes. You also have to
>> know exactly what data each system call will possibly read or write from user
>> memory, which MEC has previously pointed out is a rat's nest for ioctl, etc.
> ioctl's have no play here. I am not sure what are you talking here
> about. What you do is just initialize this queueing externally "once" and never
> touch anything else but read() and write().

again - ioctls do transfer data and control information (CD readers/writers)

>> Another way you could do it would be to fake the copy by having reads on the
>> syscall device transparently pull data out of memory, depending on the
>> semantics of the particular system call, values of other arguments, etc. This
>> still has the ioctl problem, though, and if you don't copy the data, you have
>> to worry about locking it down (making sure it doesn't get overwritten) if
>> shared memory is involved.
> Right. You know what you want to trace. From that function you can
> give a hint to the queue implementation what should be copied and
> what should be mapped. Immediately after the
> tracer gets to the point of read()'ing its syscall, you can use this in your
> do_read() dual and copy the data, as requested. What's a ioctl
> problem? In some cases you would like to get a snapshot of the
> tracee's activities, to gather some insight in it's guts, and you're
> exactly interested what other guys doing with shared memory while
> you control it.
> In case you mean that it will be a nightmare to
> configure the queue implementation at run-time, for each separate
> syscall...that's right. But I remember someone proposed a
> Interceptor patch to this list (couple of months ago), you could
> take that and add a stack of specific queue implementations. That's
> not much: one for strace-like beast (tracee readonly), one for
> the debugger (r/w) and one for security application (tracee
> readonly, multiple security levels on syscalls). Do you need more?
>> I'm not trying to discourage you. I think some variation of this is a real
>> winner, and I was planning to try my hand at it myself.
> Nice. Do you have something working then?

This is more suitable to debugging security problems with a single application
that it is a generic security module. It is too slow.
Jesse I Pollard, II

Any opinions expressed are solely my own.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
Please read the FAQ at