Re: I.1 - System calls - ioctl

From: Ingo Molnar
Date: Mon Jun 22 2009 - 09:56:57 EST



* Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Mon, Jun 22, 2009 at 01:49:31PM +0200, Ingo Molnar wrote:
> > > How do you justify your usage of ioctl() in this context?
> >
> > We can certainly do a separate sys_perf_counter_ctrl() syscall -
> > and we will do that if people think the extra syscall slot is
> > worth it in this case.
> >
> > The (mild) counter-argument so far was that the current ioctls
> > are very simple controls over the "IO" attributes of counters:
> >
> > - enable
> > - disable
> > - reset
> > - refresh
> > - set-period
> >
> > So they could be considered 'IO controls' in the classic sense
> > and act as a (mild) exception to the 'don't use ioctls' rule.
> >
> > They are not some weird tacked-on syscall functionality - they
> > modify the IO properties of counters: on/off, value and rate. If
> > they go beyond that we'll put it all into a separate syscall and
> > deprecate the ioctl (which will have a relatively short
> > half-life due to the tools being hosted in the kernel repo).
> >
> > This could happen right now in fact, if people think it's worth it.
>
> Yet another multiplexer doesn't buy us anything over ioctls unless
> it adds more structure.
> PERF_COUNTER_IOC_ENABLE/PERF_COUNTER_IOC_DISABLE/
> PERF_COUNTER_IOC_RESET are calls without any argument, so it's
> kinda impossible to add more structure. perf_counter_refresh has
> an integer argument, and perf_counter_period as well (with a
> slightly more complicated calling convention due to passing a
> pointer to the 64bit integer). I don't see how moving this to
> syscalls would improve things.

Yes - this is what kept us from moving it until now. But we are
ready and willing to add a sys_perf_counter_chattr() syscall to
change attributes. We are in the 'avoid ioctls' camp, but we are
not dogmatic about that. As you seem to agree, this looks like one
of the narrow special cases where ioctls still make sense.
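
To make the three calling conventions above concrete, here is a toy
user-space model of such a control multiplexer. It is only a sketch:
the command names, struct toy_counter and toy_counter_ctl() are
hypothetical stand-ins, not the kernel's PERF_COUNTER_IOC_* values or
its data structures, and the refresh semantics are simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command numbers; the real PERF_COUNTER_IOC_* values differ. */
enum ctl_cmd {
	CTL_ENABLE,      /* no argument */
	CTL_DISABLE,     /* no argument */
	CTL_RESET,       /* no argument */
	CTL_REFRESH,     /* plain integer argument */
	CTL_SET_PERIOD   /* pointer to a 64-bit integer */
};

/* Toy stand-in for a counter, not the kernel's struct. */
struct toy_counter {
	int      enabled;
	uint64_t value;
	uint64_t period;
	int      refresh_left;
};

/* ioctl-style entry point: one command word plus one untyped argument. */
static int toy_counter_ctl(struct toy_counter *c, enum ctl_cmd cmd,
			   unsigned long arg)
{
	const uint64_t *period;

	assert(c);

	switch (cmd) {
	case CTL_ENABLE:
		c->enabled = 1;
		return 0;
	case CTL_DISABLE:
		c->enabled = 0;
		return 0;
	case CTL_RESET:
		c->value = 0;
		return 0;
	case CTL_REFRESH:
		/* Simplified: just record the integer that was passed in. */
		c->refresh_left = (int)arg;
		return 0;
	case CTL_SET_PERIOD:
		/* The 64-bit value travels behind a pointer, as in the ioctl. */
		period = (const uint64_t *)(uintptr_t)arg;
		if (!period)
			return -1;
		c->period = *period;
		return 0;
	}
	return -1; /* unknown command */
}
```

A caller would pass 0 for the argument-less commands, an integer for
the refresh, and the address of a uint64_t for the period. A dedicated
syscall would carry the same three shapes, which is why it adds little
structure on its own.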

There _is_ another, more theoretical argument in favor of
sys_perf_counter_chattr(): it is quite conceivable that as usage of
perfcounters expands we want to change more and more attributes. So
even though right now the ioctl just about manages to serve this
role, it would be more future-proof to use sys_perf_counter_chattr()
and deprecate the ioctl() straight away - to not even leave a
_chance_ for some ioctl crap to seep into the API.

So ... we are of two minds about this, and if people don't mind a
second syscall entry, we are glad to add it.

> But talking about syscalls the sys_perf_counter_open prototype is
> really ugly - it uses either the pid or cpu argument which is a
> pretty clear indicator it should actually be two sys calls.
>
> Incomplete patch without touching the actual wire-up below to
> demonstrate it:

Dunno, not sure i agree here. 'CPU ID' is a pretty natural expansion
of the usage of 'scope of counter'. The scope can be:

- task-self
- specific PID
- specific PID with inheritance
- specific CPU

It is not really 'multiplexing' completely unrelated things: a CPU
is an 'all PIDs running on a specific CPU' specifier. It is providing
an expanded definition of 'target context to be monitored'. Just
like signals have PID, group-PID and controlling-terminal type of
extensions. We don't really have syscalls for each of those either.
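
As a rough illustration of the 'one target context' view, the four
scopes listed above can be decoded from a single (pid, cpu, inherit)
triple. This is a sketch only: classify_target() and the enum are
hypothetical, and the encoding used here (pid 0 for the calling task,
pid -1 plus a CPU number for a per-CPU counter, inheritance as a
separate flag) is an assumption made for the example, not something
spelled out in this thread.

```c
#include <assert.h>

/* Hypothetical names for the scopes listed above. */
enum target_scope {
	SCOPE_INVALID,
	SCOPE_TASK_SELF,
	SCOPE_PID,
	SCOPE_PID_INHERIT,
	SCOPE_CPU
};

/*
 * Decode the 'target context' from one set of arguments.
 * Assumed encoding (illustrative only):
 *   pid == 0            -> the calling task
 *   pid  > 0            -> a specific task
 *   pid == -1, cpu >= 0 -> everything running on that CPU
 *   inherit != 0        -> also follow child tasks
 */
static enum target_scope classify_target(int pid, int cpu, int inherit)
{
	if (pid == -1)
		return cpu >= 0 ? SCOPE_CPU : SCOPE_INVALID;
	if (pid < -1)
		return SCOPE_INVALID;
	if (inherit)
		return SCOPE_PID_INHERIT; /* applies to pid 0 too in this toy */
	return pid == 0 ? SCOPE_TASK_SELF : SCOPE_PID;
}
```

Whatever the scope, the caller gets back the same kind of object and
uses it the same way; only the target selection differs.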

Also note that the syscall does not have different meanings
depending on whether it's a CPU counter or a specific-task counter
or a task-and-all-child-tasks counter. So it is not the ugly kind of
multiplexing that makes most ioctls such a jumbled mess.

If we were to unroll and expand _all_ such things in our current
300+ syscalls in the kernel we'd have thousands of syscalls. Do we
want that? Dunno. No strong feelings - but i don't think our current
syscalls are unclean, and i don't think we should insist on a model
that would have resulted in so many syscalls, were this enforced
from the get-go.

Furthermore, having a 'target ID' concept does make it easier to
expand the range of targets that we might want to monitor. Do we
want to expand the PID one with a PID-group notion perhaps? Or do we
want to add IDs for specific VMs perhaps? It does not change the
semantics, it only changes the (pretty orthogonal and isolated)
'target context' field's meaning.

Hope this helps,

Ingo