Re: configfs/sysfs

From: Ingo Molnar
Date: Wed Aug 19 2009 - 16:48:50 EST



* Avi Kivity <avi@xxxxxxxxxx> wrote:

> You may argue, correctly, that syscalls and ioctls are
> not as flexible. But this is because no one has
> invested the effort in making them so. A struct passed
> as an argument to a syscall is not extensible. But if
> you pass the size of the structure, and also a bitmap
> of which attributes are present, you gain extensibility
> and retain the atomicity property of a syscall
> interface. I don't think a lot of effort is needed to
> make an extensible syscall interface just as usable and
> a lot more efficient than configfs/sysfs. It should
> also be simple to bolt a fuse interface on top to
> expose it to us commandline types.

FYI, an example of such a syscall design and
implementation has been merged upstream in the .31 merge
window, see:

  kernel/perf_counter.c::sys_perf_counter_open()

  SYSCALL_DEFINE5(perf_counter_open,
		struct perf_counter_attr __user *, attr_uptr,
		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)

We embed a '.size' field in struct perf_counter_attr. We
copy the attribute from user-space in an
'auto-extend-to-zero' way:

	ret = perf_copy_attr(attr_uptr, &attr);
	if (ret)
		return ret;

where perf_copy_attr() extends the possibly-smaller
user-space structure to the in-kernel structure and
zeroes out remaining fields.

This means that older binaries can pass in older
(smaller) versions of the structure.
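
To illustrate the 'auto-extend-to-zero' idea outside the kernel, here
is a minimal user-space sketch. It is not the actual perf_copy_attr()
code: struct demo_attr, DEMO_ATTR_SIZE_VER0 and demo_copy_attr() are
hypothetical names, memcpy() stands in for the user-space copy
helpers, and returning -E2BIG for a bad size is an assumption of the
sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute struct; the first ABI version ended after 'config'. */
struct demo_attr {
	uint32_t type;
	uint32_t size;		/* sizeof() of the struct as the caller knows it */
	uint64_t config;
	uint64_t sample_period;	/* field added in a later ABI revision */
};

#define DEMO_ATTR_SIZE_VER0	16	/* type + size + config */

/*
 * Copy a caller-supplied attribute (possibly an older, smaller layout)
 * into the full-size structure, leaving every field the caller does
 * not know about at zero.
 */
static int demo_copy_attr(const void *uattr, struct demo_attr *attr)
{
	uint32_t usize;

	assert(uattr && attr);

	/* Start from all-zeroes so missing tail fields read as 0. */
	memset(attr, 0, sizeof(*attr));

	/* Read the caller's idea of the struct size first. */
	memcpy(&usize, (const char *)uattr + offsetof(struct demo_attr, size),
	       sizeof(usize));

	/* Assumption: sizes we cannot handle are rejected with -E2BIG. */
	if (usize < DEMO_ATTR_SIZE_VER0 || usize > sizeof(*attr))
		return -E2BIG;

	/* Copy only the bytes the caller actually provided. */
	memcpy(attr, uattr, usize);
	return 0;
}
```

An old binary that passes a 16-byte structure gets sample_period == 0,
i.e. the new feature is simply off for it.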

This syscall ABI design works very well and has a lot of
advantages:

- it is extensible in a flexible way

- it is forwards ABI compatible

- the kernel is backwards compatible with applications

- extensions to the ABI don't uglify the interface.

- new applications can fall back gracefully to older ABI
versions if they so choose. (The kernel will reject an
overlarge attr.size.) So full forwards and backwards
compatibility can be implemented, if an app wants to.

- 'same version' ABI uses don't have any interface quirk
or performance penalty. (i.e. there's no increasingly
complex maze of add-on ABI details for the syscall to
multiplex through)

- the system call stays nice and readable
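
The graceful fallback mentioned above could look roughly like this on
the application side. This is a self-contained sketch, not code from
any real tool: struct demo_attr_v1, demo_old_kernel_open() and
demo_open_with_fallback() are hypothetical, the 'old kernel' is
simulated by a plain function, and -E2BIG as the rejection code is an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical newest layout known to the application. */
struct demo_attr_v1 {
	uint32_t type;
	uint32_t size;
	uint64_t config;
	uint64_t sample_period;	/* only present from v1 on */
};

#define DEMO_SIZE_V0	16u	/* type + size + config */
#define DEMO_SIZE_V1	24u	/* v0 + sample_period */

typedef int (*demo_open_fn)(const void *uattr);

/* Stand-in for an older kernel that only knows the v0 layout. */
static int demo_old_kernel_open(const void *uattr)
{
	uint32_t usize;

	memcpy(&usize,
	       (const char *)uattr + offsetof(struct demo_attr_v1, size),
	       sizeof(usize));
	if (usize != DEMO_SIZE_V0)
		return -E2BIG;	/* overlarge (or bogus) size is rejected */
	return 3;		/* pretend file descriptor */
}

/* Try the newest ABI first, then retry with the older, smaller layout. */
static int demo_open_with_fallback(demo_open_fn open_fn,
				   struct demo_attr_v1 *attr,
				   uint32_t *used_size)
{
	int fd;

	assert(open_fn && attr && used_size);

	attr->size = DEMO_SIZE_V1;
	fd = open_fn(attr);
	if (fd == -E2BIG) {
		/* Older kernel: give up the v1-only field and retry as v0. */
		attr->sample_period = 0;
		attr->size = DEMO_SIZE_V0;
		fd = open_fn(attr);
	}
	*used_size = attr->size;
	return fd;
}
```

The app keeps one structure definition and only adjusts .size (and
drops the features the older layout cannot express) when it retries.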

We've made use of this property of the perfcounters ABI
and extended it in a compatible way several times
already, with great success.

Ingo