Re: dynamic sysctl registration (pre2.0).4

Tom Dyas (tdyas@eden.rutgers.edu)
Fri, 17 May 96 23:26:43 EDT


> > The following patch adds support in the sysctl system for dynamically
> > adding and removing sysctl entries from existing sysctl tables.
>
> Umm, take another look at the code --- linux has supported this since
> 1.3.59. :)

Now that I have looked at your example in your patch #2, it became
clear how to register sysctl entries that overlap with existing
functions. However, I do not think the existing
(un)register_sysctl_table interface is all that intuitive compared to
the simpler {un}register_sysctl() interface.

> > It is fundamental redesign of how sysctl's are stored.
>
> No such redesign is necessary!!!

I think there should be a redesign for several major reasons:

> > They are now stored as linked lists much as the procfs directory
> > entries are stored.
>
> The existing code already stores them in linked lists. However, the
> way we do it is to maintain a linked list of separate entire table
> hierarchies, so that whole sets of sysctl entries may be registered
> and deregistered at once.

1) My implementation uses slightly less memory and more importantly
avoids invoking kmalloc() to allocate space for the associated
proc_dir_entry's (now embedded in sysctl_entry since there is a
one-to-one relationship between sysctl_entry's and proc_dir_entry's)
and ctl_table_header's (no longer needed).

One nitpick is that someone could not have the sysctl exported to
/proc/sys by not including a procname in the sysctl entry and thus the
embedded proc_dir_entry is just wasted space. Not a problem since
anybody too lazy to put a procname into their entry should be fed to
the Solaris TCP ack code.

2) Also, your code also needs to use a recursive function
(register_proc_table) to traverse the sysctl tables. IMHO, recursive
procdures in the kernel is broken behavior unless said function is
*truly* needed. My implementation uses no recursive functions. Even if
you did this "iteratively", you would still need a stack of
unspecified length to do the table traversal.

Calls to my {un}register_sysctl() function map directly onto single
calls to proc_{un}register. No need to go searching for overlaps and
other things. Makes the code easier to read and less prone to bugs
(You did have to patch your own code to get overlapping tables to work
under /proc/sys ...).

> > This support is needed because modules such as binfmt_java may want
> > to register sysctl's in existing tables. The current system does not
> > allow this at all.
>
> Have a look at the "register_sysctl_table" and
> "unregister_sysctl_table" functions. The register function takes a
> sysctl hierarchy as argument, and adds it to the list of available
> tables. Any sysctl() request will be matched against each table on
> the list until a match is found.
>
> The register function also inserts all of the entries on the list into
> the /proc/sys filesystem. However, there is a problem with this in
> current kernels --- we don't cleanly handle the case where a directory
> in a new table overlaps an existing directory. Patch 1 below fixes
> this, as well as fixing a memory leak when we deregister sysctl tables
> and adding some necessary auxiliary functions to the ksyms exported
> symbol table.

> Patch 2 below updates binfmt_java to add /proc/sys/kernel/java*
> entries dynamically when inserted as a module; it's a good example of
> how to use this feature.

See my comments after the patch.

> > Some minor typos in include/asm-i386/unistd.h are corrected. Just some
> > extra underscores in the syscall number macros which prevented me from
> > using _syscall1 to make a sysctl syscall function the first time
> > around.
>
> This is also wrong. The underscores are there for a very good reason.
>
> BSD-4.4 specifies the sysctl() function as
>
> int sysctl (int *name, int nlen,
> void *oldval, size_t *oldlenp,
> void *newval, size_t newlen);
>
> However, the i386 unistd.h doesn't allow us to pass more than 5
> arguments into a system call. To overcome this, we define the actual
> system call interface differently:
>
> int sys_sysctl(struct __sysctl_args *args);

This would be a misinterpretation of the syscall names on my
part. I'll go stick my head in /dev/random and hope it was replaced
with a weaker generator so I don't get my head
randomized. [ObThreadDrift: Can we truly know where /dev/random is if
it is random?]

> --- ksyms.c 1996/05/15 13:13:31 1.7
> +++ ksyms.c 1996/05/17 19:44:10
> @@ -230,6 +230,11 @@
> /* sysctl table registration */
> X(register_sysctl_table),
> X(unregister_sysctl_table),
> + X(proc_dostring),
> + X(proc_dointvec),
> + X(proc_dointvec_minmax),
> + X(sysctl_string),
> + X(sysctl_intvec),
>
> /* interrupt handling */
> X(request_irq),

The proc_* symbols need to be protected with a CONFIG_PROC_FS
proprocessor conditional. Also, shouldn't these be declared in a
symbol table local to sysctl.c and registered using register_symtab()?
Which leads to another reason:

3) register_symtab() exists to allow kernel code to decide locally
what symbols get exported. This helps by cleaning up ksyms.c. My
register_sysctl() function provides that same sort of ability for
sysctl entries. Notice how linux/kernel/sysctl.c is littered with lots
of extern declarations. When I added the bdflush parameters to the
sysctl system, I had to remove the static declaration of bdflush_prm[]
and then put in extern references in sysctl.c to make the linker
happy.

Looking at your Java example below, imagine what would happen if we
did local registration for most of the stuff. We would have to have a
"root_table" for each local registration which just wastes memory
space and does not look nice.

> ----------------------------------------------------------------
>
> Patch 2: add dynamic /proc/sys/kernel/java*
> ----------------------------------------------------------------
> Index: linux/fs/binfmt_java.c
> ===================================================================
> RCS file: /home/rcs/CVS/linux/fs/binfmt_java.c,v
> retrieving revision 1.1
> diff -u -r1.1 binfmt_java.c
> --- binfmt_java.c 1996/05/15 13:12:19 1.1
> +++ binfmt_java.c 1996/05/17 21:04:17
> @@ -19,6 +19,26 @@
> char binfmt_java_interpreter[65] = _PATH_JAVA;
> char binfmt_java_appletviewer[65] = _PATH_APPLET;
>
> +#ifdef MODULE
> +#include <linux/sysctl.h>
> +static struct ctl_table_header *java_sysctl_header;
> +static ctl_table java_root_table[];
> +static ctl_table java_kernel_table[];
> +
> +static ctl_table java_root_table[] = {
> + {CTL_KERN, "kernel", NULL, 0, 0555, java_kern_table},
> + {0}
> +};
> +
> +static ctl_table java_kern_table[] = {
> + {KERN_JAVA_INTERPRETER, "java-interpreter", binfmt_java_interpreter,
> + 64, 0644, NULL, &proc_dostring, &sysctl_string },
> + {KERN_JAVA_APPLETVIEWER, "java-appletviewer", binfmt_java_appletviewer,
> + 64, 0644, NULL, &proc_dostring, &sysctl_string },
> + {0}
> +};
> +#endif
> +

While the ability to everything in a single call to a
register_sysctl_table function is nice, why is there a need to
duplicate entries such as the CTL_KERN entry in preceding tables
(java_root_table). Here is my form of the registrations:

register_sysctl(&sysctl_kern, &(sysctl_entry)
{KERN_JAVA_INTERPRETER,"java-interpreter",binfmt_java_interpreter,
64,0644,&proc_dostring,&sysctl_string});

register_sysctl(&sysctl_kern, &(sysctl_entry)
{KERN_JAVA_APPLETVIEWER, "java-appletviewer",binfmt_java_appletviewer,
64, 0644, &proc_dostring,&sysctl-string});

And the corresponding unregistrations:

unregister_sysctl(&sysctl_kern,KERN_JAVA_INTERPRETER);
unregister_sysctl(&sysctl_kern,KERN_JAVA_APPLETVIEWER);

Tom