Re: Suggested dual human/binary interface for proc/devfs

From: Olaf Titz (olaf@bigred.inka.de)
Date: Sun Apr 09 2000 - 09:07:20 EST


[Kernel-specific stuff at the end of posting.]

> processor 0 {
> vendor_id:GenuineIntel
> cpu family:6
> model:6
> ...
> }

This kind of format has gotten rather popular lately (see INN, BIND,
DANTE...) but I doubt it fits the purpose (for all of them). The braces
and indentation suggest that we have a context-free language here,
and in fact the parsers are commonly implemented in yacc because it's
so nice and easy, but as long as there is no real nested structure,
there is no point in this syntax. Much less in using ';' as a mandatory
delimiter.

In most cases the needed grouping is exactly one level. For this kind
of structure we have the Windows/Samba/KDE .INI file format:

[Processor 0]
VendorID=GenuineIntel
CPUFamily=6
...
[Processor 1]
VendorID=GenuineIntel
...

which is easier to parse and just as easy to write or understand for
humans (just make sure that the parser skips leading/trailing
whitespace, allows comments, and treats keywords case-insensitively).
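
To illustrate how little code such a parser needs, here is a minimal
sketch in Python. It is only one possible reading of the rules above:
the function name parse_ini, the choice of '#' and ';' as comment
characters, and the lower-casing of section names are my assumptions,
not part of any existing /proc interface.

```python
def parse_ini(text):
    """Parse one-level INI text into {section: {key: value}}.

    Assumptions (not from any spec): '#' or ';' starts a comment line,
    section names and keys are folded to lower case.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()                 # skip leading/trailing whitespace
        if not line or line[0] in "#;":    # blank line or comment
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().lower()
            current = sections.setdefault(name, {})
            continue
        key, sep, value = line.partition("=")
        if not sep or current is None:
            raise ValueError("malformed line: %r" % raw)
        current[key.strip().lower()] = value.strip()
    return sections


sample = """
# made-up sample in the format shown above
[Processor 0]
  VendorID = GenuineIntel
CPUFamily=6
[Processor 1]
VendorID=GenuineIntel
"""

cpus = parse_ini(sample)
print(cpus["processor 0"]["vendorid"])
```

All values stay strings; converting them to numbers is left to the
caller, who knows which keys are numeric.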

For even easier parsing, consider the Java properties format:
processor.0.vendorid=GenuineIntel
processor.0.cpufamily=6
processor.0.model=6
processor.1.vendorid=GenuineIntel
processor.1.cpufamily=6
...

This is minimalistic yet universally usable, but the repetition in the
names leads to bloat. I don't exactly want to configure BIND with that
(but it would be nice if it used an INI file).

The big pro of this format: trivial use of grep.
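
A rough sketch of both points, again in Python: every line is
self-describing, so the "parser" is a split on '=' and '.', and a
grep-style filter is a plain prefix match. The names parse_properties
and grep_prefix are hypothetical, and the sketch assumes that no name
is at the same time a value and a prefix of another name.

```python
def parse_properties(text):
    """Turn 'a.b.c=value' lines into nested dicts: {'a': {'b': {'c': 'value'}}}."""
    tree = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed line: %r" % raw)
        parts = name.strip().split(".")
        node = tree
        for part in parts[:-1]:                # walk/create intermediate levels
            node = node.setdefault(part, {})
        node[parts[-1]] = value.strip()
    return tree


def grep_prefix(text, prefix):
    """Equivalent of grep '^prefix' on the raw text, no parsing needed."""
    return [line for line in text.splitlines() if line.startswith(prefix)]


sample = """\
processor.0.vendorid=GenuineIntel
processor.0.cpufamily=6
processor.0.model=6
processor.1.vendorid=GenuineIntel
processor.1.cpufamily=6
"""

tree = parse_properties(sample)
print(tree["processor"]["0"]["model"])
print(grep_prefix(sample, "processor.1."))
```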

> Things are grouped with braces, leading white space ignored till first
> char. After first non-whitespace char everything is significant up to
> attribute:value deliminator ":" in this case. Note even the inconsistancy

If you use a C-like free form, at least do it consistently and
make newline equivalent to any other whitespace, eliminating lines as
a structural element. They confuse more than they help, IMHO. But as
argued above, this syntax isn't exactly the best one anyway.

> If the data were presented by the kernel is a format such as the above, it
> would be a simple matter to build a generic user space parser with final
> output in XML or human readable form or whatever. Simply tell the standard
> parser what you format you want. The basic problem is that there is no
> consistancy to the stuff in /proc.

Consistency is badly needed, and so is a fully tagged format (see
below), while keeping it as simple and non-bloated as possible. This
leaves the INI and the Java format, where I tend towards the former:
it is ugly, but it is easy both to parse and to generate.
(Singling out a section: sed -n '/^\[section\]/,/^\[/p')
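
For programs that don't want to shell out to sed, the same selection
is a few lines of Python. This is only a sketch, and section_lines is
a made-up name. Note one difference: the sed range above also prints
the header line of the following section (the line that ends the
range), while this version stops before it.

```python
def section_lines(text, name):
    """Return the header and body lines of one INI section."""
    wanted = "[%s]" % name.lower()
    inside = False
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Every header switches selection on or off.
            inside = line.lower() == wanted
        if inside and line:
            out.append(line)
    return out


sample = """\
[lo]
RXPackets=85
RXBytes=7080
[eth0]
RXPackets=12
"""

print("\n".join(section_lines(sample, "lo")))
```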

> Have a look at /proc/net/dev for a completely different format. It's gotta
> suck but I imagine a lot of programs rely on these weird formats.

If you want to utilize the format of /proc/net/dev etc. fully -
_using_ the header - the parser gets really complicated. Why not
present it as

[lo]
RXPackets=85
RXBytes=7080
RXErrs=0
...
[eth0]
...
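
To show what using the header involves, here is a hedged sketch that
converts /proc/net/dev-style text into the tagged form above. It
assumes the two-line header layout embedded in SAMPLE (column groups
separated by '|'). Apart from the lo packet and byte counts quoted
above, the numbers are invented, and net_dev_to_ini is a hypothetical
helper, not an existing tool.

```python
# Illustrative input; only the layout matters, the eth0 numbers are made up.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    7080      85    0    0    0     0          0         0     7080      85    0    0    0     0       0          0
  eth0:  123456     789    1    0    0     1          0         0    65432     321    0    0    0     0       0          0
"""


def net_dev_to_ini(text):
    """Rewrite /proc/net/dev-style text as one INI section per interface."""
    lines = text.splitlines()
    # Second header line: "face |<rx column names>|<tx column names>".
    _, rx_names, tx_names = lines[1].split("|")
    keys = ["RX" + n.capitalize() for n in rx_names.split()]
    keys += ["TX" + n.capitalize() for n in tx_names.split()]
    out = []
    for line in lines[2:]:
        if not line.strip():
            continue
        # Split on the colon: the name may be glued to the first counter.
        iface, _, counters = line.partition(":")
        values = counters.split()
        if len(values) != len(keys):
            raise ValueError("header/data mismatch for %r" % iface.strip())
        out.append("[%s]" % iface.strip())
        out.extend("%s=%s" % pair for pair in zip(keys, values))
    return "\n".join(out) + "\n"


ini = net_dev_to_ini(SAMPLE)
print(ini)
```

The consumer of the tagged output no longer needs to know the column
order, so new counters can be added without breaking it.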

Btw., what sucks even more is that we have this nice header in
/proc/net/dev and we have counters for different kinds of errors in
the device structure, and yet they are summed up before presentation
in /proc/net/dev. I guess the reason is backwards compatibility. (And I
wondered at one point why CIPE shows positive values on error counters
it never touches.) With a completely tagged format like the INI or the
properties format this wouldn't be an issue either.

Olaf
