Tagged files in /proc (Was: Re: (Fwd) Re: /proc/apm and power status)

Markus Gutschke (gutschk@uni-muenster.de)
19 Feb 1996 14:04:04 GMT


-----BEGIN PGP SIGNED MESSAGE-----

As I believe it is important that this topic be discussed before
1.4.x/2.0.x is released, I am sending a copy of this message both to
the kernel mailing list and to Linus.

In article <852334F0151@rkdvmks1.ngate.uni-regensburg.de> "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de> writes:
> I think the only good format for things that might be extended or
> modified over time is a "tagged" format. The BOOTP extension format
> is tagged, but binary. They have length bytes, so they don't need
> separator bytes.
> The advantage of a tagged structure is that you can drop entries (and
> save space) and you can add entries (new tags) without breaking
> applications (that are well written).
>
> Maybe the "true (TM)" apm format should be something like
> "00: value\n01: value\n02: value\n..." where "01" is a tag, "value"
> is some string not containing a newline, and "\n" is a new line.
>
> More perfect would be using "basic encoding rules" from ISO/OSI. Then
> you have a structured binary stream of octets (bytes). But that would
> not allow any text utilities to be used. So I think we should live
> with US-ASCII for /proc for a while.

I have not yet decided whether it would be a good idea to change the
format of files such as /proc/meminfo once again. But if we decide to
do so, we should do it NOW, before 1.4.x (or 2.0.x) comes out! There
is no point in requiring "normal" users to update all their utility
programs whenever a new "production" kernel becomes available. Such
incompatible changes are acceptable for "development" kernels, but
should otherwise be kept to a minimum.

If we do change the format of these files, we should do so in a way
that is easily extensible in the future. Thus a tagged format would be
advisable. Furthermore, we should aim to keep it in US-ASCII, to ease
debugging when the system goes haywire.

The format that Ulrich proposed is a step in the right direction,
but I object to using decimal tags. These will lead to
(in)compatibility problems when several people independently extend
the format of a file, because two patches can easily pick the same
number for different data (remember the confusion with duplicate
syscall numbers once the kswap patch was available?)

Thus I propose the following format:

- All entries are single lines of 7-bit US-ASCII text, each
terminated by a single '\n' character. There is an extension for
binary entries, but its use is strongly discouraged.

- The first field of each line is the tag. It extends up to the first
colon or the first tab character, whichever comes first.

- The tag can optionally be followed by further fields. These fields
are separated by tab characters, and the last of them is followed by
a colon. They can be used for future extensions. The first optional
field (if present) is a comment that provides explanatory information
for the human reader; programs will not usually evaluate it. The last
optional field is used for marking binary data (see below). Two
consecutive tab characters denote an empty field.

- The value of each entry is the part of the line that extends from
the first colon to the terminating '\n' (unless the following line is
an extension line, or the entry holds binary data; see below).

- An entry can be extended over several lines (this is only possible
for the "value" part of the line!). Each continuation line is marked
by a leading '+' character (optionally preceded by white space
*before* the '+'); the text after the '+' is appended directly to the
value.

- Every entry may be surrounded by an arbitrary amount of white
space, which is trimmed before the entry is evaluated. This is needed
so that running 'cat' on one of the files in /proc yields pleasing
output.

- Binary data in a file must be marked as such, by giving its length
in bytes as a parameter. The length is written as a hexadecimal
number (uppercase digits 0..9A..F only) in square brackets. It must be
the last of the optional fields, immediately before the colon (no
white space permitted). The colon is followed by the binary data,
after which a terminating '\n' character is still appended (it is not
counted in the length). Extension lines do not make sense for binary
data.

- Any line that is not a valid entry is considered a comment. Thus,
only full-line comments are possible. Common ways of writing comments
are: 1) lines that do not contain a colon, 2) lines with a leading
colon, 3) empty lines.

This is a sample of valid lines (entries and comments):

tag1: value
yet-another-tag This is a sample entry :3.141592
+65359
: this is a comment ::
so is this
binary-entry [3]:ABC
tag Comment optional1 optional2: my value

extension: hello
+world
: in the above example the value is "hello world"; leading
: and trailing spaces have been trimmed, but the spaces after
: "hello" are preserved, because of the extension line!
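A reader for this format could look roughly like the following sketch
(again Python, for illustration only; parse_tagged and Entry are
hypothetical names). It treats '+' lines as literal concatenation, as
the 3.141592/65359 sample suggests, so it assumes that the
"extension: hello" line above ends in a space, and that the separators
in the sample are really tab characters; both details are easily lost
when a message is mailed. The binary-length marker is dropped from the
returned fields, and a '+' line that does not directly follow a text
entry is ignored.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Binary-length marker, e.g. "[1F]": uppercase hex digits in square brackets.
_BIN_MARKER = re.compile(r"\[([0-9A-F]+)\]")


@dataclass
class Entry:
    tag: str
    fields: List[str] = field(default_factory=list)  # optional fields, comment first
    value: Union[str, bytes] = ""                     # bytes only for binary entries


def parse_tagged(data: bytes) -> List[Entry]:
    """Parse the proposed tagged /proc format (illustrative sketch)."""
    entries: List[Entry] = []
    current: Optional[Entry] = None  # text entry that a '+' line may extend
    pos = 0
    while pos < len(data):
        eol = data.find(b"\n", pos)
        if eol == -1:
            eol = len(data)
        colon = data.find(b":", pos, eol)
        if colon != -1:
            # Binary entry: last optional field is "[LEN]" right before the colon.
            head = data[pos:colon].decode("ascii", "replace").lstrip()
            parts = head.split("\t")
            marker = _BIN_MARKER.fullmatch(parts[-1].lstrip()) if len(parts) > 1 else None
            if marker and parts[0].strip() and not head.startswith("+"):
                start = colon + 1
                end = start + int(marker.group(1), 16)
                extra = [p.strip() for p in parts[1:-1]]
                entries.append(Entry(parts[0].strip(), extra, data[start:end]))
                # Skip the terminating '\n', which is not counted in LEN.
                pos = end + 1 if data[end:end + 1] == b"\n" else end
                current = None  # extension lines do not apply to binary data
                continue
        line = data[pos:eol].decode("ascii", "replace")
        pos = eol + 1
        body = line.lstrip()
        if body.startswith("+"):
            if current is not None:
                current.value += body[1:]  # literal concatenation, no separator
            continue
        head, sep, rest = body.partition(":")
        if not sep or not head.strip():
            current = None  # comment: no colon, leading colon, or empty line
            continue
        parts = head.split("\t")
        current = Entry(parts[0].strip(), [p.strip() for p in parts[1:]], rest.lstrip())
        entries.append(current)
    # Trailing white space is trimmed only after extension lines are joined.
    for e in entries:
        if isinstance(e.value, str):
            e.value = e.value.rstrip()
    return entries


# The sample above, with tabs as separators and a trailing space after "hello".
SAMPLE = (
    b"tag1: value\n"
    b"yet-another-tag\tThis is a sample entry :3.141592\n"
    b"+65359\n"
    b": this is a comment ::\n"
    b"so is this\n"
    b"binary-entry\t[3]:ABC\n"
    b"tag\tComment\toptional1\toptional2: my value\n"
    b"\n"
    b"extension: hello \n"
    b"+world\n"
    b": in the above example the value is \"hello world\"\n"
)

if __name__ == "__main__":
    for entry in parse_tagged(SAMPLE):
        print(entry)
```

With these assumptions the sample yields five entries: "tag1",
"yet-another-tag" (value "3.14159265359"), "binary-entry" (the three
bytes "ABC"), "tag" and "extension" (value "hello world").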

The advantage of this file format is that it is already compatible
with some of the existing files in /proc, while still being
sufficiently extensible for the future. I suggest that, if we make the
transition to the new format, we do so for *all* files in /proc. That
way, later incompatible changes will not be necessary.

I invite comments on this proposal. I would prefer that the respective
authors of the device drivers and of the tools that use the /proc
filesystem write the necessary changes, but if need be, I volunteer
to help.

Markus

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2
Comment: Processed by Mailcrypt 3.3, an Emacs/PGP interface

iQCVAgUBMSiCxhqJqDLErwMxAQHfNwP/cpc2ZG8zt1Br6fLfmTBoX9GN4Mi1d663
QLzFX91GenaG8jHYPlpqXtCOzKf4aBSMzn6lMIZeb85dZBhxTuXTrRDLpoy0yX3+
B2HDwpX/S1FOzZ8AIHfNR4M4QrGNmElWXvX9GvxOqSPclnd8fDLX59S56EHG9kxk
w5CHZATAwhk=
=QTZz
-----END PGP SIGNATURE-----