Re: procfs problems

James Mastros (
Wed, 16 Apr 1997 22:50:52 -0500 (EST)

On Wed, 16 Apr 1997, Richard B. Johnson wrote:
<Some changes to that format; they shouldn't make it *much* more difficult to
parse by machine, but much easier to parse with a brain.>

> Given this, I propose the following standardized format of a
> /proc file-system entry.
> 1.00 A line is defined as an ASCII text string that ends in
> a line-feed.
OK, good for me.
> 1.01 A line can be up to TBD bytes in length. This length
> shall be chosen so that an entire line may be read in one file
> read operation.
OK. I suggest no more than 80 characters, so that it is also not so
cumbersome to show to the user (whether or not you post-process it).

> 2.00 The first line shall consist of the names of each of
> the subsequent fields.
> 2.01 These names shall be delimited by a single space.
> 2.02 Where names would normally have spaces, these spaces
> shall be replaced with the underscore character.

> 2.03 The nature of the fields specified are such that the
> text, and the number of fields represented, can be readily
> extracted from the string first read from the file, using
> readily available 'C' runtime library functions.

OK. We should probably specify that all numbers should be in decimal,
unless there is a good reason not to. If such a reason exists, the numbers
must be prefixed by 0x (for hex), and written with the MSB to the left.
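
The number rule above can be sketched as a small reader. This is only an illustration: the function name `parse_number` is made up here, and accepting a decimal fraction (such as the bogomips value later in this message) is an assumption, not something the rule spells out.

```python
def parse_number(text):
    """Parse a number written in decimal, or in hex with a 0x prefix."""
    text = text.strip()
    if text.lower().startswith("0x"):
        # Hex digits after the prefix, most significant digit first.
        return int(text[2:], 16)
    if "." in text:
        # Assumption: decimal fractions are allowed (e.g. 66.36).
        return float(text)
    return int(text, 10)
```

A reader built this way never has to guess the base: anything without the "0x" prefix is decimal.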

> 2.04 Every field depicted in the header, shall have a
> subsequent line of data in the file.

> 2.05 If a file header contains a field that is not used in
> a specific platform, a blank line consisting of a single
> space, followed by the line-feed character, shall be used as a
> place-holder for that data.
See my spec for naming fields, below.

> This space is important with some 'C' run-time libraries
> because a single line-feed without a preceding space is often
> misread.
> 2.06 Any field that becomes unused or obsolete is renamed
> with the single character '?'. Fields may be reused at a later
> date if the utilities that use these fields are correctly
> written.
> 2.07 Any utility reading the file header shall ignore any
> field consisting of a single '?'. It must also properly
> ignore, i.e., skip its subsequent data line.
> 2.08 Any utility using /proc file-system data must not rely
> on a specific field offset. Instead, it must use the field
> name. This requirement, in fact, implements 2.07 as a side
> effect.
And 2.06, for that matter.
> 2.09 Any utility using /proc file-system data shall
> properly handle any missing fields. This handling may range
> from quietly ignoring missing data, to exiting after a hard
> error message. In no case shall the utility crash, i.e., seg-
> fault if there are missing fields.

The aforementioned naming spec:

Each line shall consist of three parts:
<name><arbitrary whitespace><": " or ":: "><value>

name: There should be no repeats of names within any single procfile (file
under /proc). (If you would have one, consider creating a subdirectory.)
Names should be parsed case-sensitively, and may only contain printable,
non-whitespace, non-colon characters.

arbitrary whitespace: Any sequence of space and tab characters. This should
be ignored during parsing, and during output it is recommended that
LEN(name)+LEN(whitespace) is constant throughout the procfile. Also, it is
recommended that tabs are not used. Parsers should note that a length of
zero (no whitespace) is valid.

": " or ":: ": Delimits the name+whitespace from the value. A double colon
serves to note that the field should be considered unstable, that is, likely
to have its content redefined or be removed entirely.

value: The value of the entry. Numbers should be written in decimal or in
MSB-left hexadecimal, prefixed with "0x". Strings may be output with
standard C-style escapes, but shouldn't have any control characters in them.
Boolean values should be expressed as yes/no, not on/off or true/false.
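
To make the line format concrete, here is a minimal sketch of a parser for it. The names `LINE_RE`, `interpret_value` and `parse_procfile` are hypothetical, the regular expression is one possible reading of the rules above (it does not check that name characters are printable), and leaving non-integer values as plain strings is an assumption.

```python
import re

# <name><spaces/tabs, possibly none><":" or "::"><one space><value>
LINE_RE = re.compile(r"^([^\s:]+)[ \t]*(::?) (.*)$")

def interpret_value(raw):
    """Map yes/no to booleans and decimal or 0x-hex integers to int."""
    if raw == "yes":
        return True
    if raw == "no":
        return False
    try:
        if raw.lower().startswith("0x"):
            return int(raw, 16)
        return int(raw, 10)
    except ValueError:
        return raw  # assumption: anything else stays a string

def parse_procfile(text):
    """Return {name: (value, unstable)}; unstable is True for '::'."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("malformed line: %r" % line)
        name, delim, raw = match.groups()
        if name in fields:
            # Repeated names would leave no unique index key.
            raise ValueError("repeated field name: %r" % name)
        fields[name] = (interpret_value(raw), delim == "::")
    return fields
```

Since the result is keyed by name, a caller never depends on where a field sits in the file, and a double-coloned field is flagged so a careful program can avoid relying on it.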

> Here is the existing entry for /proc/cpuinfo, quoted with ">" characters.
Here is my updated version.
> processor : 0
processor :: 0
Double-coloned since /proc/cpuinfo becomes /proc/cpu/#, with /proc/cpuinfo
being a symlink to /proc/cpu/0 on uniprocessor boxes only. A field name must
uniquely identify a field within a file, so if a program wants to know about
all of the processors, reading /proc/cpuinfo is not the way to do it; it
should step through the files in /proc/cpu instead, in which case you
already know the CPU's ordinal.
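
A program that wants every processor could step through the directory roughly like this. It is a sketch only: /proc/cpu/# is the layout proposed in this message and is not assumed to exist on any real system, the helper names are invented for illustration, and values are kept as raw strings.

```python
import os

def read_fields(path):
    """Read 'name : value' or 'name :: value' lines into a dict of strings."""
    fields = {}
    with open(path) as procfile:
        for line in procfile:
            name, sep, value = line.partition(":")
            if not sep:
                continue  # no colon on this line: skip it
            # Drop the second colon of '::' if present, then the padding.
            fields[name.strip()] = value.lstrip(":").strip()
    return fields

def read_all_cpus(cpu_dir="/proc/cpu"):
    """Return {ordinal: fields} for every numerically named file in cpu_dir."""
    cpus = {}
    for entry in sorted(os.listdir(cpu_dir)):
        if entry.isdigit():
            cpus[int(entry)] = read_fields(os.path.join(cpu_dir, entry))
    return cpus
```

The ordinal comes from the file name, so the "processor" field inside each file is redundant for this kind of caller.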

> cpu : 586
cpu : 586
> model : Pentium 75+
model : Pentium 75+
> vendor_id : GenuineIntel
vendor_id : GenuineIntel
> stepping : 12
stepping : 12
> fdiv_bug : no
fdiv_bug : no
> hlt_bug : no
hlt_bug : no
> fpu : yes
fpu : yes
> fpu_exception : yes
fpu_exception : yes
> cpuid : yes
cpuid : yes
> wp : yes
wp : yes
> flags : fpu vme de pse tsc msr mce cx8 apic
flags :: fpu vme de pse tsc msr mce cx8 apic
Double-coloned since I would add a field for each flag with a yes/no, to
minimize parsing.
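
The per-flag fields could be generated from the flags value along these lines. This is a hedged sketch: `expand_flags` and the `known_flags` list are hypothetical, and it assumes the flag names do not collide with field names already in the file (note that "fpu" is already a field in the listing above, so a real version would need some way around that).

```python
def expand_flags(flags_value, known_flags):
    """Turn 'fpu vme de' into one aligned yes/no line per known flag."""
    present = set(flags_value.split())
    # Keep LEN(name)+LEN(whitespace) constant, as recommended for output.
    width = max(len(name) for name in known_flags) + 1
    lines = []
    for name in known_flags:
        answer = "yes" if name in present else "no"
        lines.append("%s: %s" % (name.ljust(width), answer))
    return lines
```

A reader then looks up one name and gets a boolean, instead of splitting a space-separated list.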
> bogomips : 66.36
bogomips : 66.36

> My proposal would change this to:
> Ignore the fact that this editor will put in a line-feed after
> every line of text.
> processor cpu model vendor_id stepping fdiv_bug hlt_bug fpu
> fpu_exception cpuid wp flags bogomips processor cpu model
> vendor_id stepping fdiv_bug hlt_bug fpu fpu_exception cpuid wp
> flags bogomips
> 1
> 586
> Pentium 75+
> GenuineIntel
> 12
> no
> no
> yes
> yes
> yes
> yes
> fpu vme de pse tsc msr mce cx8 apic
> 66.36
> 2
> 586
> Pentium 75+
> GenuineIntel
> 12
> no
> no
> yes
> yes
> yes
> yes
> fpu vme de pse tsc msr mce cx8 apic
> 66.36
> Now, this isn't pretty. But it parses easily. Even if you do
> everything without sscanf() and friends, we readily have the
> following information after the first file read.
> o The number of data elements. (The number of characters
> with the value of 0x20 or below).
> o The names of the data elements and any to be ignored or
> skipped.
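
For comparison, a reader for the header-then-data format quoted above might look like this. It is a sketch under stated assumptions: the function name is invented, a missing or placeholder data line is reported as None, and repeated names (as in the two-processor example) are kept as separate pairs in file order.

```python
def parse_header_format(text):
    """Return a list of (name, value) pairs from a header-then-data file."""
    lines = text.splitlines()
    if not lines:
        return []
    names = lines[0].split(" ")  # 2.01: names delimited by a single space
    data = lines[1:]
    pairs = []
    for index, name in enumerate(names):
        if name == "?":
            continue  # 2.07: skip obsolete fields and their data lines
        value = data[index] if index < len(data) else None  # 2.09
        if value is not None and value.strip() == "":
            value = None  # 2.05: single-space placeholder line
        pairs.append((name, value))
    return pairs
```

The number of data elements is just the length of the split header, which is the information the quoted text says is available after the first read.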

Mine is quite a bit prettier, and only slightly harder to parse. I think,
though, that even if you throw away the rest, you should keep the
no-repeated-names rule (important for position-independent parsing, since
otherwise there is no unique index key), and the format rules for values
("0x"ed hex, MSB first, yes/no bools).

--- James Mastros