Re: What /proc should contain [was: /proc/driver/microcode]

From: Ricky Beam (jfbeam@bluetopia.net)
Date: Fri Feb 25 2000 - 01:53:52 EST

Next message: Dominik Kubla: "Re: Linux console driver maintainer?"
Previous message: Aaron Denney: "Obsolete variables in Makefiles?"
In reply to: Peter T. Breuer: "Re: What /proc should contain [was: /proc/driver/microcode]"
Next in thread: Theodore Y. Ts'o: "Re: What /proc should contain [was: /proc/driver/microcode]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 25 Feb 2000, Peter T. Breuer wrote:
>People are processes not spawned by the kernel :-). Anyway, amusing

>More dubiously, you seem also to be saying that you want to
>
> make proc into a source of binary info only, so as to remove
> inefficiencies connected with writing and reading back the same info.

Nope. Well, not entirely. That is a serious enough problem, but easily
fixed without much dispute. My sticking point is in making the kernel do what
should be done in user space. Text->Binary->process->Text is just a really
bad thing in my book.

>Sorry. I disagree. You base your premise on "only processes read proc".
>Well, tough, if you were worried about that you wouldn't ever let info
>out of the kernel into user space in the first place. Why, it's just
>copying info from one buffer to another and back again!

There is a BIG difference between the kernel returning a u32 to userspace
as the four bytes that make up the u32 (which can be done in ~1 insn) and
converting that u32 to a text format ("%08x") (which takes many more than
~1 insn.) Add to that the processing the userspace application is going to
perform to turn that text back into a u32 to perform real work on it.

Multiply all that by the number of times that happens over the course of
a day and it'll add up to a whole lot of cpu time that could have been
better spent doing "real work."

>You are perhaps not aware of how important a source of information proc
>is to me personally!

Gee, millions of computers all over the world waste, say, 35% of their
processing capability constantly translating from text to binary and vice
versa because YOU want to read procfs files directly. How rude of me to
suggest otherwise. <smirk>

Let me point out the loop-hole here... YOU are already running a command
to process the procfs data -- 'cat' or something similar. The fact that
you use the same command to read any file is only slightly important. The
fact that the command is merely moving procfs data from one fd to another
is not important -- altho' the obvious optimization would be to create a
path for instructing the kernel (ala procfs) to dump the procfs data directly
to the destination fd, but that's an NT-ism :-) The fact that the kernel
is performing the textual generation and formating IS important.

By placing the human interfacing in the kernel, you are making the assumption
that humans will be the _only_ thing interested in the output. If this were
true, then a large number of applications could be removed from distributions
as redundant and replaced by 'cat'. If there is a single instance where a
non-human application acts on, or otherwise processes the information made
available from the kernel via procfs other than simple reformating (moving
whitespace, translocating columns, etc.) then this assumption is no longer
valid.

>I[n] contrast, I have [n]ever looked at the process info. You can throw that
>away as far as I care. It's only use to me is that killall traverses it
>while searching for the target I told it to look for.

Then you would present no obstacles to a procfs for process data and a
kernfs for kernel information?

>> >the information has to be somewhere. How does being in proc in itself
>> >somehow make the information less secure than in a file somewhere else?
>>
>> And if this untrusted thing has a buffer overflow or other flaw making it
>
>You get a kernel oops. Anyway, there aren't any buffers in proc. That
>was a valid complaint of yours. That's why it's "inefficient". Data
>is recalculated at every access. How is this different from a buffer
>overflow in sysctl, anyway?

I'm sorry for the confusion... "thing" is in user space. A buffer overflow
in user space can (read: always ends up) resulting in something or someone
attaining root privs. At which point, any file system permissions that
may exist no longer have any bearing. This is a moot point anyway as you'd
have access to the same information via procfs or sysctl() -- plus, with
root access, you can mount procfs where you can see it :-(

(Yes, you can mount '/proc' more than once.)

>> i.e. a procfs_chroot()? Linus wants all the sysctl() stuff to be a file
>> system tree. That's a reasonable thing to do, but I don't want it attached
>> to a static point in the file system making it difficult to cage processes.
>
>Mmmm .. don't get it. Explain it to me later.

I'll try...

Let's say you are going to be running a program that cannot be trusted with
access to the entire system. (Maybe I should say _won't_ be trusted.) SO,
it's started from a chroot()ed environment. This environment will obviously
NOT have any part of procfs anywhere near it. When this program starts,
it tries to find out how many processors there are in this machine to know
how many threads to create. Under Linux, that's not possible without access
to procfs. The program gives up and assumes at least one processor. It then
tries to find an open port in a specific range. It cannot use any type of
"netstat" logic as that information is in procfs (net/tcp) so the apps only
choice is to start trying each port in sequence.

Short of mounting procfs inside the program's environment, there are alot
of things it cannot figure out without a lot of work (i.e. bruteforce
detection.) That includes some of the sysconf() information.

Is that good enough?

>That's fair and appropriåte use. Slowly changing status info. EIther
>count once at startup and print on request, or check at every access :-).

Umm, there's no need to do any of that... the kernel knows EXACTLY how many
processors there are -- that number doesn't change in a running system.
There's just no way to get at that number without counting cpuinfo stuff
or rooting through /dev/kmem. (I'm going to continue to bitch about this
until it's fixed.)

>> There are far too many things pushed into /proc out of convenience without
>> regard to fast, tiny, efficient non-fs access to them. Also, people
>
>An inefficient /proc is about as harmful as bad cornering ability on a
>panzer tank. The point is that it does what it is supposed to do, and
>does it well. Speeding up the panzer's cornering by taking off the
>armor cladding wouldn't help! If you want a binary interface, put
>in a char device, or some such thing.

Actually, panzer tanks corner pretty well from what I've seen :-) But that's
not important...

The point is, if I want to be able to run 'ps', then I get all this other
junk riding along with it. I can cut some of it out by disabling sysctl,
but that doesn't do much. Additionally, if I want to be able to administer
syncookies or the like, I need procfs enabled...

I'm not saying do away with procfs -- that's just as bad as altering the
format of meminfo, for example. I simply want a means to access and control
things via a programatic, non-human, binary interface. There needs to be
ways of accessing things other than via text in /proc. You can leave the
file system accessed textual crud if you like (and YOU do) but kindly
split that off of procfs so being able to see the process tree doesn't
drag along with it a bunch of other stuff you might not want.

Actually, spliting procfs into two parts isn't such a bad idea. It would
be far less mess than converting everything (or anything for that matter)
to a binary format instead of the text format(s). (I'll revisit this later.)

>> quickly throw together a bunch of sprintf()s to hand back some status
>> or debuging information without spewing stuff via printk(). As a debug
>
>That's because you can control when you get info from proc.

You can control when you get information via printk too, but who cares?

>> As I said originally, I'd have to kill too many people to reverse this
>> evolution, and it would do little to it from happening again. All I can
>> hope to do is stop people from making it the sole interface for information.
>> And to stop them from dumping everything in sight as english text.
>
>Why not? You beef is that processes read the text instead of doing some
>weird ioctls and sysctls, not that the text is there.

It doesn't have to be "weird ioctls and sysctls". You can read binary data
from procfs just like you can from anywhere else. The sysctl() stuff is
merely a suggestion to avoid forcing programs to parse text -- they can
use a sysctl (or read a binary file) to get something in a native format.

>Well, lex and yacc have always been somewhat obscurantist tools (but

That may be, but I see way too much of it. I don't know which is worse,
lex code (read: wu-ftpd) or sendmail rulesets.

>as the other, and get back the spec'd attribute. A sort of super
>printk. The question is what you do on error :-).

Actually, that'd be scank() :-) And it would of course panic() and/or oops
on error <grin>

>But that data came from the kernel, went to netstat, then went back
>into the kernel via printf and the tty just to be put on your screen.

There's a big difference between the printf() path to the screen and the
read() from procfs to the printf(). The printf() output is pure byte
copying -- there's no translations to anything from anything. The read()
from procfs returns text which netstat then converts to binary before
printf()ing them to the screen. One could optimize some of this but
parts of the information from procfs has to be converted to binary before
it can be processed -- ip address information for example.

>It already went back and forth a few times, never mind the
>str->bin->str bit in netstat itself. The bit you're complaining about is
>(most probably) linear complexity. It often happens whenever any two
>processes commuicate, whether one of them is human or not. They
>traslate between the interface format and their internal format.

My problem is with the str->bin->str. Have you ever used babelfish? I
rest my case.

>I disagree. If I had to use a tool to read the files in proc, almost
>all its value would disappear for me. It would cease to be a convenience.

You're already using a tool... it just so happens to be the same tool
in every case.

>The zillions of programs you want to interpose are unnecessary. By

Not at all. These programs already exist. When people want to see the
route table, do they 'cat /proc/net/rt_cache'? No, they use 'route' or
'netstat -r'. When people want to see the list of network connections,
do they 'cat /proc/net/tcp'? No, they use 'netstat -t'. When people want
to see memory utilization, do they 'cat /proc/meminfo'? No, they run 'free'.
When people want to see the mounted file systems, do they 'cat /proc/mounts'?
No, the simply run 'mount'.

The information YOU so love to access via 'cat' the majority of people access
via the programs designed for the respective information. In fact, alot of
them are using system management tools (linuxconf, etc.) and don't even
know about the procfs files. We are not the average user -- I would venture
to say no one on l-k is.

>all means add some to pretty up the output, or repretty it. But you
>didn't really need any in the first place. Most proc output is pretty
>readable by those humans interested in it.

To you, me, and the rest of the kernel toadies, maybe. But to the rest of
the planet (you know, the people who cannot parse text or use a multi-button
mouse?), it's giberish.

>> >Why should I care if it takes 10ms or 20ms to cat a file from proc?
>>
>> (dammit) That is the most damnable statement anyone in the computer science
>> world can say. It's attitudes like this that will require a 500GHz processor
>> to run "hello world" in a few years.
>
>Sorry, this argument has always been wrong. Don't worry about the speed
>first. Chip density doubles every 18 months. That's 30% shorter

No, you worry about "efficient" first, then making it go fast. "Do it
RIGHT, then do it FAST."

>traces every 18 months, plus other speedups. The parsers I wrote in
>1990 were too fast then (8MHz). Nowadays, the same parse code will run
>approximately 100 times faster. And parsing is trivial in constrast to the
>cost of lexing (those silly huge state machines). And that in turn is
>trivial compared to the cost of actually doing something. You are
>arguing about next to nothing, which is not to say that the argument
>is without merit!

This is merely a self-reinforcing paradox. People write less and less
efficient (optimal) code because they know processors keep getting faster.
Likewise, processors keep getting faster because the code keeps getting
worse. There's no motivation to generate good code anymore.

>(to invent a bit of notation for the construct). Anyway, the proper
>cure is to beat the hell out of the guy who changed it and send a patch
>in to Linus to change it back.

Umm, how is that any different from a binary data structure? NONE. If you
change the format, the previous parser breaks. Experience has shown binary
structures are far less prone to tinkering.

>> You'd be surprised. Granted, most of them don't have a human on a keyboard
>> standing directly over them -- there's a master control station where the
>> person sets with the keyboard to montitor >1 systems. And yes, I do know
>> if my embedded systems are doing their job(s)... and I do it without spewing
>
>I doubt it. If you could predict the failure modes you wouldn't have to
>have humans watch out for them!

You want a nuclear power facility unmaned? You really are nuts.

>You said efficiently, and you've been using that word all along in a way
>that I took to mean speed. Are you saying that you object to a computer

Unfortunately, efficient doesn't always mean fast. I know, the buz words
are hard to qualify with modern processors. (what's fast on one isn't
necessarily fast on another, etc.)

>doing something TWICE, rather tha SLOW? Bear in mind that doing it

On principle, yes, doing something slowly once is prefered to doing it
twice fast. (see below)

>twice may be the fastest/best way to go. We just had that discussion
>over boolean or. Apparently it pays to get both branches executed
>speculatively.

Yes, I know this too. Modern processors have a great deal of magic in
them making "efficient" a pencil and paper principle. (Damn that real
world!)

>The rest of the world doesn't know how to click a mouse, relatively
>speaking.

I know a few of those too <sigh>

>Not necessarily. Some parsing techniques are both fast and forgiving.
>Yes, backtracks generally cost, the same way that mistaken branches
>cost in microcode. But the architecture of modern computers may make
>that irrelevant too. I am refereeing more and more papers on quantum
>computing these days, and the semantics are _not_ sequential.

Ah, quantum computing... efficient and inefficient _at the same time_.

>> it's input. I've written a number of data file analysis tools. Their parsing
>> of text is _extremely_ fast but completely intolerant of changes to the
>> input.
(of course it's speed also defies logic -- it pipelines very well it would
appear.)

>I can imagine. But they're also extremely fast to change when the
>format changes!

Yeh. I'm the only one using them. Take the change to meminfo some years
ago. It was trivial to tool 'free' to deal with it, how you'd need to get
several million people to replace their 'free'. Once a format is entrenched,
it's very difficult to change. (the format for /proc/net/dev changed awhile
back too, and it confused the hell out of people -- didn't crash anything
though.)

>> - ifconfig by itself will not work -- it cannot find any devices to lookup.
>
>Mine did. Sorry, but I tried it. ifconfig eth0 gave me the data.

Ok, the counters are not displayed... and it fails to list aliases (at least
on my machine)

>> - some X servers will fail as they look in /proc/pci (or /proc/bus/pci)
>
>When? At startx? I was running X when I umounted proc. and I didn't
>die. I had to kill a couple of tasks in proc first ... I don't recall
>which.

SOME X servers will fail to start as they look in /proc to detect the PCI
video card(s). My X server doesn't care -- it's freakin' ancient tho'.

>> I didn't say the machine would crash :-)
>
>Well, you pretty nearly did.

<grin> the kernel will not care nor should init. So, the machine should
continue running no matter how many userland apps fail to perform as
expected. (Why do I suddenly feel like I'm hunting for y2k bugs again?)

>> data, but there are problems with that. (The entire notion of file sizes
>> in /proc is a grey area anyway.)
>
>These files are really streams of data. They might as well be pipes
>rather than files, except that there is no practical difference ...

Aside from the fact that one can only seek() forward on a pipe.

>> I would say "reader_s_"... there will only be one fd attached to this
>> reading instance. Ok, maybe there'd be dup()s of it? How do duplicated
>> fd's get handled?
>
>Good question, Moriarty. Magically. Somewhere. There has to be a
>reference count. Oh, anyway, the buffer we write into won't go
>away until its readers have left it, since it was handed down by the
>VFS.

Ok. I'll have to go look at the descriptor handling... if dup() goes
down the open() path, then no problem. Otherwise, procfs might need to
check a use count before freeing the buffer. (or we can just leak memory :-))

--Ricky

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Dominik Kubla: "Re: Linux console driver maintainer?"
Previous message: Aaron Denney: "Obsolete variables in Makefiles?"
In reply to: Peter T. Breuer: "Re: What /proc should contain [was: /proc/driver/microcode]"
Next in thread: Theodore Y. Ts'o: "Re: What /proc should contain [was: /proc/driver/microcode]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Tue Feb 29 2000 - 21:00:11 EST