On Thu, Oct 30, 2003 at 10:23:49PM +0300, Hans Reiser wrote:

Your assumption here is that the only thing that people search and
index on is semi-structured data.

No, my assumption is that structured data is a special case of semi-structured data, and should be modeled that way.

There are much more powerful ways of handling structured data (as
opposed to generalized text searches). What WinFS is specifically
addressing is searching and selecting based on structured data.

Special cases of general theorems are not more powerful than the general theorems; they are simply special cases. You can design a language that has the power of both relational algebra and boolean algebra.

In addition, even for text-based files, in the future, files will very
likely not be straight ASCII, but some kind of rich text based format
with formatting, unicode, etc.

When you say formatted text, do you mean fonts and stuff, or do you mean object storage models? Object storage models should generally be replaced with files and directories.

Formatting does not make text table structured.

No, but it means that doing searches on formatted text is very
difficult, and should be done in userspace, not kernel space.

You are missing my argument. I am saying that the indexes and name space belong in the kernel, not that the auto-indexer belongs in the kernel.

Searching and name spaces are different things. Fundamentally I
disagree with your belief that they are the same thing (and yes I've
read your whitepaper on the namesys web page). You can do much, much
more powerful select statements than makes sense to do via the
directory abstraction. (Think about arbitrary select statements,
possibly with subselect statements. That's what Microsoft is
promising in WinFS. Do you really want to support an opendir system
call where its argument is an arbitrary SQL select statement? I
didn't think so.)

No, I hate SQL. I want to allow people to use Reiser6 queries to find things. ;-)

There is a very, very big difference between a pathname, which is
guaranteed to refer to a single unique file, such as might be used
in a Makefile. This is what most people consider a real namespace.

You mean, it is what most people consider a primary key. Or at least I hope you mean that, because the whole point of all those articles (in what, the 80's was it?) that strove to coin the name "namespace" was that filesystems and databases and search engines and so on are all namespaces, and they strove to imply that unifying them was possible and desirable.

When addressing people, a passport number, or a driver's license
number, or a social security number, are all examples of a namespace.
Each one of these is guaranteed to return either no result, or a
single specific person.

Oh god, did you read the literature?

In contrast, consider searching for someone who is male, between 30
and 40, is named Tom, and lived in Libertyville, Illinois sometime
between 1960 and 1970, and is married to someone named Mary who was
born in California. This might return several people, and most people
would **NOT** consider the space of all queries about people to be a
"name space".

Searches are not names. They do not uniquely identify
people or objects, which is a fundamental requirement of a name.

You mean like Theodore? Are you saying that Theodore is not a name because it does not uniquely identify you?
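
The distinction being argued over here can be sketched in a few lines of Python. This is only an illustration, not anything from WinFS or Reiser4: the `Person` record, the sample data, and the `lookup` and `search` helpers are all hypothetical. A lookup on a unique identifier yields one record or nothing, while a search on attributes may yield any number of records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    ident: str  # hypothetical unique identifier (stands in for an SSN)
    name: str
    city: str


# Made-up sample data, for illustration only.
PEOPLE = [
    Person("id-0001", "Tom", "Libertyville"),
    Person("id-0002", "Tom", "Libertyville"),
    Person("id-0003", "Mary", "Sacramento"),
]

# Index keyed on the unique identifier.
BY_IDENT = {p.ident: p for p in PEOPLE}


def lookup(ident):
    """Key lookup: returns exactly one record, or None."""
    return BY_IDENT.get(ident)


def search(**criteria):
    """Attribute search: returns every record matching all criteria."""
    return [
        p for p in PEOPLE
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


one = lookup("id-0001")                          # a single Person
missing = lookup("id-9999")                      # None
many = search(name="Tom", city="Libertyville")   # a list with two entries
```

Whether the second kind of access deserves to be called a name is exactly what the two sides disagree about; the sketch only shows that the two operations have different result shapes.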

We can create a filesystem with a directory indexed by social security
number, and another directory with hard links that indexes people's
records by driver's ID. That makes sense. But putting in sufficient
indexes to handle the above query of looking for someone named Tom who is
married to someone named Mary (and this is an example where a query
optimizer would be needed) is simple, pure insanity.

I bet it will be less code than balanced trees were.
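
The two-directory layout described above (one directory keyed by social security number, a second holding hard links keyed by driver's ID) can be sketched with ordinary POSIX calls. This is a minimal sketch under assumptions: the directory names `by-ssn` and `by-license`, the `build_indexes` helper, and the sample record are invented for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def build_indexes(root, records):
    """Store each record once under by-ssn/ and hard-link it under by-license/.

    `records` is an iterable of (ssn, license_id, body) tuples.
    """
    by_ssn = os.path.join(root, "by-ssn")
    by_license = os.path.join(root, "by-license")
    os.makedirs(by_ssn)
    os.makedirs(by_license)
    for ssn, license_id, body in records:
        primary = os.path.join(by_ssn, ssn)
        with open(primary, "w") as f:
            f.write(body)
        # A hard link is a second directory entry for the same inode,
        # so the record's data is not copied.
        os.link(primary, os.path.join(by_license, license_id))
    return by_ssn, by_license


root = tempfile.mkdtemp()
by_ssn, by_license = build_indexes(root, [("id-0001", "D1234", "Tom\n")])

path_a = os.path.join(by_ssn, "id-0001")
path_b = os.path.join(by_license, "D1234")
same_file = os.path.samefile(path_a, path_b)  # both names reach one inode
link_count = os.stat(path_a).st_nlink
```

Each directory here acts as one single-key index. The multi-attribute Tom-and-Mary query has no equivalent single directory, which is the point under dispute.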

Uh, all the time, if there is a namespace that lets him. How often do you use Google? How often do you memorize the primary key of an object in a relational database and use only that, versus how often do you do a richer query?

I use Google dozens of times a day. I type commands to bash hundreds
of times a day. Does that mean that bash command line parsing should
be in the kernel? Of course not!

The bottom line is that for something that happens dozens or even
hundreds of times a day, that's an argument that it *shouldn't* be
done in the kernel. Compare and contrast that with handling incoming
network packets, which can happen millions of times per hour.

Actually, the relevant measure is not how often you use it, but how often it would context switch if it were not in the kernel. Users rarely use the networking code directly.