On Sun, 2009-01-04 at 17:20 +0200, Boaz Harrosh wrote:James Bottomley wrote:On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:Andrew Morton wrote:On Tue, 16 Dec 2008 17:33:48 +0200
Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
Doing mkfs in-kernel is unusual. I don't think the above description
sufficiently helps the uninitiated understand why mkfs cannot be done
in userspace as usual. Please flesh it out a bit.
There are a few main reasons.This is really a reflection of the whole problem with the OSD paradigm.
- There is no user-mode API for initiating OSD commands. Such a subsystem
would be hundredfold bigger then the mkfs code submitted. I think it would be
hard and stupid to maintain a complex user-mode API just for creating
a couple of objects and writing a couple of on disk structures.
Certainly not a problem of the OSD paradigm, just maybe a problem
of the current code boundaries laid out by years of block-devices.
Not having a suggestion for redrawing the boundaries is a problem of the
paradigm. Right at the moment using OSD is an all or nothing, there's
no migration path for block based filesystems, or even a good idea how
they'd take advantage of OSD. Most OSD based filesystems are for
special purpose things (mainly cluster FS).
In theory, a filesystem on OSD is a thin layer of metadata mapping- objects to files. Get this right and the storage will manage things,
objects to files. Get this right and the storage will manage things,
+ files to objects. Get this right and the storage will manage things,
[objects to files is what some of the osd-targets do.]
like security and access and attributes (there's even a natural mapping
to the VFS concept of extended attributes). Plus, the storage has
enough information to manage persistence, backups and replication.
The real problem is that no-one has actually managed to come up with aI'm not sure what you mean.
useful VFS<->OSD mapping layer (even by extending or altering the VFS).
Every filesystem that currently uses OSD has a separate direct OSD
speaking interface (i.e. it slices out the block layer to do this and
talks directly to the storage).
Lets take VFS<->BLOCKS mapping for example. Each FS has it's own
interpretation of what that means, brtfs is less perfect then xfs
or vice versa?
I guess you did not mean "mapping" but meant "Interface" or API.
(or more likely I misunderstand the meaning of "mapping" ;)
No ... by mapping I mean mapping of VFS functions.
For example, an OSD filesystem should be user mountable: if the user has
the security key (could possibly do this in userspace). Additionally,
an OSD with attributes should be pluggable into the VFS layer
sufficiently to allow attribute search, even if the VFS has no idea of
the metadata layout, we can still get objects back. We'd also better be
able to do backup and restore of object based devices.
The basic problem for OSD, at least as I see it is that unless it can
provide some compelling relevance to current filesystem problems (like
attribute search is 10x faster over OSD vs block or X filesystem gets a
2x performance improvement using OSD vs block ...) it's doomed forever
to be a niche player: nice idea but no relevance to the real world.
Well that is exactly what I was attempting to submit. A general-purpose
low-level but easy-to-use, objects API for kernel clients. be it a
dead-simple exofs, or a complex multi-head beast like a pNFS-Objects
file system. The same library/API/Interface will be used for NFS-Clients
NFSD-Servers, reconstruction, security what ever.
OK ... perhaps I missed the description of how a general purpose
filesystem might use this then?
The block-layer is not sliced out, Only the elevator function is, since
BIO merging, if any, are not device global but per-object/file, and the
elevator does not currently support that. (Profiling shows that it will
be needed)
Um, your submission path is character. You pick up block again because
SCSI uses it for queues, but it's not really part of your paradigm.
BTW. The block-based filesystems are just a big minority in Kernel. The
majority does not use block-layer either.
I suppose this could be taken to show that such a layer is impossibly- would be very little layered code sharing between ODS based filesystems.
complex, as you assert, but its lack is reflected in strange looking
design decisions like in-kernel mkfs. It would also mean that there
would be very little layered code sharing between ODS based filesystems.
+ would be very little layered code sharing between OSD based filesystems.
I disagree.
All the OSD-Based file systems (In Linux) should absolutely only use the
open-osd library submitted. I myself will work on a couple. If anything is
missing that could not be added later, I would like to know about it.
But that's precisely the problem: "OSD based filesystems" implying that
if you want to use OSD you write a new filesystem.
It's an indicator of one. If you buy my premise that OSD cannot be
relevant without compelling user cases, then the lack of a user API can
be viewed as a symptom of this.
I think your choice of using a character device will turn out to be a
design mistake because the migration path of existing filesystems is
bound to be a block device with extra features (which they may or may
not make use of) but only if there's a way to make ODS relevant to
users.