Re: ext2 warning in Linux 2.2.7

Alexander Viro (viro@math.psu.edu)
Wed, 5 May 1999 16:31:48 -0400 (EDT)


On Wed, 5 May 1999, Steve Dodd wrote:

> > Now I did ;-) Why on the earth do you use *positive* values for
> > errors? I didn't finish wading through the directory-scanning logics
> > yet...
>
> First, the standard disclaimer: I'm not the author of the NTFS driver, and
> I don't consider myself a kernel hacker by any stretch of the imagination. I
> got interested in it when it started blowing up around 2.2.0/1. That said, I'd
> love to become au fait with the kernel, and fs-related bits seem to be one of
> the more interesting places to start...

Welcome aboard ;-)

> Error values: Pass. Is there a good reason not to? The values get inverted
> before being returned to the VFS layer (except for the odd mistakes which you
> caught the other day). I thought -ve error returns were a hack for situations
> were the function normally returned a value, and one had to avoid the global
> errno variable that's used in, e.g. libc.

Just a general (and very old) convention...

> > It may mean nastiness with locking... (disclaimer - I didn't check
> > the actual code yet).
>
> Thinking about this, I'm worried about the re-entrancy and SMP-safeness
> of the NTFS stuff at the moment. The NTfs driver holds the MFT inode open
> all the time, so that may not be such an issue there. I'm still worried about
> the SMP-safeness though. I'll have to look and see what locking happens at
> the VFS layer.

Almost every bloody method is protected by big lock. dcache is profoundly
non-SMP-safe. So normally all races that can happen happen on UP too
(either that or they screw a lot of of other places - see recent patch
adding the missed lock_kernel() calls).

> One of the things that makes the code a bit confusing is that it is designed
> to be easily portable. There are the common files (attr.c, dir.c, inode.c,
> super.c, util.c), and then platform specific ones. At the moment the supported
> platforms are linux21, linux20, 44bsd & user (the user-space utils).
>
> At the moment I don't think the code interacts with the icache well; whenever
> a change is made to an inode, it's written out before leaving the driver. I'm
> planning to make it use mark_inode_dirty(), iget() and iput() properly soon.
> I need the iget() and iput() stuff to avoid a nasty (but obvious) race with
> updating directory indexes when file sizes change: NTFS keeps a copy of the
> file size in the directory entries, which is a pain.

Awww...

> Also, NTFS - and the WinNT driver - support logging / journalling. Ideally
> we should implement that but I don't think anyone has really considered it
> yet, mainly because the format of the Logfile on disk isn't clear and is
> certainly not documented (don't you just _love_ M$?). I understand you're
> working on a journalled filesystem (or is it a journalling layer for
> existing filesystems?); I'd be interested in seeing the code.

It's a different layer and besides the wench is dead ;-) I'm not
dealing with journalling - SCT does. What I'm interested in is softupdates
and it involves modification of buffer-cache code rather than filesystems.
I don't know if s-u can be reasonably applied to NTFS, though.

> The 'light-weight inode' system discussed a while back for 2.3 sounds
> interesting, too..

NTFS is mostly clean in that respect. Try to undef NTFS_IN_LINUX_KERNEL
- I suspect that it will work as is.
There is exactly one problem with light-weight inodes - current
lossage with FIFOs. ->u.pipe_i overlaps with ->u.generic_ip (and every
other field of union, indeed). I've submitted a patch to Linus, but it was
in 2.2.0-pre* and that was *not* the time for such stuff. Will be
resubmitted as soon as Linus will open 2.3. Another part is completely
trivial - define macros a-la EXT2_I(foo) &((foo)->u.ext2_i) and do trivial
replacement. As soon as we will have FIFO situation cleaned up we will be
able to change the macro to ((foo)->u.generic_ip) and put allocation of
the fs-specific part into <fs>_read_inode(), <fs>_new_inode() (for those
who use get_empty_inode()/insert_inode_hash() there) + deallocation into
<fs>_clear_inode().

> sounds like a good idea, though. Would you just implement it as another
> super_ or inode_operation, with a default implementation in VFS for sane,
> UNIX-like filesystems?
Yes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/