Re: __commit_write() with the Page Cache

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Thu May 11 2000 - 19:33:01 EST


Linus,

One more item that did not get addressed. Can we change the callback to
acquire a sleeping context rather than run on an interrupt? I can
implement around it, but it's really nice to be able to perform a
mirrored failover within the context of the current page write, without
needing an extra process to which callbacks are posted so that they can
call back into the FS from a sleeping context.

NetWare implements a scheduling object called a WorkToDo, which is a lot
like a task_queue in Linux, except it uses a pool of processes in the
kernel that allows drivers to post sleeping callbacks as packets. These
are run down from a sleeping context in a way that is very similar
to a task, except that they "parasite" off other processes in the native
NetWare kernel (i.e. when you call schedule() or the kernel task
switches, it swaps in a WorkToDo process and runs down any pending
WorkToDo requests). This kernel mechanism is how NetWare gets its speed.
If we implement it in Linux, I will be able to approach the speed of
native NetWare with Linux.

I could implement a kernel patch to provide a WorkToDo scheduling
subsystem, and submit it -- I'd rather just explain to you how these
things work, and you would probably know better than I where to plug
this in.
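
As a rough illustration of the idea only, here is a user-space sketch of
a WorkToDo-style queue. Everything in it is hypothetical: the names
worktodo_post(), worktodo_run_pending() and count_cb() are made up for
this example and are not NetWare or Linux interfaces. It also has no
locking, which a real kernel version would need around the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names; not a NetWare or Linux API. */
typedef void (*worktodo_fn)(void *arg);

struct worktodo {
    worktodo_fn fn;          /* callback to run in a sleeping context */
    void *arg;               /* opaque argument handed to the callback */
    struct worktodo *next;   /* FIFO link */
};

struct worktodo_queue {
    struct worktodo *head;
    struct worktodo *tail;
};

/*
 * Post a request. This only links the item onto the queue and never
 * sleeps, so it models what a driver could do from interrupt context.
 * NOTE: a real implementation would need interrupt-safe locking here.
 */
static void worktodo_post(struct worktodo_queue *q, struct worktodo *w,
                          worktodo_fn fn, void *arg)
{
    assert(fn != NULL);
    w->fn = fn;
    w->arg = arg;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/*
 * Run down all pending requests in FIFO order. This models the point
 * where the scheduler switches in a WorkToDo process; callbacks run
 * here are allowed to sleep. Returns the number of callbacks run.
 */
static int worktodo_run_pending(struct worktodo_queue *q)
{
    int ran = 0;
    struct worktodo *w;

    while ((w = q->head) != NULL) {
        q->head = w->next;
        if (q->head == NULL)
            q->tail = NULL;
        w->next = NULL;
        w->fn(w->arg);
        ran++;
    }
    return ran;
}

/* Example callback (hypothetical): counts how often it was invoked. */
static void count_cb(void *arg)
{
    ++*(int *)arg;
}
```

A driver would call worktodo_post() from its completion path, and the
posted callbacks would only run later, when worktodo_run_pending() is
invoked from a context that is allowed to sleep.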

Jeff

"Jeff V. Merkey" wrote:
>
> Linus Torvalds wrote:
> >
> > On Thu, 11 May 2000, Jeff V. Merkey wrote:
> > >
> > > And one more thing. I have grabbed Steve Dodd's NTFS code, and I am
> > > implementing a fully featured NTFS implementation on Linux. I've got
> > > some bad news for you here: without the ability to post variable length
> > > asynch IO requests, as are supported in Windows 2000, performance of
> > > Linux NTFS vs. Native Windows 2000 NTFS will SUCK WIND and perform
> > > poorly. Their entire AIO architecture is built on the concept of "file
> > > runs" of variable lengths for R/W IO, and most of the performance tricks
> > > in native NTFS are heavily dependent on this capability.
> >
> > Note that what you can always do is actually very simple:
> > - _always_ use a sector size of 512 bytes.
> > - allocate 8 sectors per 4kB page
> > - fill in the sector numbers appropriately: if you have a 2kB "run", then
> > you just fill in sector numbers X, X+1, X+2 and X+3 on the four buffer
> > heads in question.
> > - the ll_rw_block logic will all coalesce the writes back again, so the
> > IO should still be done in 2kB blocks. But you'd have the option of
> > just writing 512 bytes of them if you wanted to.
> >
> > Note that the 512-byte sector is basically forced upon us by hardware.
> > There's nothing we can do about that in software, so the above should be
> > able to handle all cases that hardware can do for us reasonably.
> >
> > (And yes, it _will_ use slightly more CPU time, no question about that.
> > However, most of the common operations are done on pages rather than on
> > the buffers that are used to fill in the pages or write them back, so many
> > of the higher-level IO abstractions won't even know the difference).
> >
> > You can do funky stuff like changing the sector numbers etc on the fly
> > too, without the upper layers caring. Of course, you'd better do this when
> > you own the buffer head lock, because otherwise it might be in the middle
> > of a write-out or something like that, and you sure as h*ll don't want to
> > confuse the device drivers with changing the sector number from underneath
> > them ;)
>
> Linus,
>
> Cool. I will try to have this out in early form mid-summer. I am also
> adding support for W2K partitions. I will implement it and when I've
> got some good numbers, I'll post them. For now, so folks are aware,
> they should still forward NTFS bugs on the current code to Steve Dodd,
> as the new driver is being completely re-implemented from scratch --
> folks should send them to Steve until I've got something decent up to
> hand off to him.
>
> NTFS does require 512-byte blocks by default for Linux (very astute
> that you already are aware of this fact) since some file runs actually
> live INSIDE the MFT for small files, and are not guaranteed to be
> aligned on 1024, 2048 or 4096 byte block boundaries.
>
> :-)
>
> Jeff
>
> >
> > Linus
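
The quoted suggestion of mapping a run onto 512-byte sectors can be
sketched in plain C as follows. This is only an illustration under
stated assumptions: map_run_to_sectors() and count_coalesced_requests()
are hypothetical helpers, not kernel functions, and the merge count
merely mimics the idea of coalescing adjacent sectors rather than the
actual ll_rw_block logic.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE       512
#define PAGE_BYTES        4096
#define SECTORS_PER_PAGE  (PAGE_BYTES / SECTOR_SIZE)   /* 8 per 4kB page */

/*
 * Fill sectors[] with consecutive sector numbers for a run of
 * run_bytes starting at device sector 'start'. For a 2kB run this
 * yields start, start+1, start+2, start+3. Returns the number of
 * slots filled, capped at one page worth of sectors.
 */
static int map_run_to_sectors(unsigned long start, unsigned int run_bytes,
                              unsigned long sectors[SECTORS_PER_PAGE])
{
    unsigned int n = (run_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
    unsigned int i;

    assert(sectors != NULL);
    if (n > SECTORS_PER_PAGE)
        n = SECTORS_PER_PAGE;
    for (i = 0; i < n; i++)
        sectors[i] = start + i;
    return (int)n;
}

/*
 * Count how many separate requests remain once numerically adjacent
 * sectors are merged. A fully contiguous run collapses to one request.
 */
static int count_coalesced_requests(const unsigned long *sectors, int n)
{
    int requests = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (i == 0 || sectors[i] != sectors[i - 1] + 1)
            requests++;
    }
    return requests;
}
```

With a 2kB run starting at sector X, the first helper fills four slots
and the second reports a single merged request, which matches the point
that the IO can still go out in 2kB units.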
