Re: O_DIRECT wierd behavior..

From: Terje Eggestad (terje.eggestad@scali.no)
Date: Sun Dec 16 2001 - 08:57:53 EST


The problem is that the kernel that don't support O_DIRECT has erronous
handling of the O_DIRECT flag. Meaning they happily accept it.

In order to figure out ifthe running kernel support O_DIRECT you MUST
attempt an unaligned read/write, if it succed the kernel DON'T
support O_DIRECT.

TJ

On Sun, 16 Dec 2001, Suresh Gopalakrishnan wrote:

>
> On Sun, 16 Dec 2001, Andrew Morton wrote:
> > GOTO Masanori wrote:
> > > At Sat, 15 Dec 2001 21:59:06 -0800,
> > > Andrew Morton wrote:
> > > ...
> > > > --- linux-2.4.17-rc1/mm/filemap.c Thu Dec 13 14:07:55 2001
> > > > +++ linux-akpm/mm/filemap.c Sat Dec 15 21:52:06 2001
> > > > @@ -3038,8 +3038,11 @@ unlock:
> > > > /* For now, when the user asks for O_SYNC, we'll actually
> > > > * provide O_DSYNC. */
> > > > if (status >= 0) {
> > > > - if ((file->f_flags & O_SYNC) || IS_SYNC(inode))
> > > > + if ((file->f_flags & O_SYNC) || IS_SYNC(inode)) {
> > > > status = generic_osync_inode(inode, OSYNC_METADATA|OSYNC_DATA);
> > > > + if (status < 0)
> > > > + written = 0; /* Return the right thing */
> > > > + }
> > > > }
> > >
> > > Right. If generic_osync_inode returns error, it must be needed.
> > > This patch seems ok than my patch...
> >
> > Actually, I preferred your approach :)
> >
> > Also, note how if ->commit_write() or ->prepare_write() return an
> > error, and we have already written some data, the function returns the
> > number of bytes written and no indication that there was an error.
> > According to the write(2) manpage, that's wrong.
> >
> > Probably it is sufficient to make `written' a signed quantity and to
> > do:
> >
> > out_status:
> > - err = written ? written : status;
> > + err = status ? status : written;
> >
> > I think that fixes the five or six bugs we've found so far in this
> > function. err.. make that six or seven. What is it trying do if
> > ->prepare_write() returns a non-zero, positive value?
> >
> > It needs a big spring-clean. I'm afraid I don't have time to do that
> > for several days.
>
> Thanks for the patches. There seems to be one more fix required: the test
> program below works in 2.4.16 only if the write size is a multiple of 4K.
> (Why) are all writes expected to be page size, in addition to being page
> aligned? (It works fine on 2.4.2 for all sizes). Any quick fixes? :)
>
> --suresh
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define O_DIRECT 040000 /* direct disk access hint */
>
> int main()
> {
> char *buf;
> int fd;
>
> buf = valloc(16384);
> fd = open("/tmp/blah", O_CREAT | O_RDWR | O_DIRECT, 0755 );
>
> printf("write returns %i\n", write(fd, buf, 4096));
>
> return 0;
> }
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

-- 
_________________________________________________________________________

Terje Eggestad terje.eggestad@scali.no Scali Scalable Linux Systems http://www.scali.com

Olaf Helsets Vei 6 tel: +47 22 62 89 61 (OFFICE) P.O.Box 70 Bogerud +47 975 31 574 (MOBILE) N-0621 Oslo fax: +47 22 62 89 51 NORWAY _________________________________________________________________________

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Dec 23 2001 - 21:00:11 EST