Re: files > 2GB

From: Riley Williams (rhw@MemAlpha.CX)
Date: Tue Jan 25 2000 - 07:11:40 EST


Hi Jakub.

>>> And LOTS of testing :-) Of course just this. Reiser proposed to
>>> support files over 2GiB in 2.3.x with upcoming version of
>>> ReiserFS for 2.3.x ... When (and if) it'll be ready to actual
>>> use is not clear though. Of course you'll need updated version
>>> of glibc 2.1.x as well

>> All very true...

>>> (BTW what should happen with "old" system calls when they are
>>> used for big files?)...

>> My suggestion would be to use the following rule set:

> What are those rules for?

They specify how the KERNEL should respond when it finds itself
running a syscall that needs to return a 32-bit off_t but has a value
too big to fit in one.

> There is a LFS standard (e.g. part of Unix98), so you just
> compile your programs with -D_FILE_OFFSET_BITS=64 if you want
> transparent 64bit access (off_t is a 64bit type in that case,
> open, lseek and the like are redirected to open64, lseek64 etc.)
> or you can compile with -D_LARGEFILE64_SOURCE so that you can
> choose between using off_t/open/lseek/stat/etc. and
> off64_t/open64/lseek64/stat64/etc.
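
A minimal sketch of the first option, assuming a glibc-style C library that honours the macro. The helper name off_t_bits is hypothetical and exists only for illustration. The macro has to be defined before the first system header is included, which is why it is normally passed on the compiler command line as -D_FILE_OFFSET_BITS=64.

```c
/* Must appear before any system header for the macro to take effect. */
#define _FILE_OFFSET_BITS 64
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t, in bits, as this file sees it. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

On a 32-bit host built without the macro, the same function would be expected to report 32.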

Providing that's been done for ALL of the utilities on the system,
this discussion is irrelevant as the proposed scenario could not
arise. However, if there's just ONE utility that hasn't been compiled
with that flag (including those supplied in binary-only form) then
there is the possibility of it happening.

> Several utilities in the Linux distributions already use
> -D_FILE_OFFSET_BITS=64 (like fileutils, etc.), so if you just
> upgrade glibc and kernel, ls and the like will work.

> This is present in glibc for some time already, with the LFS
> interface in the kernel those 64bit functions only call the LFS
> syscalls and not the old ones. After a few missing things are
> finished in the kernel and in glibc, we'll have a full LFS
> implementation (missing is e.g. 64bit file locking, 64bit file
> size resource limits and a few other things). But you can
> already play with 64bit files on 32bit hosts and it is
> definitely not limited to ReiserFS as ext2 works just fine.

That's fine for applications and utilities, but note that this is the
KERNEL development list. You may also like to remind yourself of the
question I responded to, which I will paraphrase to clarify it:

 Q. What should the kernel do when, in the middle of executing a
    function that returns a 32-bit off_t value, it finds itself
    referring to a file whose length is too large to be stored in
    such a value?

My reply can be summarised as follows:

 A. If the value to be returned fits into the 32-bit off_t value,
    return that, otherwise return -ETOOBIG and expect the program
    to deal with it if it actually cares about the value returned.
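
The rule in the answer above can be sketched in C roughly as follows. This is an illustration only, not kernel code: narrow_offset32 is a hypothetical name, the position is assumed to be non-negative, and EOVERFLOW is used as a stand-in because ETOOBIG is not, as far as I know, defined in <errno.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper, not a real kernel function: narrow a 64-bit file
 * position (assumed non-negative) to what a 32-bit off_t can hold. */
static long narrow_offset32(int64_t pos)
{
    if (pos <= INT32_MAX)
        return (long)pos;     /* fits: return the value itself */
    return -EOVERFLOW;        /* stand-in for the -ETOOBIG named above */
}
```

A caller that ignores the result is unaffected either way; one that checks it sees a negative errno-style value once the position passes 2GiB - 1.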

I believe such is reasonable, taking into account the fact that many
programs ignore these return values anyway. As an example, if I've
just read a record from a file of fixed length records and wish to
overwrite it with an updated copy thereof, I would see nothing wrong
in using something along the lines of the following code:

        if ((fd = open("File-of-records.dat",O_RDWR)) == -1) {
            perror("prog");
            exit(1);    /* no point carrying on with a bad descriptor */
        }
        while (read(fd,record,sizeof(record)) == sizeof(record)) {
            /* Process record */
            if (record_updated) {
                /* Cast before negating: sizeof yields an unsigned size_t */
                (void) lseek(fd,-(off_t)sizeof(record),SEEK_CUR);
                write(fd,record,sizeof(record));
            }
        }
        close(fd);

Assuming this program has been compiled with settings that result in
it using the 32-bit version of lseek, why should it fail if the file
being processed is 10G in length? It cares not what the return value
from the lseek is - indeed, it doesn't even bother to store it - and
only requires that the file pointer be moved back to the beginning of
the record that has been updated.

The only reason this might fail would be if the record itself was over
2G in size, and I have never yet designed a record that large.

Best wishes from Riley.

 * Copyright (C) 1999, Memory Alpha Systems.
 * All rights and wrongs reserved.

+----------------------------------------------------------------------+
| There is something frustrating about the quality and speed of Linux  |
| development, ie., the quality is too high and the speed is too high, |
| in other words, I can implement this XXXX feature, but I bet someone |
| else has already done so and is just about to release their patch.   |
+----------------------------------------------------------------------+
 * http://www.memalpha.cx/Linux/Kernel/
