iso9660 cdrom transparent decompression -- call for testers

Adam J. Richter (adam@yggdrasil.com)
Tue, 17 Feb 1998 19:39:17 -0800


I would like to invite anyone so inclined to test the
iso9660 transparent decompression support FTPable from
ftp://ftp.yggdrasil.com/pub/dist/cdrom/iso9660-compress-2.0.tar.gz
This is a set of patches to gzip 1.2.4, mkisofs 1.11.1 and the
linux 2.1.86 kernel. You must rebuild each of these components
from source. The mkisofs patches also add other Yggdrasil enhancements,
like fast removal of large numbers of files when generating a filesystem
(e.g., to exclude proprietary material from a CDROM snapshot), and generation
of umsdos files (e.g., for making CDROMs that can also be installed onto
a umsdos hard disk area with DOS xcopy).

This is a port and modification of the transparent compression
support written by Eric Youngdale in January 1994 for the iso9660/rockridge
filesystem. It is amazing that code that has sat stale for the past four
years is so useful! I have ported Eric's code to the 2.1.86 kernel,
and made the following enhancements:

o Uses the page cache so that reading a block multiple times
does not result in it being decompressed multiple times.

o Supports arbitrary compression block sizes rather than
only 2048 byte blocks. Actually, 4096 byte blocks are
optimal, since all reads are done in 4096 byte pages anyhow.
This also increases compression performance substantially.

o Compression block sizes and count of compression blocks
have been expanded from 16 bits to 32 bits, to support
compression blocks >=64kB and, more importantly,
files larger than 64k blocks (>=256MB for a 4kB compression
block size). This required changes to the .gZ file
format and changing some file offsets referenced in
the transparent decompression code in mkisofs.
I thought about deleting the field that contains the count
of compressed blocks, since that can be calculated, but
I have tentatively decided to leave it in, in case somebody
finds a use for varying the block size within a single file.

o The table of compressed block sizes is now a table of
the location of the end of each segment, so that it is possible
to determine the location of compressed block N by just reading
entries for block N-1 and N rather than entries 1..N.

o As a result of the previous change, the table of segment
sizes that the transparent decompression code used to allocate
has been deleted. Since Eric's original code used kmalloc
to allocate this table and blocks of 2048 bytes, it could
not handle files that uncompressed to anything larger than
4MB. Using vmalloc to allocate this table when it was large
would have eliminated that problem, but then reading a single
byte from a 100MB file would have resulted in 100kB of kernel
memory being allocated to hold the page information. So, with
this table completely eliminated, files consisting of up to
2**32 compressed blocks are possible, and there is essentially
no memory overhead.

o When "gzip -B" generates a file, the least significant bit of
the CRC is now used as a flag to indicate whether or not this
is the last block being decompressed. This enables
decompression of a stream in this format, even if you cannot
do an lseek() on that stream. Previously, gunzip lseek'ed
to the end of the file to read the decompressed file
size and the compressed segment table.

o A workaround in zkmalloc() and zfree() for a kernel
memory allocation problem has been removed, because that
problem seems to have been fixed at some point
in the past four years.
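
To illustrate the end-offset table described above, here is a minimal
sketch in C. It is not the code from the patch. The names (struct extent,
block_extent, table_bytes, data_start) are hypothetical, blocks are
numbered from 0 here, and I assume 32-bit table entries and that the
first compressed block begins at some known offset data_start.

```c
#include <assert.h>
#include <stdint.h>

/* Location of one compressed block within the file (hypothetical type). */
struct extent {
    uint32_t start;   /* offset of the first byte of the compressed block */
    uint32_t len;     /* compressed length in bytes */
};

/*
 * end_off[i] holds the offset just past compressed block i.
 * Finding block n needs only entries n-1 and n, not a sum over 0..n-1.
 * data_start (an assumption) is where compressed block 0 begins.
 */
static struct extent block_extent(const uint32_t *end_off, uint32_t nblocks,
                                  uint32_t data_start, uint32_t n)
{
    struct extent e;

    assert(n < nblocks);
    e.start = (n == 0) ? data_start : end_off[n - 1];
    e.len = end_off[n] - e.start;
    return e;
}

/*
 * Size of a fully loaded in-memory table with one 32-bit entry per
 * compression block: the overhead that is avoided by reading only
 * two entries per lookup.
 */
static uint64_t table_bytes(uint64_t file_size, uint32_t block_size)
{
    uint64_t nblocks = (file_size + block_size - 1) / block_size;

    return nblocks * sizeof(uint32_t);
}
```

For example, with end offsets {1200, 2100, 6196, 6213} and data_start 0,
block 2 starts at offset 2100 and is 4096 bytes long. For a 100MB file
with 4096 byte blocks, table_bytes() gives 102400 bytes, which is the
100kB figure mentioned above.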

Please give this a whirl.

KNOWN BUGS:
Setting the blocksize to 1005 seems to cause a panic. 4096 (the
memory page size) is the only block size that we care about, but it
would be good to track this down. I suspect that somewhere, perhaps
even elsewhere in the kernel, some memory allocation is being assumed to
be a multiple of 4 or 8 bytes when it is not.

Adam J. Richter __ ______________ 4880 Stevens Creek Blvd, Suite 205
adam@yggdrasil.com \ / San Jose, California 95129-1034
+1 408 261-6630 | g g d r a s i l United States of America
fax +1 408 261-6631 "Free Software For The Rest Of Us."
