Re: pre2.0.9 (was Re: CD-ROM Access Crashes)

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 30 May 1996 11:25:28 +0300 (EET DST)


On Wed, 29 May 1996, Leonard N. Zubkoff wrote:
>
> I have verified that dd'ing a CD-R no longer kills my system nor does reading a
> bad block. I was able to read all the good files on my test CD, with the bad
> ones getting I/O errors. It took 1.5 hours to read it all, but there were no
> resets and the system worked fine both during and after the test. However, it
> does look like there are repeated requests for the same block, as in this
> excerpt:

If you just "dd" the raw device, you'll be using the old buffer cache for
the blocks. Or did you dd the files from a mounted CD?

Anyway, in both cases it's entirely OK to get multiple reads for bad
blocks. In fact, the page cache _always_ tries to read a block at
least twice - it re-tries the operation that failed before it returns an
error.

The reason it tries to read twice is (a) it's just plain prudent to try
again in case we had a temporary problem, but more
importantly (b) there are some filesystems where it makes a difference
_who_ tries to read the file. So if the page cache isn't up-to-date, we
_have_ to re-try, because maybe the page cache error was due to somebody
else having failed to read the page.

The (b) case is obvious over NFS - if somebody has tried to read a page
from the file but the read failed due to permission checking, then when
the right person comes along we can't just tell them "sorry, we have tried
to read this page already, it failed, so it's going to fail for you too".
So when we read a file and find a page that isn't up-to-date, we can't
assume that it's an error before we check a second time.
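
To make the rule concrete, here's a standalone C sketch of the idea.
The page structure and the try_to_read_page() helper are invented for
illustration - the real page-cache read path is more involved than this:

    #include <errno.h>
    #include <stdio.h>

    struct page {
        int uptodate;    /* set once a read has filled the page */
    };

    /* Stub for the filesystem read operation.  Over NFS the outcome
     * can depend on _who_ is asking, so one reader's failure proves
     * nothing about the next reader's chances. */
    static void try_to_read_page(struct page *pg)
    {
        /* ... issue the read; on success set pg->uptodate = 1 ... */
    }

    /* A page that isn't up-to-date must be re-read on behalf of the
     * current reader before we dare return an error. */
    static int read_page_checked(struct page *pg)
    {
        if (pg->uptodate)
            return 0;
        try_to_read_page(pg);    /* our first try, or a re-try of
                                    somebody else's failure */
        if (pg->uptodate)
            return 0;
        try_to_read_page(pg);    /* re-try before giving up */
        if (pg->uptodate)
            return 0;
        return -EIO;             /* only now report the error */
    }

    int main(void)
    {
        struct page pg = { 0 };
        printf("result: %d\n", read_page_checked(&pg));
        return 0;
    }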

[ Stephen, I'm including you on the Cc explicitly, because I noticed that
you complained about this behaviour in NFS to Olaf Kirch. We're doing
the right thing now, and although this complicates the generic code a bit,
we really _have_ to do it this way ].

> What's definitely not implemented as yet is for a SCSI command that fails with
> a MEDIUM ERROR to be processed as a partial success and a partial failure. The
> entire command is treated as having failed.

This is bad for performance, and it can result in strange behaviour (if
the IO request contained requests from two different file reads that were
merged at the IO level, they both fail even though the error was
potentially in just one of the files). However, if there are IO errors you
shouldn't really consider your filesystem reliable anyway, so I don't
think this is critical (and the re-try might actually sort this case out
correctly too).
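
For illustration, a standalone C sketch of what per-segment completion
could look like - the segment structure and helpers here are made up,
not the real SCSI-layer interfaces:

    #include <stdio.h>

    #define NSEGS 4

    struct segment {
        long sector;    /* first sector of this piece of the request */
        int  ok;        /* completion status seen by the reader */
    };

    /* Stub: does this segment cover the sector the drive flagged? */
    static int covers_bad_sector(const struct segment *seg, long bad)
    {
        return seg->sector == bad;
    }

    /* Complete a merged request one segment at a time, so that two
     * unrelated file reads merged into one command don't both see
     * the I/O error. */
    static void complete_partially(struct segment *segs, int n, long bad)
    {
        int i;
        for (i = 0; i < n; i++)
            segs[i].ok = !covers_bad_sector(&segs[i], bad);
    }

    int main(void)
    {
        struct segment req[NSEGS] = {
            { 100, 0 }, { 108, 0 }, { 116, 0 }, { 124, 0 }
        };
        int i;

        complete_partially(req, NSEGS, 116);  /* sector 116 is bad */
        for (i = 0; i < NSEGS; i++)
            printf("segment at %ld: %s\n", req[i].sector,
                   req[i].ok ? "ok" : "I/O error");
        return 0;
    }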

> In addition, it will still signal
> an I/O error when a bad sector is encountered, even if we're really at the
> logical end of the CD-R.

Not really a problem, except if the read-ahead code then results in part
of the _good_ sectors also being marked bad (due to the whole-command
failure described above). We should probably disable read-ahead in the
old buffer cache in this case (or just fix the nontrivial problem with
partial failures).
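
A minimal sketch of such a read-ahead clamp, with invented names - the
actual buffer-cache read-ahead logic is different:

    #include <stdio.h>

    /* Lowest block number known to have failed, or -1 if none yet. */
    static long first_bad_block = -1;

    /* How many blocks of read-ahead to issue: clamp speculative reads
     * so they never run into a block that already failed, e.g. one
     * past the logical end of a CD-R. */
    static int readahead_blocks(long next_block, int wanted)
    {
        if (first_bad_block >= 0 && next_block + wanted > first_bad_block)
            wanted = (int)(first_bad_block - next_block);
        return wanted > 0 ? wanted : 0;
    }

    int main(void)
    {
        first_bad_block = 1000;    /* a read of block 1000 failed */
        printf("%d\n", readahead_blocks(996, 8));    /* prints 4 */
        return 0;
    }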

Linus