Re: EXT2 and BadBlock updating.....

From: Ed Carp (erc@pobox.com)
Date: Wed Apr 12 2000 - 15:08:21 EST


Theodore Y. Ts'o (tytso@MIT.EDU) writes:

> However, it's not the case that disks should in normal operation
> randomly start to lose blocks, which they then automatically remap.
> It's good that they can do this, but remember that you are potentially
> *losing* *data* when this happens. It's one thing when you have defects
> from the manufacturing process; it's quite another thing to have blocks
> go bad after the disk has been placed into service. While it does
> happen, it's very rare, and if it happened as often as you seem to think
> it does, then files would be getting corrupted all the time --- and
> that's not the case.

I *know* that I'm getting a bad block or two about once a year on average.
Not very often, and not a lot, but with just enough time in between that I
have to look up how to run badblocks each time!

> In most production houses that I've seen, if a disk starts reporting
> soft errors (which is where the block was ultimately readable, but the
> disk had to retry several times), that's generally the cue to replace
> the disk. That's because it's generally the case that the data on the
> disk is far more valuable than the cost of replacing the disk (heck, the
> cost in people time of having to restore from backup tapes is probably more
> than the cost of the disk), and after 2-3 years of hard service in a
> fileserver, the disk probably doesn't have much more life in it anyway.
>
> There are exceptions to this rule, of course --- if you're in a country
> like Russia where disks are extremely expensive, then maybe the
> cost/benefit ratio changes. Or if you're a poor student, or if the
> server is in a location which is very hard to get to. However, I don't
> buy the "7x24" argument. If a service is so critical that it has to be
> up 7 days a week, 24 hours a day, it's also probably so critical that
> unscheduled downtime is far more disastrous than a planned downtime.
> Also, if there is a requirement that it be up 7x24, why aren't there
> redundant servers (never mind redundant disks in a RAID array)?

I'm slowly moving the server over to twin 36 GB drives (with software RAID),
but haven't completed it yet, partly because I've been too busy to build a
2.2.14 kernel (Red Hat 6.1 came with 2.2.12, and 2.2.12 doesn't support 36 GB
drives). Once that's done, I'll low-level format the old drives again and use
them for /tmp space :)

> Anyway, we've started straying off topic. In answer to your question
> --- it may be worth doing, but it's too close to 2.4, and I have other
> higher priority projects --- and when/if it gets implemented, if you
> depend on it too much, IMO you're probably trying to do things on the
> cheap, and you WILL regret it someday. :-)

Oh, please don't misunderstand me - I wasn't trying to insist that something
be done about it *now*, or even in the foreseeable future. In fact, I will
probably wind up doing it myself, partly just for fun, and partly because I've
got a client who's willing to pay for the work. Something else to do to keep
me out of the bars and off the streets ;)
