Re: linux-kernel-digest V1 #123

Rob Janssen reading Linux mailinglist (linux@pe1chl.ampr.org)
Sat, 22 Jul 1995 01:40:26 +0200 (MET DST)


According to owner-linux-kernel-digest@vger.rutgers.edu:
> From: "Theodore Ts'o" <tytso@mit.edu>
> Date: Wed, 19 Jul 1995 16:22:42 +0200
> Subject: Re: linux-kernel-digest V1 #118
>
> From: linux@pe1chl.ampr.org (Rob Janssen reading Linux mailinglist)
> Date: Tue, 18 Jul 1995 09:18:15 +0200 (MET DST)
>
> Resilience to disk errors certainly isn't Linux's best point...
> A while ago I had some bad sectors on my SCSI disk (which does not have
> automatic re-allocation of those bad sectors), and it was quite difficult
> to recover from that.
>
> It would be nice to allow filesystems to automatically move bad sectors
> to the bad block list, and possibly rewrite a block buffer to a newly
> reallocated block when it is discovered that a newly allocated block is
> bad. This would probably require a callback from the device driver to
> the filesystem layer to notify the filesystem that a particular block is
> bad.
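
To make the callback idea above concrete, here is a toy sketch in Python.
It is not kernel code and not how any existing filesystem does it: the
class ToyFilesystem, the method on_bad_block, and the in-memory "disk"
are all hypothetical names invented for illustration. It only shows the
bookkeeping: the block goes onto the bad block list, and a block whose
data is still buffered gets a new physical block.

```python
class ToyFilesystem:
    """Toy model (hypothetical): logical blocks map to physical blocks."""

    def __init__(self, total_blocks):
        self.free = list(range(total_blocks))  # physical blocks not in use
        self.bad_blocks = set()                # the "bad block list"
        self.mapping = {}                      # logical -> physical block
        self.buffers = {}                      # logical -> buffered data

    def write(self, logical, data):
        """Buffer the data; allocate a physical block on first write."""
        self.buffers[logical] = data
        if logical not in self.mapping:
            self.mapping[logical] = self.free.pop(0)
        return self.mapping[logical]

    def on_bad_block(self, physical):
        """Hypothetical callback a driver would invoke on a media error.

        Returns the replacement physical block, or None if the bad block
        was not in use by any logical block.
        """
        self.bad_blocks.add(physical)
        if physical in self.free:
            self.free.remove(physical)  # never hand it out again
        for logical, phys in self.mapping.items():
            if phys == physical:
                # The data is still in the buffer, so it can be rewritten
                # to a freshly allocated block.
                self.mapping[logical] = self.free.pop(0)
                return self.mapping[logical]
        return None
```

For example, after fs = ToyFilesystem(4) and fs.write(7, b"x"), a call to
fs.on_bad_block(0) moves logical block 7 to physical block 1 and records
block 0 as bad.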

That is all very nice. However, before making it that sophisticated, I
would suggest first making it handle errors more reasonably...
(i.e. don't panic the system or hang processes when a block can't be read)

> It would be nice if you could get the messages printed on a fixed device
> directly by the kernel, so that you could send them to the first console
> (instead of the current one), to a terminal on a serial port, to a
> printer, etc. That would make them less dependent on complex stuff like
> syslogd and X.
>
> This doesn't require kernel changes; you just need to make source
> changes to klogd. This would work in most cases, where process
> scheduling is still working. It certainly works in the disk i/o error
> case mentioned above; you just have to instruct klogd to write kernel
> log messages to a device, instead of or in addition to forwarding the
> kernel message to syslogd.
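
For illustration only, the kind of forwarding described above could look
roughly like the Python sketch below. This is not klogd's source. The
function names, the use of /proc/kmsg as the message source, and
/dev/tty1 as the fixed device are assumptions made for the example, and
reading /proc/kmsg normally needs root.

```python
def forward_kernel_messages(source, sinks):
    """Copy each line from source to every sink, flushing immediately.

    source is any iterable of text lines; sinks is a list of writable
    file objects. Returns the number of lines forwarded.
    """
    count = 0
    for line in source:
        for sink in sinks:
            sink.write(line)
            sink.flush()  # don't leave a message buffered if the system dies
        count += 1
    return count


def run(device="/dev/tty1", source_path="/proc/kmsg"):
    """Assumed setup: read kernel messages and print them on one device."""
    with open(source_path, "r", errors="replace") as src, \
         open(device, "w") as dst:
        forward_kernel_messages(src, [dst])
```

Calling run("/dev/ttyS0") would send the messages to a serial terminal
instead of the first console. Note that this is still a user process, so
it only helps while process scheduling is still working.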

I'm not sure what klogd is; my system currently runs only syslogd.
Is there an advantage to getting and running klogd? (I don't have it)
Note that when the disk is dead, processes that are not very active and
suddenly need to do something stand a high chance of failing,
because they have been swapped out.

I know it is against the "do it in user mode" religion, but I really
think that critical error message printing should be done in the kernel,
not in a user process that probably dies with the system.

> In the case where I'm doing kernel work, and where I'm afraid a device
> driver bug that I'm working on might cause the system to hang at the
> interrupt level, I generally avoid working under X11 at all; I'll then
> kill off klogd, or run klogd -c 8, so that all kernel messages go to the
> current console, where I'm guaranteed to get the message even if the
> system is hung.

That only helps when you are doing debugging work.
(even then I would not like to be without X)
Problems with the disks or the SCSI bus can occur at any time, usually
when you don't expect or like it. I feel uneasy with the fact that
I don't see the error messages, and that the filesystems end up more
corrupted than they need to be.
(e.g. an error on /dev/sda1 will corrupt a filesystem on /dev/sda2, only
because you can't sync the system properly before rebooting it)

Sure, all kinds of nice solutions are possible, but why can't we report
device errors back to the user program, as all other operating systems
seem to be able to do? (and leave the system in a state where it can
be safely shut down)
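
What I mean by reporting errors back to the user program is sketched
below in Python, from the program's side. The helper names read_block
and describe are made up for this example. The assumption is that a
failed read comes back as an ordinary error code (EIO for a device
error) which the program can handle, rather than the process hanging.

```python
import errno
import os


def read_block(fd, offset, size):
    """Return (data, None) on success or (None, errno_code) on failure."""
    try:
        return os.pread(fd, size, offset), None
    except OSError as exc:
        # The error is reported to the caller; the program keeps running
        # and can decide to skip the block, retry, or exit cleanly.
        return None, exc.errno


def describe(err):
    """Turn an errno code into a message for the user."""
    if err == errno.EIO:
        return "I/O error: the device could not read this block"
    return os.strerror(err)
```

A program using this can print describe(err) and carry on with the next
block, which is all I am asking for.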

Rob

-- 
+------------------------------------+--------------------------------------+
| Rob Janssen         rob@knoware.nl | AMPRnet:   rob@pe1chl.ampr.org       |
| e-mail: pe1chl@wab-tis.rabobank.nl | AX.25 BBS: PE1CHL@PI8WNO.#UTR.NLD.EU |
+------------------------------------+--------------------------------------+