init bug / crash recovery

Alex Krimkevich (alex@magneton-ra.swmed.edu)
Wed, 13 Mar 1996 16:17:13 GMT


Hi everybody,

This message is intended to report a SERIOUS bug (probably in init),
and a REALLY STUPID oversight on the part of the Slackware maintainers.
When coupled together, these two may cause the irrecoverable
corruption of the hard drive (they sure did it to mine).

I was running Slackware 3.0 under a 1.3.2 kernel, with the original libc
upgraded to libc.so.5.2.16. Everything else was installed straight from
the distribution cdrom. The machine is a P5 system with an ATI Mach 64
graphics card and an Adaptec 2940 SCSI adapter. My initdefault in inittab
is 4, meaning I am running xdm, in case it matters.
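
For anyone who wants to check the same setting on their own machine, here is a minimal sketch that prints the default run level from an inittab file. The function name default_runlevel is made up for illustration, and the sketch assumes the usual id:runlevels:action:process layout of inittab entries.

```shell
#!/bin/sh
# default_runlevel [FILE]: print the run level named by the initdefault
# entry of an inittab file (FILE defaults to /etc/inittab).
# Hypothetical helper; assumes the id:runlevels:action:process layout.
default_runlevel() {
    # Skip commented-out lines, then take field 2 of the first
    # entry whose action field is "initdefault".
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}
```

Calling 'default_runlevel /etc/inittab' on a setup like mine should print the run level that init enters at boot.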

I needed to bring my system into single-user mode, so I typed 'su'
in an xterm window, logged in as root, and typed 'telinit 1'. When I had
finished in single-user mode, I typed 'reboot', but the
system never came back to life. It failed in the late stages of
kernel bootstrap, apparently unable to mount the root partition,
which had been corrupted along with the rest of the partitions.

Changing run levels while running X was never a problem with either
Solaris or DEC OSF, so I was extremely annoyed, but not worried, since
I had the boot and root floppies. I naively thought that after booting
from the boot/root floppies I'd be able to run e2fsck on the file
systems and everything would be dandy after that.

Well, as it turns out, no e2fsck (or any other fsck) exists on the root
floppy. I mounted the root partition on /mnt and tried '/mnt/sbin/e2fsck'.
The system answered "/mnt/sbin/e2fsck not found". 'ls -l
/mnt/sbin/e2fsck' produced:
-rwxr-x--- 1 root bin 69252 Aug 8 1995 e2fsck*

I tried setting the PATH variable, I did '(cd /mnt/sbin;./e2fsck)', I did
'(cp /mnt/sbin/e2fsck /;./e2fsck)', and many other things. No luck,
however: the standard response was "./e2fsck not found". The same
happens with any other command that does not live on the root
floppy. I tried all hard drive partitions in that regard. The "live" /usr
provided on the installation cdrom is no help in this situation, because
the most important system administration commands are in /sbin.
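
For what it is worth, the checks above can be wrapped in a small shell sketch that separates "the file is not there" from "the file is there but will not start". The function name diagnose_exec is made up for illustration. The closing comment about a missing loader or shared libraries is an assumption about one common cause of this symptom, not something verified on this system.

```shell
#!/bin/sh
# diagnose_exec PATH: report whether PATH is missing, present but not
# executable, or present and executable. Hypothetical helper.
diagnose_exec() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ ! -x "$f" ]; then
        echo "not-executable"
    else
        # If the shell still says "not found" for a file in this state,
        # one possible cause (an assumption here) is that the program
        # loader or shared libraries the binary needs are absent from
        # the rescue environment.
        echo "present"
    fi
}
```

In my case 'diagnose_exec /mnt/sbin/e2fsck' would presumably report "present", given the 'ls -l' output above, which is exactly why the "not found" message is so confusing.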

Every other Unix I've worked with provides some way to recover from disk
corruption other than reinstalling the entire system from the
installation medium. I do not see any reason why Linux should be an
exception.

Last but not least, if there is a reason why
telinit [0123456sS] should not be used, at least under X, it should
be stated in the documentation in a very explicit way. I guess
something like:

"WARNING: YOU CAN MESS UP YOUR COMPUTER BY USING THIS COMMAND"

at the very beginning of the telinit/init man page would be very appropriate.
I seriously doubt that I am the first one to write about this
problem. Most likely it has been posted to one of the Linux newsgroups.
The problem is that people who do not have trouble with their
systems tend not to read the Linux newsgroups on a regular basis, and
I am sure they are the majority. The only way to keep these folks
informed about well-known problems is through warnings in the man
pages. A file, let's say DO_NOT_DO_IT, in /root or /etc, with a list of
things not to try, would help too. Or better yet, along with the
"Welcome to Linux" mail, the user should receive a warning
message. And I am not talking about things like "Do not do 'rm *'
while root"; I am talking about things that behave fundamentally
differently under Linux as compared to commercial Unices.

Alex Krimkevich,
alex@magneton-ra.swmed.edu