Re: A bug that allows you to build a kernel that will not boot

Guest section DW (dwguest@win.tue.nl)
Fri, 5 Feb 1999 12:44:28 +0100 (MET)


From: Reg Clemens <reg@dwf.com>

I believe that this bug has existed for some time

Let us say `flaw'.

PROBLEM:
If one sets the "Solaris (x86) partition table support" option from the
partition sub-menu in xconfig (which sets the define
CONFIG_SOLARIS_X86_PARTITION) then one ends up with a kernel that will not
boot without significant fiddling.

More generally:

PROBLEM:
If one adds or removes partitions or disks then the numbering of the
remaining partitions and/or disks changes.

---

For reference, here is how hda is partitioned

hda1 Win98
hda2 FreeBSD
hda3 Solaris
hda4 Extended
hda5 (unused)
hda6 Linux
hda7 Linux swap

During a `normal' boot I see the messages

...
Partition Check:
hda: hda1 hda2! hda3 hda4 < hda5 hda6 hda7 > < hda8 hda9 hda10 hda11 >
hdc: [PTBL] [778/255/63] hdc3
VFS: Mounted root (ext2 filesystem) readonly.
...

Right off I have two questions:
(a) what does the `!' on hda2 imply?
(b) where do the bogus partitions hda8 - 11 come from?

Hmm. Let us look at the source - genhd.c.
The ! indicates a *BSD partition (when CONFIG_BSD_DISKLABEL is defined)

So, you have primary partitions hda1-hda4.
Looking inside the extended partition the kernel finds hda5-7.
Looking inside the FreeBSD partition the kernel finds hda8-11.

If I invoke the Solaris options above, then at the same point in the boot
I see

Partition Check:
hda: hda1 hda2! hda3 <Solaris: [s0] hda5 [s1] hda6 [s2] hda7 [s3] hda8
[s4] hda9 [s6] hda10 [s7] hda10 >
hda4 < hda12 hda13 hda14 > < hda15 hda16 hda17 hda18 >
hdc: [PTBL] [778/255/63] hdc3
Kernel panic: VFS: Unable to mount root fs on 03:06

So now it is showing me my SOLARIS (sub)partitions.
So a question
(c) Why does it show my SOLARIS partitions and not my FreeBSD partitions?
Yes I have BSD Disklabel support turned on.
(d) note that there are now bogus partitions hda15-18.

But (d) is the answer to (c): these FreeBSD partitions that you
cannot find are the hda15-18 that you complain about.

So, here: you have primary partitions hda1-hda4.
Looking inside the Solaris partition the kernel finds hda5-10.
Looking inside the extended partition the kernel finds hda12-14.
Looking inside the FreeBSD partition the kernel finds hda15-18.
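
The two orderings above can be reproduced with a small model. This is
only a sketch of the behaviour described in this message, not the code
in genhd.c: the function name, the labels and the four FreeBSD slice
names are made up for illustration, and the count of seven Solaris
slices is an assumption read off the seven [sN] labels in the boot log
and the fact that the extended partition then starts at hda12.

```python
def number_partitions(primaries, solaris=False, bsd=True):
    """Assign hdaN numbers the way the boot messages suggest.

    primaries: four (kind, inner_labels) pairs, one per primary slot.
    Returns a dict mapping partition number -> label.
    """
    table = {}
    next_free = 5      # numbers 1-4 always belong to the primary slots
    deferred = []      # BSD slices are numbered after everything else

    for slot, (kind, inner) in enumerate(primaries, start=1):
        table[slot] = kind
        if kind == "extended" or (kind == "solaris" and solaris):
            # Numbered immediately, in primary-slot order.
            for label in inner:
                table[next_free] = label
                next_free += 1
        elif kind == "bsd" and bsd:
            deferred.append(inner)

    for inner in deferred:
        for label in inner:
            table[next_free] = label
            next_free += 1
    return table


def find(table, label):
    """Return the partition number that carries the given label."""
    return next(n for n, l in table.items() if l == label)


# Hypothetical description of the disk discussed in this thread.
HDA = [
    ("win98", []),
    ("bsd", ["bsd-a", "bsd-b", "bsd-c", "bsd-d"]),
    ("solaris", ["s0", "s1", "s2", "s3", "s4", "s6", "s7"]),
    ("extended", ["unused", "linux-root", "linux-swap"]),
]

plain = number_partitions(HDA, solaris=False)
with_solaris = number_partitions(HDA, solaris=True)
print("root without Solaris support: hda%d" % find(plain, "linux-root"))
print("root with Solaris support:    hda%d" % find(with_solaris, "linux-root"))
```

In this model the Linux root moves from hda6 to hda13 once the Solaris
slices are numbered ahead of the extended partition, while the FreeBSD
slices stay last in both cases.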

If I CHANGE /etc/lilo.conf to contain `root=/dev/hda13' to override the
incorrect info in the kernel, I get a few lines further in the boot.
If I then ALSO change /etc/fstab to point to /dev/hda13 (rather than
/dev/hda6) and /dev/hda14 (rather than /dev/hda7) for swap, then I can in fact
boot the kernel.
At that point I CAN mount /dev/hda5 and get to my SOLARIS root file system.

All entirely as expected.
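
For anyone who has to make the same edits, here is a minimal sketch of
the renaming step. The helper name and the sample fstab lines are
hypothetical; only the two renames (hda6 to hda13, hda7 to hda14) come
from the report above. It matches whole device names in a single pass,
so /dev/hda6 is not confused with a longer name and an already renamed
entry is not renamed twice.

```python
import re

# Old name -> new name, as reported in this thread.
RENAMES = {"/dev/hda6": "/dev/hda13", "/dev/hda7": "/dev/hda14"}


def renumber(text, renames=RENAMES):
    """Replace whole /dev/hdaN names according to the renames table."""
    pattern = re.compile(r"/dev/hda\d+\b")
    return pattern.sub(lambda m: renames.get(m.group(0), m.group(0)), text)


# Hypothetical fstab fragment, for illustration only.
SAMPLE_FSTAB = """\
/dev/hda6  /     ext2  defaults  1 1
/dev/hda7  none  swap  sw        0 0
"""

print(renumber(SAMPLE_FSTAB))
print(renumber("root=/dev/hda6"))
```

The same function can be run over the root= line of lilo.conf; lilo
still has to be re-run afterwards for the change to take effect.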

First conclusion: if you enable/disable CONFIG_*_PARTITION then you
must be prepared to handle a change in partition numbers.
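
The panic message names the root device as 03:06. As a sketch, assuming
the usual layout in which major 3 is the first IDE controller, each
drive owns 64 minor numbers, and the message prints both numbers in
hexadecimal, that pair can be decoded as follows (the function name is
made up for illustration):

```python
def decode_root_dev(text):
    """Turn a 'MM:mm' hex pair from a panic message into a device name.

    Only major 3 (first IDE controller: hda and hdb) is modelled here.
    """
    major_hex, minor_hex = text.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 3 or minor >= 128:
        raise ValueError("only hda/hdb on major 3 are modelled in this sketch")
    drive = "hda" if minor < 64 else "hdb"
    part = minor % 64          # 0 means the whole disk
    return "/dev/" + drive + (str(part) if part else "")


print(decode_root_dev("03:06"))
print(decode_root_dev("03:0d"))
```

So the kernel is still looking for partition number 6 on hda, which
after the renumbering is no longer the Linux root.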

No doubt it is less than optimal that the CONFIG_BSD_DISKLABEL implementer
was careful to place FreeBSD slices at the end of the partition list
while the CONFIG_SOLARIS_X86_PARTITION implementer failed to do so.
Probably we should change this, so that at least the `basic' partitions
are numbered independently of whether we look inside `foreign' partitions.
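
One possible reading of that suggestion, as a sketch only: number the
primary and logical partitions first, and hand out numbers to all
`foreign' sub-partitions afterwards. The function name, the labels and
the slice counts below are illustrative assumptions, not a patch.

```python
def number_basic_first(primaries, enabled):
    """Number primary and logical partitions before any foreign ones.

    primaries: four (kind, inner_labels) pairs, one per primary slot.
    enabled:   set of foreign kinds the kernel looks inside.
    """
    table = {}
    next_free = 5
    foreign = []               # all foreign contents wait until the end

    for slot, (kind, inner) in enumerate(primaries, start=1):
        table[slot] = kind
        if kind == "extended":
            for label in inner:
                table[next_free] = label
                next_free += 1
        elif kind in enabled:
            foreign.append(inner)

    for inner in foreign:
        for label in inner:
            table[next_free] = label
            next_free += 1
    return table


# Hypothetical description of the disk discussed in this thread.
HDA = [
    ("win98", []),
    ("bsd", ["bsd-a", "bsd-b", "bsd-c", "bsd-d"]),
    ("solaris", ["s0", "s1", "s2", "s3", "s4", "s6", "s7"]),
    ("extended", ["unused", "linux-root", "linux-swap"]),
]

for enabled in (set(), {"bsd"}, {"solaris"}, {"bsd", "solaris"}):
    table = number_basic_first(HDA, enabled)
    root = next(n for n, l in table.items() if l == "linux-root")
    print(sorted(enabled), "-> root is hda%d" % root)
```

With this ordering the Linux root stays at hda6 whichever options are
enabled. The foreign sub-partitions can still shift relative to one
another, so this addresses only the `basic' partitions.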

One could also imagine entirely different schemes, where FreeBSD slices
do not live in the same name space as DOS-type partitions.
This is perhaps easiest done when we finally get large device numbers.

Andries
