Re: USB storage no-boot regression (bisected)

From: Jeff Garzik
Date: Tue Apr 14 2009 - 22:36:25 EST


Greg KH wrote:
> On Tue, Apr 14, 2009 at 05:06:14PM -0400, Jeff Garzik wrote:
>> One of the x86-64 machines I use for testing runs off of two 2GB USB flash drives, one for Fedora 10 userland, and one for kernel repository + builds.
>>
>> It boots correctly in 2.6.27, but fails with the same symptoms in 2.6.28, 2.6.29 and 2.6.30-rc1:
>>
>> 1) The kernel boots.
>> 2) After some time passes, the kernel begins executing initramfs userland.
>> 3) The kernel prints out probe messages for the USB keyboard, and SCSI probe messages for the two USB flash drives.
>>
>> In other words, the keyboard and the two SCSI drives appear after initramfs begins booting. This is with the drivers built into the kernel (the behavior is the same with modules).
>>
>> This no-boot regression is 100% reproducible, and neatly bisects down to
>>
>> commit 8520f38099ccfdac2147a0852f84ee7a8ee5e197
>> Author: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>> Date:   Mon Sep 22 14:44:26 2008 -0400
>>
>>     USB: change hub initialization sleeps to delayed_work
>>
>>     This patch (as1137) changes the hub_activate() routine, replacing the
>>     power-power-up and debounce delays with delayed_work calls. The idea
>>     is that on systems where the USB stack is compiled into the kernel
>>     rather than built as modules, these delays will no longer block the
>>     boot thread. At least 100 ms is saved for each root hub, which can
>>     add up to a significant savings in total boot time.
>>
>>     Arjan van de Ven was very pleased to see that this shaved 700 ms off
>>     his computer's boot time. Since his total boot time is on the order
>>     of two seconds, the improvement is considerable.
>>
>>     Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>>     Tested-by: Arjan van de Ven <arjan@xxxxxxxxxxxxx>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>
>>
>> My preliminary guess is that this made things --too-- asynchronous, and for some reason userland begins executing before the SCSI core has registered the USB flash drives as Linux block devices.
>>
>> In any case, I cannot boot because of the above commit :)
>
> Like Arjan said, this is because we are initializing faster now, and
> things are a bit more asynchronous. Use the rootdelay= boot option,
> that's what I use for my USB-based systems, and I have not had a
> problem with it at all.
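A toy Python model of the race, not kernel code, may make the timing concrete. It assumes the root device appears a fixed time after the boot thread is released. That delay stands in for the hub power-up/debounce delays that as1137 moved into delayed_work, plus the usb-storage/SCSI probing that follows. It also assumes userland looks for the root device exactly once, either immediately or after a fixed pause, roughly what a rootdelay= setting buys. The function and parameter names (`boot_finds_root`, `probe_delay_s`, `pause_s`) are hypothetical.

```python
import threading
import time


def boot_finds_root(probe_delay_s, pause_s=0.0):
    """Toy model: is the root device present when "userland" looks for it?

    probe_delay_s: time until the simulated USB disk is registered (stands in
                   for hub delays now run from delayed_work plus SCSI probing).
    pause_s:       fixed pause before looking, like a rootdelay= setting.
    """
    root_registered = threading.Event()

    def async_probe():
        time.sleep(probe_delay_s)  # the boot thread no longer waits for this
        root_registered.set()      # the block device now exists

    prober = threading.Thread(target=async_probe)
    prober.start()
    time.sleep(pause_s)                # fixed pause, paid whether needed or not
    found = root_registered.is_set()   # look exactly once
    prober.join()
    return found


if __name__ == "__main__":
    print(boot_finds_root(0.2))               # no pause: probing is still running
    print(boot_finds_root(0.2, pause_s=0.5))  # pause longer than the probe
```

In this model a fixed pause only helps if it is longer than the slowest probe. Every boot pays the full pause, even when the disks show up early.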

Is that solution really scalable to every user with a regression severe enough that it prevents them from booting?

When did regressions become an acceptable tradeoff for speed?

This system boots just fine under kernel 2.6.27, 2.6.26, 2.6.25, and so on. Switch the kernel to 2.6.28, and it no longer boots. A regression cannot get more clear than that.

Maybe this commit should have been accompanied by one that checks "root="?
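One possible reading of that suggestion is sketched below as a toy Python model, not kernel code. Instead of pausing for a fixed time, the boot path waits until the specific device named by root= has been registered. An upper bound keeps a mistyped root= from hanging forever. The `DeviceRegistry` class, the device name "sdb1" and the timeout values are all hypothetical.

```python
import threading
import time


class DeviceRegistry:
    """Hypothetical stand-in for the set of block devices registered so far."""

    def __init__(self):
        self._names = set()
        self._cond = threading.Condition()

    def register(self, name):
        with self._cond:
            self._names.add(name)
            self._cond.notify_all()  # wake any waiter checking for this device

    def wait_for(self, name, timeout_s):
        """Block until `name` is registered; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: name in self._names, timeout=timeout_s)


def boot_with_root_check(root, probe_delay_s, timeout_s):
    registry = DeviceRegistry()

    def async_probe():
        time.sleep(probe_delay_s)  # hub delays plus usb-storage/SCSI probing
        registry.register(root)

    prober = threading.Thread(target=async_probe)
    prober.start()
    # Wait only as long as the root= device actually takes to appear,
    # bounded by timeout_s.
    found = registry.wait_for(root, timeout_s)
    prober.join()
    return found


if __name__ == "__main__":
    print(boot_with_root_check("sdb1", probe_delay_s=0.2, timeout_s=5.0))
```

Unlike the fixed pause, the condition variable lets boot continue as soon as the named device is registered. A fast machine therefore keeps the speedup, and a slow USB disk still gets time to show up.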

Jeff