Re: Bug: After a 'warm' reboot the disk is missing (not detected by the bios) on a HP t640

From: Ben Mesman | Spark Narrowcasting
Date: Thu Dec 28 2023 - 07:40:01 EST


> Please don't send private mails.  Kudos for using get_maintainer.pl, but a demerit
> for not Cc'ing the mailing lists :-)
>
> https://people.kernel.org/tglx/notes-about-netiquette

Definitely saving that in my URLs-cache. Might need it again in a few years :-)
(last time I needed it was about 10 years ago)

> > I recently started upgrading some of my remote managed thin-clients from a
> > 5.15.x kernel to a 6.1.x kernel. When rebooting with the new(er) kernel, the
> > HP t640 clients failed. The problem is that after the warm reboot, the BIOS
> > is unable to locate the internal storage (so it can't boot a valid OS).
> >
> > With some digging around I found that adding "reboot=p" will solve the
> > problem, but because the systems are remote managed, I am unable to add this
> > boot-parameter in any straightforward way.
[snip]
> I'm not familiar with this code (I'm not actually a maintainer/reviewer for this
> code, by default get_maintainer.pl Cc's people that have recently modified the
> file in question), but this looks like a hack to workaround a bug elsewhere.
>
> All of these quirks are obviously workarounds for some kind of bug, but AFAICT
> the quirks are to workaround hardware or firmware bugs, not kernel bugs.  Since
> 5.15.x kernels worked, odds are good a bug was introduced between 5.15 and 6.1,
> i.e. that this is fudging around a kernel bug that can and should be fixed.
>
> Are you able to bisect the kernel between 6.1 and 5.15 to try and pinpoint an
> exact commit that introduced the problem?

That took a few days, but resulted in the following:

4be33cf187036744b4ed84824e7157cfc09c6f4c is the first bad commit
commit 4be33cf187036744b4ed84824e7157cfc09c6f4c
Author: Fred Ai <fred.ai@xxxxxxxxxxxxxx>
Date:   Mon Dec 20 20:09:40 2021 -0800

    mmc: sdhci-pci-o2micro: Improve card input timing at SDR104/HS200 mode

    Card input timing is margin, need to adjust the hold timing of card input.

    Signed-off-by: Fred Ai <fred.ai@xxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20211221040940.484-1-fred.ai@xxxxxxxxxxxxxx
    Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>

 drivers/mmc/host/sdhci-pci-o2micro.c | 57 ++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 9 deletions(-)

I'm not sure how this code impacts this device, but the device does contain an
"HS200 MMC card":
$ dmesg | grep mmc
[ 1.044708] mmc0: emmc 1.8v flag is set, force 1.8v signaling voltage
[ 1.044937] mmc0: SDHCI controller on PCI [0000:01:00.0] using ADMA
[ 2.120632] mmc0: new HS200 MMC card at address 0001
[ 2.122912] mmcblk0: mmc0:0001 hA8aP> 14.7 GiB
[ 2.124810] mmcblk0: p1 p2 p3
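
One way to check whether that controller is an O2 Micro part (and so handled
by the driver touched in the commit above) would be to read its PCI vendor ID.
A minimal sketch, assuming a POSIX shell and sysfs mounted at /sys. The helper
names mmc_pci_addr and pci_vendor_id and the SYSFS_ROOT variable are made up
for illustration, and 0x1217 is the O2 Micro vendor ID as far as I know:

```shell
# Hypothetical helpers, not part of any existing tool.

# Read dmesg-style text on stdin and print the PCI address of each SDHCI host.
mmc_pci_addr() {
    sed -n 's/.*mmc[0-9]*: SDHCI controller on PCI \[\([0-9a-f:.]*\)\].*/\1/p'
}

# Print the PCI vendor ID of the device at address $1, as reported by sysfs.
# SYSFS_ROOT is an assumption added here so the function can be pointed at a
# copy of /sys; it defaults to the real one.
pci_vendor_id() {
    cat "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/vendor"
}

# Usage on the affected machine:
#   addr=$(dmesg | mmc_pci_addr | head -n 1)
#   pci_vendor_id "$addr"      # 0x1217 would indicate an O2 Micro controller
#   lspci -nn -k -s "$addr"    # shows the device IDs and the kernel driver in use
```

This only identifies the controller; it says nothing about why the timing
change would leave the eMMC invisible to the BIOS after a warm reboot.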

I can provide more info on the hardware; details are also available in the
Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1056056

--
Kind regards,
Ben Mesman
ben@xxxxxxxxxxxxxxxxxxxxx