RE: [PATCH 0/2] Fix boot hang issue on Ampere Emag server

From: Justin He
Date: Thu Feb 02 2023 - 05:51:55 EST


Hi Jason

> -----Original Message-----
> From: Jason A. Donenfeld <Jason@xxxxxxxxx>
> Sent: Wednesday, February 1, 2023 2:23 AM
> To: Justin He <Justin.He@xxxxxxx>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>; Huacai Chen <chenhuacai@xxxxxxxxxx>;
> linux-efi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Alexandru Elisei
> <Alexandru.Elisei@xxxxxxx>
> Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
>
> On Tue, Jan 31, 2023 at 03:21:39PM +0000, Justin He wrote:
> > Hi Ard,
> >
> > > -----Original Message-----
> > > From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > > Sent: Tuesday, January 31, 2023 3:19 PM
> > > To: Justin He <Justin.He@xxxxxxx>; Jason A. Donenfeld
> > > <Jason@xxxxxxxxx>
> > > Cc: Huacai Chen <chenhuacai@xxxxxxxxxx>; linux-efi@xxxxxxxxxxxxxxx;
> > > linux-kernel@xxxxxxxxxxxxxxx; Alexandru Elisei
> > > <Alexandru.Elisei@xxxxxxx>
> > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> > >
> > > (cc Jason for awareness)
> > >
> > > On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@xxxxxxx> wrote:
> > > >
> > > > I met a hung task warning and then kernel was hung forever with
> > > > latest kernel on an Ampere Emag server.
> > > >
> > > > The root cause is kernel was hung when invoking an efi rts call
> > > > to set the RandomSeed variable during the booting stage. The
> > > > arch_efi_call_virt call (set_variable) was never returned and then
> > > > > caused the hung task error.
> > > >
> > >
> > > Given that EFI variables work on this platform (as far as I know),
> > > the problem may be that we are calling SetVariable() too early.
> > >
> > > Could you double check whether setting variables works as expected?
> > > You can use efibootmgr -t 10 as root (for example) to set the boot
> > > timeout, and check whether the new value is retained after a reboot
> > > (efibootmgr will print the current value for you)
> > >
> > > > Could you also please share the kernel log up until the point where
> > > > it hangs?
> > >
> > The set_variable seems to be ok in 5.19+:
> > root@:~# efibootmgr -t 10
> > BootCurrent: 0000
> > Timeout: 10 seconds
>
> I think what we want to learn is whether efibootmgr -t 10 works in the latest
> RC. If not, it would suggest the issue isn't with the seed setting, but with some
> other unrelated change.
>
> Can you run efibootmgr -t 10 (or whatever) again on a kernel where you've
> commented out these lines in efi.c inside of efisubsys_init():
>
> 	if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
> 		execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
>
> -->
>
> //	if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
> //		execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
>
As you suggested, I commented out the execute_with_initialized_rng() call in
the latest kernel. With that change, efibootmgr -t X still hangs. It looks like
some commit before your patch broke the set_variable EFI call. I will debug
further and let you know the result.

---
Cheers,
Justin.