Re: kernel 6.2 stuck at boot (efi_call_rts) on arm64

From: Andrea Righi
Date: Thu Apr 13 2023 - 16:24:54 EST


On Sat, Mar 18, 2023 at 11:35:44AM +0100, Ard Biesheuvel wrote:
> On Thu, 16 Mar 2023 at 23:28, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Mar 16, 2023 at 07:55:36PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 16 Mar 2023 at 18:52, Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> ...
> > > >
> > > > Yay! Success! I just tested your latest efi/urgent (with the fixup) and
> > > > system completed the boot without any soft lockups.
> > > >
> > >
> > > Thanks for confirming. I'll take that as a tested-by
> >
> > The solution in the current branch looks like the best approach we have to date
> > to address the broadest set of affected systems. I believe we could switch the
> > eMAG test to an MIDR test (but this won't work for Altra, as that would also
> > capture all the Neoverse N1 cores beyond Altra). I can look into the MIDR test
> > if you think it's worthwhile, but since I don't think we can eliminate the
> > SMBIOS string test, it doesn't buy us much: we don't need a greedier eMAG test
> > (there aren't more of them to match).
> >
> > Given that some OEM Altra platforms change the processor ID, unfortunately I
> > don't currently see a better solution than adding their "product name" to the
> > SMBIOS string tests.
> >
>
> Indeed. I spotted a Gigabyte system [0] with a different processor ID,
> but with a processor version string we can test for.
>
> So for now, I'll go with
>
> socid = (u32 *)record->processor_id;
> switch (*socid & 0xffff000f) {
> 	static char const altra[] = "Ampere(TM) Altra(TM) Processor";
> 	static char const emag[] = "eMAG";
>
> default:
> 	version = efi_get_smbios_string(&record->header, 4,
> 					processor_version);
> 	if (!version || (strncmp(version, altra, sizeof(altra) - 1) &&
> 			 strncmp(version, emag, sizeof(emag) - 1)))
> 		break;
>
> 	fallthrough;
>
> case 0x0a160001:	// Altra
> case 0x0a160002:	// Altra Max
> 	efi_warn("Working around broken SetVirtualAddressMap()\n");
> ...
>
> which should cover all the affected systems we encountered so far.
>
> I'll push this to linux-next to let it soak for a little bit, and then
> send it to Linus sometime during the week.
>
> Thanks,
> Ard.
>
>
> [0] https://pastebin.com/HQLE1yYv

Not sure if it's a similar issue, but I have found another Ampere box
that boots fine with your fixes, while the efivarfs.sh kselftest fails
with some I/O errors, specifically:

$ sudo ./efivarfs.sh
--------------------
running test_create
--------------------
./efivarfs.sh: line 58: printf: write error: Input/output error
/sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27 has invalid size
[FAIL]
--------------------
running test_create_empty
--------------------
[PASS]
--------------------
running test_create_read
--------------------
[PASS]
--------------------
running test_delete
--------------------
./efivarfs.sh: line 103: printf: write error: Input/output error
[PASS]
--------------------
running test_zero_size_delete
--------------------
./efivarfs.sh: line 126: printf: write error: Input/output error
./efivarfs.sh: line 134: printf: write error: Input/output error
/sys/firmware/efi/efivars/test_zero_size_delete-210be57c-9849-4fc7-a635-e6382d1aec27 should have been deleted
[FAIL]
--------------------
running test_open_unlink
--------------------
open(O_WRONLY): Operation not permitted
[FAIL]
--------------------
running test_valid_filenames
--------------------
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
./efivarfs.sh: line 158: printf: write error: Input/output error
[PASS]
--------------------
running test_invalid_filenames
--------------------
[PASS]

In case it helps, here is the raw SMBIOS type 4 (processor information) record:

$ sudo hexdump -C /sys/firmware/dmi/entries/4-0/raw
00000000 04 30 04 00 01 03 fe 02 c1 d0 3f 41 00 00 00 00 |.0........?A....|
00000010 03 8a 72 06 b8 0b f0 0a 41 06 05 00 06 00 07 00 |..r.....A.......|
00000020 04 05 06 50 50 50 04 00 01 01 01 00 01 00 01 00 |...PPP..........|
00000030 43 50 55 20 31 00 41 6d 70 65 72 65 28 52 29 00 |CPU 1.Ampere(R).|
00000040 41 6d 70 65 72 65 28 52 29 20 41 6c 74 72 61 28 |Ampere(R) Altra(|
00000050 52 29 20 50 72 6f 63 65 73 73 6f 72 00 30 30 30 |R) Processor.000|
00000060 30 30 30 30 30 30 30 30 30 30 30 30 30 30 32 35 |0000000000000025|
00000070 35 30 32 30 39 30 33 33 38 36 35 42 34 00 30 30 |50209033865B4.00|
00000080 30 30 30 30 30 31 00 51 38 30 2d 33 30 00 00 |000001.Q80-30..|
0000008f

I guess EFI is not very reliable here...

-Andrea