Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels

From: Kairui Song
Date: Wed Apr 17 2019 - 02:00:55 EST


On Wed, Apr 17, 2019 at 12:57 PM Dave Young <dyoung@xxxxxxxxxx> wrote:
>
> On 04/17/19 at 09:38am, Dave Young wrote:
> > On 04/16/19 at 03:22pm, Borislav Petkov wrote:
> > > On Tue, Apr 16, 2019 at 07:41:33PM +0800, Dave Young wrote:
> > > > On 04/16/19 at 11:52am, Borislav Petkov wrote:
> > > > > I'll queue the below in the next days if there are no more complaints:
> > > >
> > > > As for the kexec breakage, even with the V3 patch, kexec still hangs on
> > > > a Lenovo T420 laptop. Kairui also reproduced the problem. So can we
> > > > wait a few days see if we can make some progress to find the cause?
> > >
> > > How is applying this patch going to change anything?
> > >
> > > I was told that the breakage is there even without it...
> >
> > Without this patch, the bug happens in the efi_get_rsdp.. function, this
> > patch tries to fix that by adding kexec_get.. but the new introduced
> > kexec_* function does not work on some laptops, so it is not a 100% good
> > fix, I hoped we can get it working for all known issues. But if we can
> > not do it eg. within one week we can go with this version and leave the
> > laptop issue as a known issue.
> >
>
> Latest debugging status:
>
> Kexec boot works with commenting out some code like below, so the guid
> cmp (memcmp) caused a system reset), still need to find out why:
>
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index d9f9abd63c68..13e7a23ae94c 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -95,10 +95,12 @@ __efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
> table = tbl->table;
> }
>
> +/*
> if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> rsdp_addr = table;
> else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> return table;
> +*/
> }
>
> return rsdp_addr;
> @@ -291,9 +293,10 @@ acpi_physical_address get_rsdp_addr(void)
> if (!pa)
> pa = kexec_get_rsdp_addr();
>
> +/*
> if (!pa)
> pa = efi_get_rsdp_addr();
> -
> +*/
> if (!pa)
> pa = bios_get_rsdp_addr();
>
>

Hi Dave, for this case I think it's just because GCC will found the
loop does nothing, and optimize out the whole loop in
__efi_get_rsdp_addr and will no longer read the actual nr_table value.

I can fix the boot error on T420 with your patch, but if I add
anything, like a hardcode value assignment with the right value for
acpi_rsdp in the loop, it will reset the machine. But set acpi_rsdp
with a right initial value out side the loop works fine.
If the loop condition is false, then there should be no difference
between just comment out the line you mentioned and add an assignment.
Else it just assign the value multiple times, not very reasonable but
shouldn't fail.

And, I inspected the generated ASM code also suggest the same thing.
So still, access the systab memory is the cause of the system reset on
certain machines.

--
Best Regards,
Kairui Song