Re: [PATCH v2] x86/kexec: Add EFI config table identity mapping for kexec kernel

From: Tao Liu
Date: Thu Jul 27 2023 - 07:13:32 EST


Hi Ard,

On Mon, Jul 17, 2023 at 11:11 PM Tao Liu <ltao@xxxxxxxxxx> wrote:
>
> On Mon, Jul 17, 2023 at 10:57 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Mon, 17 Jul 2023 at 15:53, Tao Liu <ltao@xxxxxxxxxx> wrote:
> > >
> > > Hi Borislav,
> > >
> > > On Thu, Jul 13, 2023 at 6:05 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 01, 2023 at 03:20:44PM +0800, Tao Liu wrote:
> > > > > arch/x86/kernel/machine_kexec_64.c | 35 ++++++++++++++++++++++++++----
> > > > > 1 file changed, 31 insertions(+), 4 deletions(-)
> > > >
> > > > Ok, pls try this totally untested thing.
> > > >
> > > > Thx.
> > > >
> > > > ---
> > > > diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> > > > index 09dc8c187b3c..fefe27b2af85 100644
> > > > --- a/arch/x86/boot/compressed/sev.c
> > > > +++ b/arch/x86/boot/compressed/sev.c
> > > > @@ -404,13 +404,20 @@ void sev_enable(struct boot_params *bp)
> > > >  	if (bp)
> > > >  		bp->cc_blob_address = 0;
> > > >
> > > > +	/* Check for the SME/SEV support leaf */
> > > > +	eax = 0x80000000;
> > > > +	ecx = 0;
> > > > +	native_cpuid(&eax, &ebx, &ecx, &edx);
> > > > +	if (eax < 0x8000001f)
> > > > +		return;
> > > > +
> > > >  	/*
> > > >  	 * Setup/preliminary detection of SNP. This will be sanity-checked
> > > >  	 * against CPUID/MSR values later.
> > > >  	 */
> > > >  	snp = snp_init(bp);
> > > >
> > > > -	/* Check for the SME/SEV support leaf */
> > > > +	/* Recheck the SME/SEV support leaf */
> > > >  	eax = 0x80000000;
> > > >  	ecx = 0;
> > > >  	native_cpuid(&eax, &ebx, &ecx, &edx);
> > > >
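
For anyone following along, the early return that the quoted diff adds
can be sketched in user space roughly as below. This is only an
illustration, not the kernel code: the function names and the
SEV_CPUID_LEAF macro are made up for this sketch, and it assumes a
GCC/clang toolchain that ships <cpuid.h>. The idea is that a CPU whose
highest extended CPUID leaf is below 0x8000001f cannot report SME/SEV,
so snp_init() (and with it the cc_blob lookup) is never reached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>	/* GCC/clang helper header; assumed to be available */
#endif

/* Extended leaf compared against in the quoted diff (macro name is hypothetical). */
#define SEV_CPUID_LEAF 0x8000001fU

/*
 * Hypothetical helper: true if the highest extended CPUID leaf, as
 * returned in EAX by leaf 0x80000000, reaches the SME/SEV leaf.
 */
bool max_ext_leaf_covers_sev(uint32_t max_ext_leaf)
{
	return max_ext_leaf >= SEV_CPUID_LEAF;
}

/*
 * Hypothetical user-space wrapper. On x86, __get_cpuid_max(0x80000000, NULL)
 * returns the highest supported extended leaf, or 0 if CPUID is unusable.
 * On other architectures this sketch simply reports false.
 */
bool cpu_reports_sev_leaf(void)
{
#if defined(__i386__) || defined(__x86_64__)
	return max_ext_leaf_covers_sev(__get_cpuid_max(0x80000000U, NULL));
#else
	return false;
#endif
}
```

With a maximum extended leaf of, say, 0x80000008, the helper returns
false, which corresponds to the "return;" taken before snp_init() in
the diff.
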
> > > Thanks a lot for the patch above! Sorry for the late response. I have
> > > compiled and tested it locally against 6.5.0-rc1, though it can pass
> > > the early stage of kexec kernel bootup,
> >
> > OK, so that proves that the cc_blob table access is the culprit here.
> > That still means that kexec on SEV is likely to explode in the exact
> > same way should anyone attempt that.
> >
> >
> > > however the kernel will panic
> > > occasionally later. The test machine is the one with Intel Atom
> > > x6425RE cpu which encountered the page fault issue of missing efi
> > > config table.
> > >
> >
> > Agree with Boris that this seems entirely unrelated.
>
> Agree, I will have a retest based on Boris's suggestions.
>
> >
> > > ...snip...
> > > [ 21.360763] nvme0n1: p1 p2 p3
> > > [ 21.364207] igc 0000:03:00.0: PTM enabled, 4ns granularity
> > > [ 21.421097] pps pps1: new PPS source ptp1
> > > [ 21.425396] igc 0000:03:00.0 (unnamed net_device) (uninitialized): PHC added
> > > [ 21.457005] igc 0000:03:00.0: 4.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x1 link)
> > > [ 21.465210] igc 0000:03:00.0 eth1: MAC: ...snip...
> > > [ 21.473424] igc 0000:03:00.0 enp3s0: renamed from eth1
> > > [ 21.479446] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [ 21.486405] #PF: supervisor read access in kernel mode
> > > [ 21.491519] mmc1: Failed to initialize a non-removable card
> > > [ 21.491538] #PF: error_code(0x0000) - not-present page
> > > [ 21.502229] PGD 0 P4D 0
> > > [ 21.504773] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 21.509133] CPU: 3 PID: 402 Comm: systemd-udevd Not tainted 6.5.0-rc1+ #1
> > > [ 21.515905] Hardware name: ...snip...
> >
> >
> > Why are you snipping the hardware name?
>

Our partner has confirmed that it is OK to discuss this in public, so
the hardware name is:
Hardware name: LENOVO 11KL0FVT06/3334, BIOS M4XKT14A 05/17/2023

The machine is a Lenovo ThinkEdge SE10.

Thanks,
Tao Liu

> Sorry for the inconvenience here... The machine is borrowed from our
> partner, which may not be officially released to the market. I haven't
> discussed the legal issue with them. In addition, I think the stack
> trace is more useful, so I snipped the hardware name. Sorry about
> that...
>
> >