Re: [PATCH v2] x86/kexec: Add EFI config table identity mapping for kexec kernel

From: Tao Liu
Date: Thu Jul 27 2023 - 07:04:53 EST


Hi Borislav,

Sorry for the late response. I spent some time retesting your patch
against 6.5.0-rc1 and 6.5.0-rc3, and it works on both. So

Reported-and-tested-by: Tao Liu <ltao@xxxxxxxxxx>

Will we use this patch as a workaround, or will we wait for a better
solution, as proposed by Michael?

On Mon, Jul 17, 2023 at 10:14 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Mon, Jul 17, 2023 at 09:53:06PM +0800, Tao Liu wrote:
> > ...snip...
> > [ 21.360763] nvme0n1: p1 p2 p3
> > [ 21.364207] igc 0000:03:00.0: PTM enabled, 4ns granularity
> > [ 21.421097] pps pps1: new PPS source ptp1
> > [ 21.425396] igc 0000:03:00.0 (unnamed net_device) (uninitialized): PHC added
> > [ 21.457005] igc 0000:03:00.0: 4.000 Gb/s available PCIe bandwidth
> > (5.0 GT/s PCIe x1 link)
> > [ 21.465210] igc 0000:03:00.0 eth1: MAC: ...snip...
> > [ 21.473424] igc 0000:03:00.0 enp3s0: renamed from eth1
> > [ 21.479446] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [ 21.486405] #PF: supervisor read access in kernel mode
> > [ 21.491519] mmc1: Failed to initialize a non-removable card
> > [ 21.491538] #PF: error_code(0x0000) - not-present page
> > [ 21.502229] PGD 0 P4D 0
> > [ 21.504773] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 21.509133] CPU: 3 PID: 402 Comm: systemd-udevd Not tainted 6.5.0-rc1+ #1
> > [ 21.515905] Hardware name: ...snip...
> > [ 21.522851] RIP: 0010:kernfs_dop_revalidate+0x2b/0x120
>
> So something's weird here - my patch should not cause a null ptr deref
> here.

The random kernel panic I encountered is unrelated to this patch. I'm
pretty sure it is caused by a driver, and I strongly suspect the
graphics driver, i915.ko. With this patch applied, the kexec kernel
does not fail if I either 1) disconnect the monitor (logging in over
the serial port) and keep i915, or 2) keep the monitor connected but
remove i915. Although i915 provides a PCI shutdown function, some bug
may still be triggered and overwrite memory after kexec. Anyway, that
is a separate issue.

Thanks,
Tao Liu

>
> > [ 21.527995] Code: 1f 44 00 00 83 e6 40 0f 85 07 01 00 00 41 55 41
> > 54 55 53 48 8b 47 30 48 89 fb 48 85 c0 0f 84 a2 00 00 00 48 8b a87
>
> This looks weird too. There's no "<>" brackets denoting which byte it
> was exactly where RIP pointed to when the NULL ptr happened.
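
For reference, here is a minimal sketch of checking an oops "Code:"
line for that marker. It assumes the usual x86 layout, where the byte
at RIP is printed as <xx> among the other hex bytes. The helper name
faulting_byte and the sample line in the usage note are hypothetical;
this is not a kernel tool, and it does not decode instructions.

```python
import re

# The byte at RIP is wrapped in <> on x86 oops "Code:" lines.
MARKER = re.compile(r"<([0-9a-fA-F]{2})>")


def faulting_byte(code_line):
    """Return (offset, value) of the <xx>-marked byte, or None if absent.

    offset counts whitespace-separated tokens after "Code:", so it is
    only a byte offset when the dump carries no other tokens.
    """
    head, sep, tail = code_line.partition("Code:")
    dump = tail if sep else head  # accept lines with or without the prefix
    for offset, token in enumerate(dump.split()):
        match = MARKER.fullmatch(token)
        if match:
            return offset, int(match.group(1), 16)
    return None  # no marker: the dump is truncated or otherwise damaged
```

On a made-up line such as "Code: 0f 85 07 01 <48> 8b 47 30" this
returns (4, 0x48). On a dump with no <> marker it returns None.
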
>
> Do
>
> make fs/kernfs/dir.s
>
> and upload dir.s and the dir.o file somewhere.
>
> In any case, my patch shouldn't be causing this. At least I don't see
> it.
>
> I'm testing a better version of the patch and it should not cause this
> thing even less.
>
> > The stack trace may not be the same all the time, I didn't dive deep
> > into the root cause, but it looks to me the patch will cause an
> > unknown issue. Also I tested the patch on kernel-5.14.0-318.el9, it
>
> This is the upstream kernel mailing list so those Frankenstein kernels
> are all left to you.
>
> Good luck. :-)
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>