RE: [Regression v5.19-rc1] crash kexec fails to boot the 2nd kernel (Re: [PATCH v12 38/46] x86/sev: Add SEV-SNP feature detection/setup)

From: NOMURA JUNICHI(野村 淳一)
Date: Wed Jun 29 2022 - 03:38:22 EST


From: Michael Roth <michael.roth@xxxxxxx>
> Thanks for the debug info. I haven't been able to reproduce this on the
> Milan or Cascade Lake systems I've tried, with kexec -l/-p, as well as
> with/without -s, so there may be something hardware/environment-specific
> going on here, so I could really use your help to test possible fixes.

Sure. Thank you for trying to reproduce the problem.

> > Other places that parse setup_data use early_memremap() before
> > accessing the data (e.g. parse_setup_data()). I wonder if the lack of
> > remapping causes the problem but find_cc_blob is too early in the
> > boot process for early_memremap to work.
>
> I think this might be the case. Prior to early_memremap() being
> available, we need to rely on the initial identity map set up by the
> decompression kernel. It has some stuff to add mappings for boot_params
> and whatnot, but I don't see where boot_params->hdr.setup_data is
> handled.
>
> If you use kexec -s to force kexec_file_load, then the kernel sets it up
> so that boot_params->hdr.setup_data points to some memory just after
> boot_params, and boot/compressed uses 2M pages in its identity map, so
> that generally ends up handling the whole range.
>
> But if you use kexec's default kexec_load functionality, setup_data might
> be allocated elsewhere, so in that case we might need explicit mapping. I
> noticed on my systems boot_params->hdr.setup_data seems to generally end
> up at 0x100000 for some reason, and maybe that addr just happens to
> get mapped for other reasons so I don't end up hitting the crash.
>
> Could you give it a shot with the kexec -s flag and see if that works?

Your explanation makes a lot of sense. I could successfully boot the 2nd
kernel if "kexec -s" is used.
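
To illustrate the 2M-page reasoning above, here is a minimal user-space
sketch (not kernel code). It assumes the identity map covers every 2M
page touched by an explicitly mapped range, and checks whether a second
range falls inside that coverage. The helper names are hypothetical and
do not exist in the kernel tree.

```c
#include <stdint.h>

#define SZ_2M (UINT64_C(1) << 21)   /* size of one 2M page */

/* Round an address down to the start of its 2M page. */
static uint64_t align_down_2m(uint64_t addr)
{
	return addr & ~(SZ_2M - 1);
}

/* Round an address up to the next 2M boundary (unchanged if aligned). */
static uint64_t align_up_2m(uint64_t addr)
{
	return (addr + SZ_2M - 1) & ~(SZ_2M - 1);
}

/*
 * Hypothetical helper: returns 1 if [addr, addr + len) lies entirely
 * within the 2M pages that cover [map_addr, map_addr + map_len),
 * i.e. if mapping the first range with 2M pages also maps the second.
 */
static int covered_by_2m_mapping(uint64_t map_addr, uint64_t map_len,
				 uint64_t addr, uint64_t len)
{
	uint64_t start = align_down_2m(map_addr);
	uint64_t end = align_up_2m(map_addr + map_len);

	return addr >= start && addr + len <= end;
}
```

With a made-up boot_params at 0x13000 (one 4K page), setup_data placed
right after it at 0x14000 is covered, while setup_data allocated at a
distant address such as 0x7f000000 is not and would need its own mapping.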

> If so, can you apply the below potential fix, and retry your original
> reproducer?

I tried your potential fix but it didn't work... The symptom was the same
as before.

--
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
