Re: [PATCH 0/6] efi/x86: Avoid legacy decompressor during EFI boot

From: Tom Lendacky
Date: Tue May 02 2023 - 12:08:59 EST


On 5/2/23 08:39, Ard Biesheuvel wrote:
On Tue, 2 May 2023 at 15:37, Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:

On 4/24/23 11:57, Ard Biesheuvel wrote:
This series is conceptually a combination of Evgeny's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.

My series [1] implemented zboot for x86, by getting rid of the entire
x86 decompressor, and replacing it with existing EFI code that does the
same but in a generic way. The downside of this is that only EFI boot is
supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.

Evgeny's series [0] adapted the entire decompressor code flow to allow
it to execute in the EFI context as well as the bare metal context, and
this involves changes to the 1:1 mapping code and the page fault
handlers etc, none of which are really needed when doing EFI boot in the
first place.

So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline code,
page table code or page fault handling code. This allows us to get rid
of quite a bit of unsavory EFI stub code, and replace it with two clear
invocations of the EFI firmware APIs to clear NX restrictions from
allocations that have been populated with executable code.
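Clearing NX through the firmware works on whole pages, so the byte range covered by the executable image first has to be widened to page boundaries. The following is a minimal sketch of that arithmetic. It assumes 4 KiB EFI pages, and efi_page_align_range() is a hypothetical helper, not the stub's actual code. The firmware call it would feed (for example ClearMemoryAttributes() of EFI_MEMORY_ATTRIBUTE_PROTOCOL with EFI_MEMORY_XP) is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>

/* UEFI memory attributes are managed at 4 KiB page granularity. */
#define EFI_PAGE_SIZE 0x1000ULL

/*
 * Hypothetical helper: widen the byte range [start, start + size) to
 * whole pages, so that a firmware call clearing the no-execute attribute
 * covers every byte of the executable image, including the partial pages
 * at either end.
 */
static void efi_page_align_range(uint64_t start, uint64_t size,
				 uint64_t *rounded_start, uint64_t *rounded_size)
{
	uint64_t end;

	assert(size > 0);
	end = start + size;

	/* Round the start down and the end up to a page boundary. */
	*rounded_start = start & ~(EFI_PAGE_SIZE - 1);
	end = (end + EFI_PAGE_SIZE - 1) & ~(EFI_PAGE_SIZE - 1);
	*rounded_size = end - *rounded_start;
}
```

For example, a 0x2000-byte region that starts at 0x1000800 straddles three pages. It therefore widens to base 0x1000000 with length 0x3000.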

The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
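The following is a rough sketch of the two reused pieces. It assumes an ELF64 image that is already in memory and uses the standard <elf.h> definitions. The relocation format below (a flat list of offsets of 64-bit absolute addresses) is a hypothetical simplification for illustration. It is not the format the x86 build actually emits.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy every PT_LOAD segment of an in-memory ELF64 image into `dest`,
 * which is assumed to correspond to the image's lowest virtual address
 * (`min_vaddr`). Bytes between p_filesz and p_memsz (typically .bss) are
 * zeroed explicitly, since nothing else will clear them.
 */
static void copy_load_segments(uint8_t *dest, const uint8_t *image,
			       uint64_t min_vaddr)
{
	const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)image;
	const Elf64_Phdr *phdrs = (const Elf64_Phdr *)(image + ehdr->e_phoff);

	for (int i = 0; i < ehdr->e_phnum; i++) {
		const Elf64_Phdr *ph = &phdrs[i];
		uint8_t *seg;

		if (ph->p_type != PT_LOAD)
			continue;

		assert(ph->p_memsz >= ph->p_filesz);
		seg = dest + (ph->p_vaddr - min_vaddr);
		memcpy(seg, image + ph->p_offset, ph->p_filesz);
		memset(seg + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
	}
}

/*
 * Hypothetical relocation format: offsets (relative to `dest`) of 64-bit
 * absolute addresses that were linked for `link_base`. Each one is shifted
 * by the distance between the actual and the link-time load address.
 */
static void apply_relocs64(uint8_t *dest, const uint32_t *offsets, size_t n,
			   uint64_t link_base, uint64_t load_base)
{
	uint64_t delta = load_base - link_base;

	for (size_t i = 0; i < n; i++) {
		uint64_t val;

		/* memcpy avoids relying on the target being 8-byte aligned. */
		memcpy(&val, dest + offsets[i], sizeof(val));
		val += delta;
		memcpy(dest + offsets[i], &val, sizeof(val));
	}
}
```

The explicit memset() over p_memsz - p_filesz is what zeroes .bss in the copied image. If that step is skipped, whatever was already in the destination buffer survives.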

Note that some of Evgeny's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.

I tried booting an SEV and an SEV-ES guest using this and both failed to boot:

EFI stub: WARNING: Decompression failed: Out of memory while allocating z_stream

I'll have to take a closer look as to why, but it might be a couple of
days before I can get to it.


Thanks, Tom.

The internal malloc() seems to be failing, which is often caused by
BSS clearing problems. Could you elaborate a little bit on the boot
environment you are using here?
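The following simplified model shows why a BSS clearing problem can surface as a malloc() failure. It is a bump allocator of the kind the preboot decompressors use, loosely patterned on include/linux/decompress/mm.h. Names and bounds checks are simplified, so this is not the kernel's exact code. The allocator's state is a static variable that has to start out as zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: this state lives in .bss and must be zero at startup. */
static uintptr_t free_mem_ptr;     /* start of the heap */
static uintptr_t free_mem_end_ptr; /* end of the heap */
static uintptr_t malloc_ptr;       /* next free byte; 0 means "not yet set up" */

static void *bump_malloc(size_t size)
{
	void *p;

	/* Only a zero malloc_ptr gets initialised to the start of the heap. */
	if (!malloc_ptr)
		malloc_ptr = free_mem_ptr;

	malloc_ptr = (malloc_ptr + 7) & ~(uintptr_t)7; /* 8-byte alignment */
	p = (void *)malloc_ptr;
	malloc_ptr += size;

	/* Running past the end of the heap is reported as out of memory. */
	if (malloc_ptr > free_mem_end_ptr)
		return NULL;

	return p;
}
```

If malloc_ptr starts out holding stale data that points past free_mem_end_ptr instead of zero, the very first allocation fails. That would match the "Out of memory while allocating z_stream" symptom.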

I'm using QEMU v7.2.1 as my VMM, Linux 6.3 with your series applied for
both the host/hypervisor and the guest kernel, and the current OVMF tree
built using OvmfPkgX64.dsc.

I was originally using the current merge window Linux, but moved to the
release version just to . With the release version SEV and SEV-ES still fail to
boot, but SEV now actually #GPs, and some of the register contents look
like encrypted data:

ConvertPages: range 1000000 - 4FA1FFF covers multiple entries
!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000000
RIP - 00000000597E71C1, CS - 0000000000000038, RFLAGS - 0000000000210206
RAX - 1FBA02A45943B920, RCX - 0000000000AF7009, RDX - A9DAE761B64A1F1B
RBX - 1FBA02A45943B8C0, RSP - 000000007FD97320, RBP - 0000000002000000
RSI - 0000000001000000, RDI - 1FBA02A45943DE68
R8 - 0000000003EF3C94, R9 - 0000000000000000, R10 - 000000007D7C6018
R11 - 0000000000000000, R12 - 0000000001000000, R13 - 00000000597EDD98
R14 - 0000000001000000, R15 - 000000007E0A5198
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FA01000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007F7DC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007F34C018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 000000007FD96F80
!!!! Find image based on IP(0x597E71C1) /root/kernels/ovmf-build-X64/Build/OvmfX64/DEBUG_GCC5/X64/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableRuntimeDxe/DEBUG/VariableRuntimeDxe.dll (ImageBase=0000000000D4792C, EntryPoint=0000000000D50CC3) !!!!

So, yes, this is probably an area of memory that was zeroed while mapped
unencrypted but was not cleared again after the mapping was changed to
encrypted.
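The explanation above can be illustrated with a toy model. Everything here is hypothetical. The "cipher" is a byte XOR with an address-dependent keystream rather than the AES engine that SEV actually uses. The shared_* and private_* accessors stand in for mappings with the C-bit clear or set. The model only shows that zeroes written through an unencrypted mapping do not read back as zeroes through an encrypted one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_GUEST_KEY 0x5Bu /* hypothetical per-guest key material */

static uint8_t dram[TOY_PAGE_SIZE]; /* what is physically stored */

/* Toy keystream: depends on the address and is never zero (low bit forced). */
static uint8_t keystream(size_t pa)
{
	return (uint8_t)((pa * 31u + TOY_GUEST_KEY) | 1u);
}

/* Mapping with the C-bit clear: bytes are stored and read as-is. */
static void shared_clear(size_t pa, size_t len)
{
	memset(dram + pa, 0, len);
}

/* Mapping with the C-bit set: transparently "encrypted" and "decrypted". */
static void private_write(size_t pa, uint8_t v)
{
	dram[pa] = v ^ keystream(pa);
}

static uint8_t private_read(size_t pa)
{
	return dram[pa] ^ keystream(pa);
}

/* Clearing again through the encrypted mapping after the mapping change. */
static void private_clear(size_t pa, size_t len)
{
	for (size_t i = pa; i < pa + len; i++)
		private_write(i, 0);
}
```

Zeroing the page with shared_clear() and then reading it through private_read() yields non-zero bytes. Redoing the clear with private_clear() after the mapping change makes the page read back as zeroes.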

Thanks,
Tom