Re: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active

From: Michael Roth
Date: Fri Feb 16 2024 - 17:50:53 EST


On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote:
> On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <michael.roth@xxxxxxx> wrote:
> >
> > Quoting Kevin Loughlin (2024-02-12 22:07:46)
> > > SEV-SNP requires encrypted memory to be validated before access. The
> > > kernel is responsible for validating the ROM memory range because the
> > > range is not part of the e820 table and therefore not pre-validated by
> > > the BIOS.
> > >
> > > While the current SEV-SNP code attempts to validate the ROM range in
> > > probe_roms(), this does not suffice for all existing use cases. In
> > > particular, if EFI_CONFIG_TABLES are not enabled and
> > > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will
> > > attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which
> > > falls in the ROM range) prior to validation. The specific problematic
> > > call chain occurs during dmi_setup() -> dmi_scan_machine() and results
> > > in a crash during boot if SEV-SNP is enabled under these conditions.
> >
> > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial
> > encrypted guest image, and I'm not aware of any VMM implementations that
> > do this either.
>
> I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0].
>
> [0] https://github.com/project-oak/oak/tree/main/stage0_bin
>
> > If dmi_setup() similarly scans these ranges, it seems likely the same
> > issue would be present: the validated/private regions would only contain
> > ciphertext rather than the expected ROM data. Does that agree with the
> > behavior you are seeing?
> >
> > If so, maybe instead probe_roms should just be skipped in the case of SNP?
>
> If probe_roms() is skipped, SEV-SNP guest boot also currently crashes;
> I just quickly tried that (though admittedly haven't looked into why).

default_find_smp_config() will also call smp_scan_config() on
0xF0000-0x100000, so that might be the additional issue you're hitting.
If I skip that in addition to probe_roms(), then boot works for me.

The dmi_setup() case you hit would also need similar handling if taking
this approach.

> Apparently though, the fix for early ROM range accesses is not as
> simple as just skipping probe_roms() if SEV-SNP is enabled.
> Furthermore, skipping probe_roms() was also *not* the route taken in
> the initial attempt that prevents this issue for EFI use cases [1].
>
> [1] https://lore.kernel.org/lkml/20220307213356.2797205-21-brijesh.singh@xxxxxxx/

It seems the current handling has a bug that has been in place since the
original SEV guest code was added. If you dump the data that probe_roms()
sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the
region, you'll see that it is random data that changes on every boot.
The root issue is that this region does not contain encrypted data, and
is only being accessed that way because the early page table has the
encryption bit set for this range.

The effects are subtle: if the code ever sees a pair of bytes that look
like ROMSIGNATURE, it will reserve that memory so it can be accessed
later, generally just 0xc0000-0xc7fff. In extremely rare cases where the
ciphertext's data has a checksum that happens to match the contents, it
will use a random byte, multiply it by 512, and reserve up to 64k for
this bogus ROM region.
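
To make the failure mode above concrete, here is a rough userspace model of
a signature scan of this kind. It is only a sketch and not the kernel's
probe_roms(): the names scan_for_rom(), rom_checksum_ok() and struct rom_hit
are made up for illustration, and the 2048-byte scan step and the
"bytes sum to zero" checksum rule are assumptions about how legacy option
ROMs are conventionally probed. The point is just that any buffer, including
one full of effectively random bytes, can satisfy the signature test, and
that the length then comes from a single byte multiplied by 512.

```c
#include <stddef.h>
#include <stdint.h>

#define ROM_SIGNATURE 0xaa55u   /* stored little-endian: 0x55 then 0xaa */
#define ROM_SCAN_STEP 2048u     /* assumed scan granularity */

/* Outcome of scanning a buffer for something that looks like a ROM. */
struct rom_hit {
    int found;      /* 1 if a signature was seen */
    size_t offset;  /* offset of the signature within the buffer */
    size_t length;  /* trusted length in bytes, 0 if the checksum failed */
};

/* Treat an image as valid when all of its bytes sum to 0 modulo 256. */
static int rom_checksum_ok(const uint8_t *rom, size_t length)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < length; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum == 0;
}

/* Stop at the first signature; trust the length byte only if the checksum fits. */
static struct rom_hit scan_for_rom(const uint8_t *buf, size_t size)
{
    struct rom_hit hit = {0, 0, 0};

    for (size_t off = 0; off + 3 <= size; off += ROM_SCAN_STEP) {
        uint16_t sig = (uint16_t)(buf[off] | (buf[off + 1] << 8));

        if (sig != ROM_SIGNATURE)
            continue;

        hit.found = 1;
        hit.offset = off;

        /* The third byte is the image length in 512-byte units. */
        size_t length = (size_t)buf[off + 2] * 512;

        if (length && off + length <= size &&
            rom_checksum_ok(buf + off, length))
            hit.length = length;
        break;
    }
    return hit;
}
```

Feeding this a buffer of ciphertext-like noise will usually find nothing,
occasionally report a signature with no trusted length, and very rarely
report a "valid" image whose size is whatever the third byte happened to be.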

For SNP this resulted in a more obvious failure: a #VC exception because
the supposedly encrypted memory was in fact not encrypted, and thus not
PVALIDATED. Unfortunately the fix you linked to involved maintaining the
broken SEV behavior rather than fixing this mismatch.

>
> > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for
> > the kernel configs in question?
>
> Given (a) non-EFI firmware is supported in other SME/SEV boot code
> patches [2], (b) this patch does not seem to introduce significant
> complexity (it just moves [1] to earlier in the boot process to
> additionally handle the non-EFI case), and (c) skipping
> probe_roms()+dmi_setup() doesn't work without additional changes, I'm
> currently still inclined to simply validate the legacy ROM ranges
> early enough to prevent this issue (as is already done when using EFI
> firmware).

The 2 options I see are:

a) Skipping accesses to these regions for SEV. It is vaguely possible
some implementation out there actually did measure/load the ROM as
part of the initial guest image for SEV, but for SNP this would
have been impossible since it would have led to the guest crashing
when snp_prep_roms() was called, since RMPUPDATE on the host only
rescinds the validated bit if there is a change to the RMP entry.
If it was already assigned/private/validated then the guest code
would have detected that PVALIDATE resulted in no changes, and so it
would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to
be super sure you don't break legacy SEV implementations then you
could limit the change to SNP guests where it's essentially
guaranteed these regions are not being utilized in any functional
way.

b) Modifying the early page table setup by early_make_pgtable() to
clear the encrypted bit for the 0xC0000-0x100000 legacy region. The
challenge there is everything is PMD-mapped at that stage of boot
and there's no infrastructure for splitting page tables to handle
non-2MB-aligned/sized regions.
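
For a sense of why option (b) is awkward, the arithmetic can be sketched in
plain userspace C. This is not kernel code; the helper names below are
hypothetical and the 2MB/4K sizes are the usual x86-64 values. It only shows
that the legacy range sits inside the first 2MB mapping and cannot be
described by whole PMD entries, so clearing the encryption bit for just that
range would require splitting that mapping into 4K pages.

```c
#include <stdint.h>

#define PMD_SIZE     (2ull * 1024 * 1024)  /* one 2MB mapping */
#define PAGE_SIZE_4K 4096ull
#define LEGACY_START 0xC0000ull
#define LEGACY_END   0x100000ull           /* exclusive */

static int pmd_aligned(uint64_t addr)
{
    return (addr & (PMD_SIZE - 1)) == 0;
}

/* Can [start, end) be covered exactly by whole 2MB mappings? */
static int coverable_by_pmds(uint64_t start, uint64_t end)
{
    return start < end && pmd_aligned(start) && pmd_aligned(end);
}

/* Base address of the 2MB mapping that contains addr. */
static uint64_t pmd_base(uint64_t addr)
{
    return addr & ~(PMD_SIZE - 1);
}

/* Number of 4K pages in [start, end), i.e. PTEs needing a different C-bit. */
static uint64_t pages_4k(uint64_t start, uint64_t end)
{
    return (end - start) / PAGE_SIZE_4K;
}
```

With these values the range is 64 4K pages (256K) entirely within the 2MB
mapping that starts at address 0, which is why a per-PMD attribute change
would also affect the memory around it.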

But I don't think continuing to propagate the broken SEV behavior is
the right fix. At some point those random scans may trigger something
more problematic than wasted memory reservations. It may even be the
case already since I haven't audited the dmi_setup()/smp_scan_config()
paths yet, but nothing good/useful can come of it.

-Mike

>
> [2] https://lore.kernel.org/lkml/CAMj1kXFZKM5wU8djcVBxDmnCJwV4Xpest6u1EbE=7wyLUUeUUQ@xxxxxxxxxxxxxx/