Re: [PATCH v12 19/46] x86/kernel: Make the .bss..decrypted section shared in RMP table

From: Tom Lendacky
Date: Tue Jun 14 2022 - 15:02:20 EST


On 6/14/22 11:13, Sean Christopherson wrote:
On Tue, Jun 14, 2022, Tom Lendacky wrote:
On 6/14/22 10:43, Sean Christopherson wrote:
On Tue, Jun 14, 2022, Sean Christopherson wrote:
s/Brijesh/Michael

On Mon, Mar 07, 2022, Brijesh Singh wrote:
The encryption attribute for the .bss..decrypted section is cleared in the
initial page table build. This is because the section contains data that
needs to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. The page state needs to be updated in the RMP
table.

Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
---
arch/x86/kernel/head64.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 83514b9827e6..656d2f3e2cf0 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -143,7 +143,20 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
if (sme_get_me_mask()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+ /*
+ * On SNP, transition the page to shared in the RMP table so that
+ * it is consistent with the page table attribute change.
+ *
+ * __start_bss_decrypted has a virtual address in the high range
+ * mapping (kernel .text). PVALIDATE, by way of
+ * early_snp_set_memory_shared(), requires a valid virtual
+ * address but the kernel is currently running off of the identity
+ * mapping so use __pa() to get a *currently* valid virtual address.
+ */
+ early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);

This breaks SME on Rome and Milan when compiling with clang-13. I haven't been
able to figure out exactly what goes wrong. printk isn't functional at this point,
and interactive debug during boot on our test systems is beyond me. I can't even
verify that the bug is specific to clang because the draconian build system for our
test systems apparently is stuck pointing at gcc-4.9.

I suspect the issue is related to relocation and/or encrypting memory, as skipping
the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
use of absolute addresses.

Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the
moment because INIT_EX is also broken.

The SEV INIT_EX was a PEBKAC issue. An SEV guest boots just fine with a clang-built
kernel, so either it's a finicky relocation issue or something specific to SME.

I just built and booted 5.19-rc2 with clang-13 and SME enabled without issue:

[ 4.118226] Memory Encryption Features active: AMD SME

Phooey.

Maybe something with your kernel config? Can you send me your config?

Attached. If you can't repro, I'll find someone on our end to work on this.

I was able to repro. It dies in the cc_platform_has() code, where it is
trying to do an indirect jump based on the attribute (actually in
amd_cc_platform_has(), which I think has been inlined):

bool cc_platform_has(enum cc_attr attr)
{
ffffffff81002140: 55 push %rbp
ffffffff81002141: 48 89 e5 mov %rsp,%rbp
switch (vendor) {
ffffffff81002144: 8b 05 c6 e9 3a 01 mov 0x13ae9c6(%rip),%eax # ffffffff823b0b10 <vendor>
ffffffff8100214a: 83 f8 03 cmp $0x3,%eax
ffffffff8100214d: 74 25 je ffffffff81002174 <cc_platform_has+0x34>
ffffffff8100214f: 83 f8 02 cmp $0x2,%eax
ffffffff81002152: 74 2f je ffffffff81002183 <cc_platform_has+0x43>
ffffffff81002154: 83 f8 01 cmp $0x1,%eax
ffffffff81002157: 75 26 jne ffffffff8100217f <cc_platform_has+0x3f>
switch (attr) {
ffffffff81002159: 83 ff 05 cmp $0x5,%edi
ffffffff8100215c: 77 21 ja ffffffff8100217f <cc_platform_has+0x3f>
ffffffff8100215e: 89 f8 mov %edi,%eax
ffffffff81002160: ff 24 c5 c0 01 00 82 jmp *-0x7dfffe40(,%rax,8)

This last line is what causes the reset. I'm guessing the indirect jump
isn't valid because we are still running from the identity mapping, so the
absolute address taken from the jump table (a kernel virtual address) isn't
mapped yet.

Trying to see what the difference was between your config and mine, the
indirect jump led me to check the setting of CONFIG_RETPOLINE. Your config
did not have it enabled, so I set CONFIG_RETPOLINE=y, and with that, the
kernel boots successfully. With retpolines, the code is completely different
around here:

bool cc_platform_has(enum cc_attr attr)
{
ffffffff81001f30: 55 push %rbp
ffffffff81001f31: 48 89 e5 mov %rsp,%rbp
switch (vendor) {
ffffffff81001f34: 8b 05 26 8f 37 01 mov 0x1378f26(%rip),%eax # ffffffff8237ae60 <vendor>
ffffffff81001f3a: 83 f8 03 cmp $0x3,%eax
ffffffff81001f3d: 74 29 je ffffffff81001f68 <cc_platform_has+0x38>
ffffffff81001f3f: 83 f8 02 cmp $0x2,%eax
ffffffff81001f42: 74 33 je ffffffff81001f77 <cc_platform_has+0x47>
ffffffff81001f44: 83 f8 01 cmp $0x1,%eax
ffffffff81001f47: 75 2a jne ffffffff81001f73 <cc_platform_has+0x43>
ffffffff81001f49: 31 c0 xor %eax,%eax
switch (attr) {
ffffffff81001f4b: 83 ff 02 cmp $0x2,%edi
ffffffff81001f4e: 7f 2f jg ffffffff81001f7f <cc_platform_has+0x4f>
ffffffff81001f50: 85 ff test %edi,%edi
ffffffff81001f52: 74 47 je ffffffff81001f9b <cc_platform_has+0x6b>
ffffffff81001f54: 83 ff 01 cmp $0x1,%edi
ffffffff81001f57: 74 5b je ffffffff81001fb4 <cc_platform_has+0x84>
ffffffff81001f59: 83 ff 02 cmp $0x2,%edi
ffffffff81001f5c: 75 08 jne ffffffff81001f66 <cc_platform_has+0x36>
return sev_status & MSR_AMD64_SEV_ENABLED;
ffffffff81001f5e: 8a 05 44 3f 64 01 mov 0x1643f44(%rip),%al # ffffffff82645ea8 <sev_status>
ffffffff81001f64: 24 01 and $0x1,%al
case CC_VENDOR_HYPERV:
return hyperv_cc_platform_has(attr);
default:
return false;
}
}
ffffffff81001f66: 5d pop %rbp
ffffffff81001f67: c3 ret
switch (attr) {
ffffffff81001f68: 83 ff 07 cmp $0x7,%edi
ffffffff81001f6b: 73 06 jae ffffffff81001f73 <cc_platform_has+0x43>
ffffffff81001f6d: 40 f6 c7 01 test $0x1,%dil
ffffffff81001f71: eb 07 jmp ffffffff81001f7a <cc_platform_has+0x4a>
ffffffff81001f73: 31 c0 xor %eax,%eax
}
ffffffff81001f75: 5d pop %rbp
ffffffff81001f76: c3 ret
return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
ffffffff81001f77: 83 ff 02 cmp $0x2,%edi
ffffffff81001f7a: 0f 94 c0 sete %al
}
ffffffff81001f7d: 5d pop %rbp
ffffffff81001f7e: c3 ret
switch (attr) {
.
.
.

I'm not sure if there's a way to remove the jump table optimization for
the arch/x86/coco/core.c file when retpolines aren't configured.
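One possible approach, sketched here as an untested suggestion rather than a verified fix: clang (and gcc) accept -fno-jump-tables, which forces switch statements to be lowered as compare/branch chains instead of indirect jumps through an absolute-address table. Something along these lines in the relevant Makefile might sidestep the problem when CONFIG_RETPOLINE is not set:

```make
# Untested sketch: disable jump-table lowering for the early-boot
# cc_platform code, since it can run from the identity mapping where
# the table's absolute addresses are not yet valid. cc-option guards
# against compilers that lack the flag.
CFLAGS_core.o += $(call cc-option,-fno-jump-tables)
```

This mirrors what retpolines achieve as a side effect, since retpoline code generation cannot use indirect jumps and so implicitly suppresses jump tables.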

Thanks,
Tom


Thanks!