Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning

From: Andersen, John S
Date: Tue Dec 24 2019 - 16:17:19 EST


On Tue, 2019-12-24 at 22:35 +0200, Liran Alon wrote:
> > On 24 Dec 2019, at 21:44, Andersen, John S <
> > john.s.andersen@xxxxxxxxx> wrote:
> > In em_rsm could we just OR with the value of the PINNED MSRs right
> > before the final return?
>
> Not exactly.
>
> If I understand correctly, the proposed mechanism should also allow
> pinning specific system register (e.g. CR0/CR4) bits to either
> cleared or set, not necessarily set, as kvm_set_cr0() & kvm_set_cr4()
> were modified to only check whether a pinned bit changed value.
>
> Therefore, you should modify enter_smm() to save the current pinned
> bits' values and then silently restore them in em_rsm().
>

That's true. Sounds good.
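
To make sure I've got it, here's a rough sketch of what I think you
mean (untested, and the pinned_cr4_mask / smm_pinned_cr4 fields are
names I made up for illustration, not from the patchset):

	/* On SMM entry (enter_smm): remember the pinned bits' values. */
	vcpu->arch.smm_pinned_cr4 = kvm_read_cr4(vcpu) &
				    vcpu->arch.pinned_cr4_mask;

	/* On RSM (em_rsm): restore them. A plain OR would be wrong for
	 * bits pinned to 0, so clear the pinned bits first, then apply
	 * the saved values.
	 */
	cr4 &= ~vcpu->arch.pinned_cr4_mask;
	cr4 |= vcpu->arch.smm_pinned_cr4;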

> >
> > Making kexec work will require changes to these files and maybe
> > more:
> >
> > arch/x86/boot/compressed/head_64.S
> > arch/x86/kernel/head_64.S
> > arch/x86/kernel/relocate_kernel_64.S
> >
> > My previous attempts at changing these showed different results
> > when running virtualized vs. unvirtualized; specifically, different
> > behavior with the SMAP and UMIP bits.
>
> I didn't understand why there should be different results when
> running virtualized or not.
>

I think it's either a bug in KVM's enforcement of SMAP and UMIP, or a
bug in the hardware I'm on. If I do the same as below but with SMAP
and UMIP, the physical host doesn't boot, but the virtualized host
boots fine and can kexec without issue.

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 537f7d2bfb17a..dc50634fa674e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -126,7 +126,7 @@ ENTRY(startup_32)

/* Enable PAE mode */
movl %cr4, %eax
- orl $X86_CR4_PAE, %eax
+ orl $(X86_CR4_PAE | X86_CR4_SMEP), %eax
movl %eax, %cr4

/*
@@ -614,10 +614,10 @@ ENTRY(trampoline_32bit_src)
popl %ecx

/* Enable PAE and LA57 (if required) paging modes */
- movl $X86_CR4_PAE, %eax
+ movl $(X86_CR4_PAE | X86_CR4_SMEP), %eax
cmpl $0, %edx
jz 1f
- orl $X86_CR4_LA57, %eax
+ orl $(X86_CR4_LA57 | X86_CR4_SMEP), %eax
1:
movl %eax, %cr4

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index f3d3e9646a99b..d57ce48a40b5f 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -122,7 +122,7 @@ ENTRY(secondary_startup_64)
1:

/* Enable PAE mode, PGE and LA57 */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ movl $(X86_CR4_PAE | X86_CR4_PGE | X86_CR4_SMEP), %ecx
#ifdef CONFIG_X86_5LEVEL
testl $1, __pgtable_l5_enabled(%rip)
jz 1f
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index b0e558b473f9b..38acd530622cc 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -129,7 +129,7 @@ identity_mapped:
* - physical address extension enabled
* - 5-level paging, if it was enabled before
*/
- movl $X86_CR4_PAE, %eax
+ movl $(X86_CR4_PAE | X86_CR4_SMEP), %eax
testq $X86_CR4_LA57, %r13
jz 1f
orl $X86_CR4_LA57, %eax


I realize this isn't the right way to do it; it was just my first
attempt. We should probably change it to something along the lines of:
if the bit was already set, keep it set. I probably just need to go
further down that road.
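
For reference, the "keep it set" approach in the 32-bit startup path
would look something like this (untested sketch, not a patch):

	movl	%cr4, %eax		/* start from the live CR4 */
	orl	$X86_CR4_PAE, %eax	/* add only what this path needs */
	movl	%eax, %cr4

That way, bits the previous kernel already pinned (SMEP/SMAP/UMIP)
stay set without hardcoding them, instead of being clobbered by a
movl of an immediate.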


> What I suggested is to just add metadata in a vmlinux ELF section
> designated to describe vmlinux handover capabilities.
>
> Then, kexec can be modified to read this section before loading &
> jumping to the new kernel vmlinux.
>
> In the context of this patch-set, this section will specify a flag
> indicating whether the new vmlinux supports CR0/CR4 pinning
> hand-over. If it doesn't, and the current kernel has already pinned
> these values, kexec should just return failure.
>
> Just for example, in the context of IOMMU DMA protection hand-over,
> kexec will use another flag to know if the new vmlinux supports IOMMU
> DMA protection hand-over, and if not, make sure to disable the IOMMU
> before jumping to the new vmlinux.
>

Ah okay I'm sorry I misunderstood.
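
If I'm reading the suggestion right, the kernel side could be as small
as publishing a flags word in a dedicated section, something like this
sketch (the section name .handover_caps and the flag names are
invented for illustration):

	/* Hand-over capabilities advertised by this vmlinux. */
	#define HANDOVER_CR_PINNING	(1UL << 0)
	#define HANDOVER_IOMMU_DMA	(1UL << 1)

	static const unsigned long handover_caps
		__attribute__((used, section(".handover_caps"))) =
		HANDOVER_CR_PINNING;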

> > This would be a longer process though. As validating that
> > everything
> > still works in both the VM and on physical hosts will be required.
> > As
> > it stands this patchset could pick up a fairly large userbase via
> > the
> > virtualized container projects. Should we pursue kexec in this
> > patchset
> > or a later one?
>
> In my opinion, this should be handled as part of this patch-series.
> Otherwise, you basically introduce a kernel change that breaks
> existing functionality in an unpredictable manner for the user.
>
> i.e. It's OK if the kernel refused to enable system register pinning
> unless the user also configured at boot time that kexec is disabled.
> But it's not OK for kernel behaviour to change such that kexec
> suddenly crashes because the kernel was upgraded to now use system
> register pinning.
>
> My personal opinion though is that kexec should first be enhanced to
> read hand-over metadata as I described, and only then should this
> patch-series be applied, also modifying kexec to define and use one
> of the hand-over bits as mentioned above. But I would like to hear
> additional opinions on this.
>

I will check this out. Thanks.
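
As a sanity check on my understanding, the kexec load path would then
do something like this (cr_pinning_active() and
disable_iommu_before_handover() are hypothetical helpers, and the flag
names carry over from the sketch above):

	/* Refuse to load a vmlinux that can't take over active pinning. */
	if (cr_pinning_active() && !(new_caps & HANDOVER_CR_PINNING))
		return -EPERM;

	/* For IOMMU DMA protection the fallback is to disable it first. */
	if (!(new_caps & HANDOVER_IOMMU_DMA))
		disable_iommu_before_handover();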

> In addition, I would like to understand the Linux security
> maintainers' point of view on the comment I mentioned in another
> reply regarding how this mechanism is implemented in Hyper-V.
>

I agree. I did see Hyper-V's interface definitions, and I didn't
understand them to be that simple. It seemed like there was a
significant amount of pre-existing hypercall infrastructure that this
was a part of. My understanding was that we'd have to replicate that
and then implement only the subset we care about, which seemed like
overkill just to get this enabled. The goal here has been to get this
CR protection in via a simple interface. It would of course be nice to
have all those protections as well, though.

Thanks,
John