Re: [PATCH v1 3/6] KVM: VMX: Add new ops in kvm_x86_ops for LASS violation check

From: Zeng Guang
Date: Fri Jun 30 2023 - 14:50:59 EST

Next message: Linus Torvalds: "Re: [GIT PULL] RISC-V Patches for the 6.5 Merge Window, Part 1"
Previous message: Rob Herring: "[PATCH] dt-bindings: arm: cpus: Add Cortex-A520 and Cortex-A720"
In reply to: Sean Christopherson: "Re: [PATCH v1 3/6] KVM: VMX: Add new ops in kvm_x86_ops for LASS violation check"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 6/28/2023 2:26 AM, Sean Christopherson wrote:

Similar to comments on an earlier patch, don't try to describe the literal code
change, e.g. this does far more than just "Add new ops in kvm_x86_ops". Something
like

KVM: VMX: Add emulation of LASS violation checks on linear address

OK. I will modify the change log to focus on overview of patch
implementation and be more concise.

On Thu, Jun 01, 2023, Zeng Guang wrote:

Intel introduces LASS (Linear Address Separation) feature providing
an independent mechanism to achieve the mode-based protection.

LASS partitions 64-bit linear address space into two halves, user-mode
address (LA[bit 63]=0) and supervisor-mode address (LA[bit 63]=1). It
stops any code execution or conditional data access[1]
1. from user mode to supervisor-mode address space
2. from supervisor mode to user-mode address space
and generates LASS violation fault accordingly.

[1]A supervisor mode data access causes a LASS violation only if supervisor
mode access protection is enabled (CR4.SMAP = 1) and either RFLAGS.AC = 0
or the access implicitly accesses a system data structure.

Following are the rules of LASS violation check on the linear address(LA).
User access to supervisor-mode address space:
LA[bit 63] && (CPL == 3)
Supervisor access to user-mode address space:
Instruction fetch: !LA[bit 63] && (CPL < 3)
Data access: !LA[bit 63] && (CR4.SMAP==1) && ((RFLAGS.AC == 0 &&
CPL < 3) || Implicit supervisor access)

Add new ops in kvm_x86_ops to do LASS violation check.

Signed-off-by: Zeng Guang <guang.zeng@xxxxxxxxx>
Tested-by: Xuelian Guo <xuelian.guo@xxxxxxxxx>
---
arch/x86/include/asm/kvm-x86-ops.h | 3 +-
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/kvm_emulate.h | 1 +
arch/x86/kvm/vmx/vmx.c | 47 ++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..8980a3bfa687 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -132,7 +132,8 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
-KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
+KVM_X86_OP_OPTIONAL_RET0(check_lass)

Use "is_lass_violation" instead of "check_lass" for all function names. "check"
doesn't convey that the function is a predicate, i.e. that it returns true/false.

That does make sense. Will change it.

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 92d8e65fe88c..98666d1e7727 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1731,6 +1731,8 @@ struct kvm_x86_ops {
* Returns vCPU specific APICv inhibit reasons
*/
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+ bool (*check_lass)(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 5b9ec610b2cb..f1439ab7c14b 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -91,6 +91,7 @@ struct x86_instruction_info {
/* x86-specific emulation flags */
#define X86EMUL_F_FETCH BIT(0)
#define X86EMUL_F_WRITE BIT(1)
+#define X86EMUL_F_SKIPLASS BIT(2)

This belongs in the patch that integrates LASS into the emulator. And rather than
make the flag a command (SKIPLASS), I think it makes sense to make the flag describe
the access. It'll mean more flags, but those are free. That way the originators of
the accesses, e.g. em_invlpg(), don't need to document how they interact with LASS,
e.g. this code is self-documenting, and if someone wants to understand why KVM
has a dedicated X86EMUL_F_INVLPG flag, it's easy enough to find the consumer.
And in the unlikely scenario that other things care about INVLPG, branch targets,
etc., we won't end up with X86EMUL_F_SKIPLASS, X86EMUL_F_SKIPOTHER, etc.

rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1,
X86EMUL_F_INVLPG, ctxt->mode, &linear);

So this?

#define X86EMUL_F_IMPLICIT
#define X86EMUL_F_INVLPG
#define X86EMUL_F_BRANCH_TARGET

By this way, emulator just provides the type of operation and make it to become
common interface open to various platforms. That also conceals vendor specific
implementation details. We will follow this idea.
Thanks.

struct x86_emulate_ops {
void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a33205ded85c..876997e8448e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8130,6 +8130,51 @@ static void vmx_vm_destroy(struct kvm *kvm)
free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
}
+/*
+ * Determine whether an access to the linear address causes a LASS violation.
+ * LASS protection is only effective in long mode. As a prerequisite, caller
+ * should make sure vCPU running in long mode and invoke this api to do LASS
+ * violation check.
+ */
+bool vmx_check_lass(struct kvm_vcpu *vcpu, u64 access, u64 la, u32 flags)
+{
+ bool user_mode, user_as, rflags_ac;
+
+ if (!!(flags & X86EMUL_F_SKIPLASS) ||
+ !kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS))
+ return false;
+
+ WARN_ON_ONCE(!is_long_mode(vcpu));

This is silly. By making this a preqreq, 2 of the 3 callers are forced to explicitly
check for is_long_mode(), *and* it unnecessarily bleeds LASS details outside of VMX.

Right. Will give up this implementation.

+
+ user_as = !(la >> 63);
+
+ /*
+ * An access is a supervisor-mode access if CPL < 3 or if it implicitly
+ * accesses a system data structure. For implicit accesses to system
+ * data structure, the processor acts as if RFLAGS.AC is clear.
+ */
+ if (access & PFERR_IMPLICIT_ACCESS) {

Please don't use PFERR_IMPLICIT_ACCESS, just extend the new flags. This is
obviously not coming from a page fault. PFERR_IMPLICIT_ACCESS really shouldn't
exist, but at least there was reasonable justification for adding it (changing
all of the paths that lead to permission_fault() would have require a massive
overhaul).

I've realized it not proper to use "PFERR_xxx" which was defined for
mmu process actually. It easily makes things confuse.

***HOWEVER***

I think the entire approach of hooking __linearize() may be a mistake, and LASS
should instead be implemented in a wrapper of ->gva_to_gpa(). The two calls to
__linearize() that are escaped with SKIPLASS are escaped *because* they don't
actually access memory (branch targets and INVLPG), and so if LASS is enforced
only when ->gva_to_gpa() is invoked, all of these new flags go away because the
cases that ignore LASS are naturally handled.

That should also make it unnecessary to add one-off checks since e.g. kvm_handle_invpcid()
will hit kvm_read_guest_virt() and thus ->gva_to_gpa(), i.e. we won't end up playing
an ongoing game of whack-a-mole.

We've ever thought about this scheme, i.e. doing LASS violation check before
page table walk in ->gva_to_gpa(). But we find it difficult to identify and
process the fault of Lass violation. That may need to change the architecture
design of mmu. Thus we recommend current implementation.

And one question that needs to be answered is what happens when an access rolls
over from supervisor to user, e.g. if the kernel access 8 bytes at -1ull and thus
reads all Fs => 0x6, does the access get a LASS violation on the access to user
memory. User=>supervisor can't happen because non-canonical checks have higher
priority, but supervisor=>user can. And that matters because it will impact
whether or not KVM needs to check each *page* or just the initial address.

By experiment on Simics platform, I verified such kind of access will cause a
LASS violation. KVM has to consider lass violation check at both end of address
range of instruction fetch and data access in emulation. Currently it doesn't
take care of this corner case. Thanks to point out this potential problem.

+ user_mode = false;
+ rflags_ac = false;
+ } else {
+ user_mode = vmx_get_cpl(vcpu) == 3;
+ if (!user_mode)
+ rflags_ac = !!(kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+ }
+
+ if (user_mode == user_as)
+ return false;
+
+ /*
+ * Supervisor-mode _data_ accesses to user address space
+ * cause LASS violations only if SMAP is enabled.
+ */
+ if (!user_mode && !(access & PFERR_FETCH_MASK))
+ return kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP) && !rflags_ac;

This is all more complex than it needs to be. Using local bools just so that
the "user_mode == user_as" is common is not a good tradeoff. User addresses have
*significantly* different behavior, lean into that instead of dancing around it.

bool is_lass_violation(struct kvm_vcpu *vcpu, unsigned long addr,
unsigned int flags)
{
const bool is_supervisor_access = addr & BIT_ULL(63);
const bool implicit = flags & X86EMUL_F_IMPLICIT;

bool user_mode, user_as, rflags_ac;

if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_LASS) || !is_long_mode(vcpu))
return false;

/*
* INVLPG isn't subject to LASS, e.g. to allow invalidating userspace
* addresses without toggling RFLAGS.AC. Branch targets aren't subject
* to LASS in order to simplifiy far control transfers (the subsequent
* fetch will enforce LASS as appropriate).
*/
if (flags & (X86EMUL_F_BRANCH_TARGET | X86EMUL_F_INVLPG))
return false;

if (!implicit && vmx_get_cpl(vcpu) == 3)
return is_supervisor_address;

/* LASS is enforced for supervisor access iff SMAP is enabled. */
if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_SMAP))
return false;

/* Like SMAP, RFLAGS.AC disables LASS checks in supervisor mode. */
if (!implicit && (kvm_get_rflags(vcpu) & X86_EFLAGS_AC))
return false;

return !is_supervisor_address;
}

OK. I will refine the implementation further.
Thanks.

+ return true;
+}

Next message: Linus Torvalds: "Re: [GIT PULL] RISC-V Patches for the 6.5 Merge Window, Part 1"
Previous message: Rob Herring: "[PATCH] dt-bindings: arm: cpus: Add Cortex-A520 and Cortex-A720"
In reply to: Sean Christopherson: "Re: [PATCH v1 3/6] KVM: VMX: Add new ops in kvm_x86_ops for LASS violation check"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]