Re: [PATCH 00/23] KVM: MMU: MMU role refactoring

From: Sean Christopherson
Date: Wed Feb 09 2022 - 20:57:48 EST


On Mon, Feb 07, 2022, David Matlack wrote:
> On Mon, Feb 7, 2022 at 3:27 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > What do you think about calling this the guest_role instead of cpu_role?
> > > There is a bit of a precedent for using "guest" instead of "cpu" already
> > > for this type of concept (e.g. guest_walker), and I find it more
> > > intuitive.
> >
> > Haven't looked at the series yet, but I'd prefer not to use guest_role, it's
> > too similar to is_guest_mode() and kvm_mmu_role.guest_mode. E.g. we'd end up with
> >
> > static union kvm_mmu_role kvm_calc_guest_role(struct kvm_vcpu *vcpu,
> > const struct kvm_mmu_role_regs *regs)
> > {
> > union kvm_mmu_role role = {0};
> >
> > role.base.access = ACC_ALL;
> > role.base.smm = is_smm(vcpu);
> > role.base.guest_mode = is_guest_mode(vcpu);
> > role.base.direct = !____is_cr0_pg(regs);
> >
> > ...
> > }
> >
> > and possibly
> >
> > if (guest_role.guest_mode)
> > ...
> >
> > which would be quite messy. Maybe vcpu_role if cpu_role isn't intuitive?
>
> I agree it's a little odd. But actually it's somewhat intuitive (the
> guest is in guest-mode, i.e. we're running a nested guest).
>
> Ok I'm stretching a little bit :). But if the trade-off is just
> "guest_role.guest_mode" requires a clarifying comment, but the rest of
> the code gets more readable (cpu_role is used a lot more than
> role.guest_mode), it still might be worth it.

It's not just guest_mode, we also have guest_mmu, e.g. we'd end up with

vcpu->arch.root_mmu.guest_role.base.level
vcpu->arch.guest_mmu.guest_role.base.level
vcpu->arch.nested_mmu.guest_role.base.level

In a vacuum, I 100% agree that guest_role is better than cpu_role or vcpu_role,
but the term "guest" has already been claimed for "L2" in far too many places.

While we're behind the bikeshed... the resulting:

union kvm_mmu_role cpu_role;
union kvm_mmu_page_role mmu_role;

is a mess. Again, I really like "mmu_role" in a vacuum, but juxtaposed with

union kvm_mmu_role cpu_role;

it's super confusing, e.g. I expected

union kvm_mmu_role mmu_role;

Nested EPT is a good example of complete confusion, because we compute kvm_mmu_role,
compare it to cpu_role, then shove it into both cpu_role and mmu_ole. It makes
sense once you reason about what it's doing, but on the surface it's confusing.

struct kvm_mmu *context = &vcpu->arch.guest_mmu;
u8 level = vmx_eptp_page_walk_level(new_eptp);
union kvm_mmu_role new_role =
kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
execonly, level);

if (new_role.as_u64 != context->cpu_role.as_u64) {
/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
context->cpu_role.as_u64 = new_role.as_u64;
context->mmu_role.word = new_role.base.word;

Mabye this?

union kvm_mmu_vcpu_role vcpu_role;
union kvm_mmu_page_role mmu_role;

and some sample usage?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d25f8cb2e62b..9f9b97c88738 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4836,13 +4836,16 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
{
struct kvm_mmu *context = &vcpu->arch.guest_mmu;
u8 level = vmx_eptp_page_walk_level(new_eptp);
- union kvm_mmu_role new_role =
+ union kvm_mmu_vcpu_role new_role =
kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
execonly, level);

- if (new_role.as_u64 != context->cpu_role.as_u64) {
- /* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
- context->cpu_role.as_u64 = new_role.as_u64;
+ if (new_role.as_u64 != context->vcpu_role.as_u64) {
+ /*
+ * EPT, and thus nested EPT, does not consume CR0, CR4, nor
+ * EFER, so the mmu_role is a strict subset of the vcpu_role.
+ */
+ context->vcpu_role.as_u64 = new_role.as_u64;
context->mmu_role.word = new_role.base.word;

context->page_fault = ept_page_fault;



And while I'm on a soapbox.... am I the only one that absolutely detests the use
of "context" and "g_context"? I'd be all in favor of renaming those to "mmu"
throughout the code as a prep to this series.

I also think we should move the initializing of guest_mmu => mmu into the MMU
helpers. Pulling the mmu from guest_mmu but then relying on the caller to wire
up guest_mmu => mmu so that e.g. kvm_mmu_new_pgd() works is gross and confused
the heck out of me. E.g.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d25f8cb2e62b..4e7fe9758ce8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4794,7 +4794,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
unsigned long cr4, u64 efer, gpa_t nested_cr3)
{
- struct kvm_mmu *context = &vcpu->arch.guest_mmu;
+ struct kvm_mmu *mmu = &vcpu->arch.guest_mmu;
struct kvm_mmu_role_regs regs = {
.cr0 = cr0,
.cr4 = cr4 & ~X86_CR4_PKE,
@@ -4806,6 +4806,8 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
mmu_role = cpu_role.base;
mmu_role.level = kvm_mmu_get_tdp_level(vcpu);

+ vcpu->arch.mmu = &vcpu->arch.guest_mmu;
+
shadow_mmu_init_context(vcpu, context, cpu_role, mmu_role);
kvm_mmu_new_pgd(vcpu, nested_cr3);
}
@@ -4834,12 +4836,14 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
gpa_t new_eptp)
{
- struct kvm_mmu *context = &vcpu->arch.guest_mmu;
+ struct kvm_mmu *mmu = &vcpu->arch.guest_mmu;
u8 level = vmx_eptp_page_walk_level(new_eptp);
union kvm_mmu_role new_role =
kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
execonly, level);

+ vcpu->arch.mmu = mmu;
+
if (new_role.as_u64 != context->cpu_role.as_u64) {
/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
context->cpu_role.as_u64 = new_role.as_u64;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1218b5a342fc..d0f8eddb32be 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,8 +98,6 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)

WARN_ON(mmu_is_nested(vcpu));

- vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-
/*
* The NPT format depends on L1's CR4 and EFER, which is in vmcb01. Note,
* when called via KVM_SET_NESTED_STATE, that state may _not_ match current