Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code

From: Paolo Bonzini
Date: Thu Mar 10 2016 - 09:07:37 EST




On 10/03/2016 15:01, Xiao Guangrong wrote:
>
>
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
>> and not pass it to permission_fault if neither NX nor SMEP are enabled.
>> Prepare for the change.
>
> Why it is needed? It is much easier to drop PFEC.F in
> update_permission_bitmask().

It's just how I structured the patches. It's unrelated to
update_permission_bimask.

I wanted permission_fault to return a fault code, and while it's easy to
add bits (such as PKERR_PK_MASK) it's harder to remove bits from the
PFEC that permission_fault receives. So I'm doing it here.

Feel free to instead drop PFEC.F in permission_fault() or even add a new
member to kvm_mmu with the valid bits of a PFEC. This is just an RFC
for you and Huaitong to adapt in the PK patchset. Sometimes things are
easier to describe in patches than in English. :)

Paolo

>>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/mmu.c | 2 +-
>> arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
>> 2 files changed, 16 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2463de0b935c..e57f7be061e3 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct
>> kvm_vcpu *vcpu,
>> u = bit & ACC_USER_MASK;
>>
>> if (!ept) {
>> - /* Not really needed: !nx will cause pte.nx to fault */
>> + /* Not really needed: if !nx, ff will always be zero */
>> x |= !mmu->nx;
>> /* Allow supervisor writes if !cr0.wp */
>> w |= !is_write_protection(vcpu) && !uf;
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 6013f3685ef4..285858d3223b 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct
>> guest_walker *walker,
>> gpa_t pte_gpa;
>> int offset;
>> const int write_fault = access & PFERR_WRITE_MASK;
>> - const int user_fault = access & PFERR_USER_MASK;
>> - const int fetch_fault = access & PFERR_FETCH_MASK;
>> - u16 errcode = 0;
>> + u16 errcode;
>> gpa_t real_gpa;
>> gfn_t gfn;
>>
>> trace_kvm_mmu_pagetable_walk(addr, access);
>> +
>> + /*
>> + * Do not modify PFERR_FETCH_MASK in access. It is used later in
>> the call to
>> + * mmu->translate_gpa and, when nested virtualization is in use,
>> the X or NX
>> + * bit of nested page tables always applies---even if NX and SMEP
>> are disabled
>> + * in the guest.
>> + *
>> + * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
>> + */
>> + errcode = access;
>> + if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
>> + errcode &= ~PFERR_FETCH_MASK;
>> +
>
> PFEC.P might be set since it is copied from @access.
>
> And the workload is moved from rare path to the common path...
>
>> retry_walk:
>> walker->level = mmu->root_level;
>> pte = mmu->get_cr3(vcpu);
>> @@ -389,9 +400,7 @@ retry_walk:
>>
>> if (unlikely(!accessed_dirty)) {
>> ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker,
>> write_fault);
>> - if (unlikely(ret < 0))
>> - goto error;
>> - else if (ret)
>> + if (ret > 0)
>> goto retry_walk;
>
> So it returns normally even if update_accessed_dirty_bits() failed.