Re: [PATCH v2 20/20] powerpc/mm: Add speculative page fault

From: Laurent Dufour
Date: Tue Aug 29 2017 - 11:14:52 EST


On 21/08/2017 08:58, Anshuman Khandual wrote:
> On 08/18/2017 03:35 AM, Laurent Dufour wrote:
>> This patch enable the speculative page fault on the PowerPC
>> architecture.
>>
>> This will try a speculative page fault without holding the mmap_sem,
>> if it returns with WM_FAULT_RETRY, the mmap_sem is acquired and the
>
> s/WM_FAULT_RETRY/VM_FAULT_RETRY/

Good catch ;)

>> traditional page fault processing is done.
>>
>> Support is only provide for BOOK3S_64 currently because:
>> - require CONFIG_PPC_STD_MMU because checks done in
>> set_access_flags_filter()
>
> What checks are done in set_access_flags_filter() ? We are just
> adding the code block in do_page_fault().

set_access_flags_filter() is checking for vm_flags & VM_EXEC which may be
changed in our back, leading to a spurious WARN displayed.
This being said, I focused on the BOOK3S as this meaningful for large
system, and I didn't get time to check for embedded systems.

>
>> - require BOOK3S because we can't support for book3e_hugetlb_preload()
>> called by update_mmu_cache()
>>
>> Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx>
>> ---
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +++++
>> arch/powerpc/mm/fault.c | 30 +++++++++++++++++++++++++++-
>> 2 files changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 818a58fc3f4f..897f8b9f67e6 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -313,6 +313,11 @@ extern unsigned long pci_io_base;
>> /* Advertise support for _PAGE_SPECIAL */
>> #define __HAVE_ARCH_PTE_SPECIAL
>>
>> +/* Advertise that we call the Speculative Page Fault handler */
>> +#if defined(CONFIG_PPC_BOOK3S_64)
>> +#define __HAVE_ARCH_CALL_SPF
>> +#endif
>> +
>> #ifndef __ASSEMBLY__
>>
>> /*
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index 4c422632047b..7b3cc4c30eab 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -291,9 +291,36 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>> if (is_write && is_user)
>> store_update_sp = store_updates_sp(regs);
>>
>> - if (is_user)
>> + if (is_user) {
>> flags |= FAULT_FLAG_USER;
>>
>> +#if defined(__HAVE_ARCH_CALL_SPF)
>> + /* let's try a speculative page fault without grabbing the
>> + * mmap_sem.
>> + */
>> +
>> + /*
>> + * flags is set later based on the VMA's flags, for the common
>> + * speculative service, we need some flags to be set.
>> + */
>> + if (is_write)
>> + flags |= FAULT_FLAG_WRITE;
>> +
>> + fault = handle_speculative_fault(mm, address, flags);
>> + if (!(fault & VM_FAULT_RETRY || fault & VM_FAULT_ERROR)) {
>> + perf_sw_event(PERF_COUNT_SW_SPF_DONE, 1,
>> + regs, address);
>> + goto done;
>
> Why we should retry with classical page fault on VM_FAULT_ERROR ?
> We should always return VM_FAULT_RETRY in case there is a clear
> collision some where which requires retry with classical method
> and return VM_FAULT_ERROR in cases where we know that it cannot
> be retried and fail for good. Should not handle_speculative_fault()
> be changed to accommodate this ?

There is no need to change handle_speculative_fault(), it should return
VM_FAULT_RETRY when a retry is required. If VM_FAULT_ERROR is return, we
should be able to jump to the block dealing with VM_FAULT_ERROR and calling
vm_fault_error().


>
>> + }
>> +
>> + /*
>> + * Resetting flags since the following code assumes
>> + * FAULT_FLAG_WRITE is not set.
>> + */
>> + flags &= ~FAULT_FLAG_WRITE;
>> +#endif /* defined(__HAVE_ARCH_CALL_SPF) */
>
> Setting and resetting of FAULT_FLAG_WRITE seems confusing. Why you
> say that some flags need to be set for handle_speculative_fault()
> function. Could you elaborate on this ?

FAULT_FLAG_WRITE is required to handle write access. In the case we retry
with the classical path, the flag is reset and will be set later if
!is_exec and is_write.