Re: [PATCH 5/6] KVM: MMU: prefetch ptes when intercepted guest #PF

From: Xiao Guangrong
Date: Thu Jun 17 2010 - 05:08:33 EST




Avi Kivity wrote:
> On 06/17/2010 10:25 AM, Xiao Guangrong wrote:
>>
>>
>>> Can this in fact work for level != PT_PAGE_TABLE_LEVEL? We might start
>>> at PT_PAGE_DIRECTORY_LEVEL but get 4k pages while iterating.
>>>
>> Ah, I forgot about that. We can't assume the host also supports huge
>> pages for the next gfn; as Marcelo suggested, we should "only map with
>> level > 1 if the host page matches the size".
>>
>> Um, the problem is that getting the host page size requires holding
>> 'mm->mmap_sem', which can't be taken in atomic context and is also a
>> slow path, while we want the pte prefetch path to be fast.
>>
>> How about only allowing prefetch for sp.level == 1 for now? I'll
>> improve it in the future; I think it needs more time :-)
>>
>
> I don't think prefetch for level > 1 is worthwhile. One fault per 2MB
> is already very good, no need to optimize it further.
>

OK
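
Something along these lines then, just bailing out for non-last-level
shadow pages at the top of the prefetch path (untested sketch, the
function and variable names are only illustrative):

static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
{
	struct kvm_mmu_page *sp = page_header(__pa(sptep));

	/* only prefetch around last-level (4k) sptes */
	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
		return;

	/* ... prefetch the sptes around sptep as in this patch ... */
}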

>>>> +
>>>> +		pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
>>>> +		if (is_error_pfn(pfn)) {
>>>> +			kvm_release_pfn_clean(pfn);
>>>> +			break;
>>>> +		}
>>>> +		if (pte_prefetch_topup_memory_cache(vcpu))
>>>> +			break;
>>>> +
>>>> +		mmu_set_spte(vcpu, spte, ACC_ALL, ACC_ALL, 0, 0, 1, NULL,
>>>> +			     sp->role.level, gfn, pfn, true, false);
>>>> +	}
>>>> +}
>>>>
>>>>
>>> Nice. Direct prefetch should usually succeed.
>>>
>>> Can later augment to call get_user_pages_fast(..., PTE_PREFETCH_NUM,
>>> ...) to reduce gup overhead.
>>>
>> But we can't assume the gfns' hvas are consecutive; for example, gfn
>> and gfn+1 may be in different slots.
>>
>
> Right. We could limit it to one slot then for simplicity.

OK, i'll do it.
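
For the one-slot limit, something like this should be enough (untested
sketch; the helper name is made up, PTE_PREFETCH_NUM is the constant
from this patch):

/* how many gfns starting at @gfn can be prefetched without crossing a slot */
static int pte_prefetch_num(struct kvm_vcpu *vcpu, gfn_t gfn)
{
	struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
	gfn_t end;

	if (!slot)
		return 0;

	end = slot->base_gfn + slot->npages;
	return min_t(gfn_t, PTE_PREFETCH_NUM, end - gfn);
}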

>
>>
>>>> +
>>>> +		if (!table) {
>>>> +			page = gfn_to_page_atomic(vcpu->kvm, sp->gfn);
>>>> +			if (is_error_page(page)) {
>>>> +				kvm_release_page_clean(page);
>>>> +				break;
>>>> +			}
>>>> +			table = kmap_atomic(page, KM_USER0);
>>>> +			table = (pt_element_t *)((char *)table + offset);
>>>> +		}
>>>>
>>>>
>>> Why not kvm_read_guest_atomic()? Can do it outside the loop.
>>>
>> Do you mean reading all the prefetched sptes at one time?
>>
>
> Yes.
>
>> If prefetching one spte fails, the later sptes we read are wasted, so
>> I chose to read the next spte only after the current spte has been
>> prefetched successfully.
>>
>> But I don't have a strong opinion on it, since reading all the sptes
>> at one time is fast; in the worst case we only need to read
>> 16 * 8 = 128 bytes.
>>
>
> In general batching is worthwhile; the cost of the extra bytes is low
> compared to the cost of bringing in the cache line and the error checking.
>

Agreed.
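
So the prefetch path can pull in the whole batch with a single call,
roughly (untested; 'gpte_gpa' here just stands for the guest-physical
address of the first gpte of the batch, however it gets computed):

pt_element_t gptes[PTE_PREFETCH_NUM];

/* one atomic guest read for the whole batch instead of a kmap per gpte */
if (kvm_read_guest_atomic(vcpu->kvm, gpte_gpa, gptes, sizeof(gptes)))
	return;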

> btw, you could align the prefetch to a 16-pte boundary. That would
> improve performance for memory that is scanned backwards.
>

Yeah, good idea.
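
i.e. start from the aligned entry that covers the faulting spte,
something like (untested, assumes PTE_PREFETCH_NUM stays a power of
two; 'sptep' is the faulting spte and sp->spt the shadow page table):

int i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
u64 *start = sp->spt + i;

/* then prefetch the PTE_PREFETCH_NUM entries starting at 'start' */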

> So we can change the fault path to always fault 16 ptes, aligned on a
> 16-pte boundary, with the needed pte set with speculative=false.

Avi, I don't understand this clearly. Could you please explain it? :-(