Re: [PART1 RFC v2 05/10] KVM: x86: Detect and Initialize AVIC support

From: Paolo Bonzini
Date: Tue Mar 15 2016 - 13:22:56 EST




On 15/03/2016 18:09, Suravee Suthikulpanit wrote:
> Hi
>
> On 03/07/2016 11:41 PM, Paolo Bonzini wrote:
>> On 04/03/2016 21:46, Suravee Suthikulpanit wrote:
>> > [....]
>>> +/* Note: This structure is per VM */
>>> +struct svm_vm_data {
>>> + atomic_t count;
>>> + u32 ldr_mode;
>>> + u32 avic_max_vcpu_id;
>>> + u32 avic_tag;
>>> +
>>> + struct page *avic_log_ait_page;
>>> + struct page *avic_phy_ait_page;
>>
>> You can put these directly in kvm_arch. Do not use abbreviations:
>>
>> struct page *avic_logical_apic_id_table_page;
>> struct page *avic_physical_apic_id_table_page;
>>
>
> Actually, the reason I would like to introduce this per-arch specific
> structure is because I feel that it is easier to manage these
> processor-specific variable/data-structure. If we add all these directly
> into kvm_arch, which is shared b/w SVM and VMX, it is more difficult to
> tell which one is used in the different code base.

You're right, but adding a pointer makes things slower and larger.
Using an anonymous union would work. For now, I prefer to have the
fields directly in kvm_arch.

>>> [...]
>>> +static struct svm_avic_phy_ait_entry *
>>> +avic_get_phy_ait_entry(struct kvm_vcpu *vcpu, int index)
>>> +{
>>> + [.....]
>>> +}
>>> +
>>> +struct svm_avic_log_ait_entry *
>>> +avic_get_log_ait_entry(struct kvm_vcpu *vcpu, u8 mda, bool is_flat)
>>> +{
>>> + [.....]
>>> +}
>>
>> Instead of these functions, create a complete function to handle APIC_ID
>> and APIC_LDR writes. Then use kmap/kunmap instead of page_address.
>>
>
> Ok. May I ask why we are against using page_address? I have see that
> used in several places in the code.

You're right, I guess page_address is okay for pages that were allocated
with alloc_page().

>> Why is this necessary? The APIC access page is a peculiarity of Intel
>> processors (and the special memslot for only needs to map 0xfee00000 to
>> 0xfee00fff; after that there is the MSI area).
>
> The current lapic regs page is allocated using get_zeroed_page(), which
> can be paged out. If I use these pages for AVIC backing pages, it seems
> to cause VM to slow down quite a bit due to a lot of page faults.

What causes the lapic regs page to be paged out?

> Currently, the AVIC backing pages are acquired from __x86_set_memory
> region() with APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, which maps the pages for
> address 0xfee00000 and above for VM to use. I mostly grab this from the
> VMX implementation in alloc_apic_access_page().
>
> However, the memslot requires specification of the size at the time when
> calling __x86_set_memory_region(). However, I can't seem to figure out
> where I can get the number of vcpus at the time when we creating VM.
> Therefore, I have to track the vcpu creation, and re-acquire larger
> memslot every time vcpu_create() is called.

The purpose of the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is very specific:
it is there to provide a mapping for 0xfee00000 because Intel processors
trap writes between 0xfee00000 and 0xfee00fff, but otherwise ignore the
contents of the page you map there. Intel processors only need
something to compare the physical address with, they don't care about
the data in the page, so they have one page per VM (they could really
use a single page in the whole system---one per VM is just how KVM works
right now). It is a peculiar design, and one that you should probably
ignore in your AVIC patches.

The AVIC backing page is more similar to Intel's "virtual-APIC page".
You can see that vmx.c just uses lapic->regs for it.

if (cpu_has_vmx_tpr_shadow() && !init_event) {
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0);
if (cpu_need_tpr_shadow(vcpu))
vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
__pa(vcpu->arch.apic->regs));
vmcs_write32(TPR_THRESHOLD, 0);
}

Paolo