Re: [RFC 0/3] KVM: x86: introduce pv feature lazy tscdeadline

From: Wang Jianchao
Date: Thu Jul 13 2023 - 21:30:17 EST




On 2023.07.13 21:32, Xiaoyao Li wrote:
> On 7/13/2023 10:50 AM, Wang Jianchao wrote:
>>
>>
>> On 2023.07.13 02:14, Zhi Wang wrote:
>>> On Fri,  7 Jul 2023 14:17:58 +0800
>>> Wang Jianchao <jianchwa@xxxxxxxxxxx> wrote:
>>>
>>>> Hi
>>>>
>>>> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
>>>> Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
>>>> and the host side handles it. However, many of these vm-exits are
>>>> unnecessary because the timer is often overwritten before it expires.
>>>>
>>>> v : write to msr of tsc deadline
>>>> | : timer armed by tsc deadline
>>>>
>>>>           v v v v v        | | | | |
>>>> --------------------------------------->  Time
>>>>
>>>> The timer armed by each msr write is overwritten before it expires, so
>>>> the vm-exits caused by those writes are wasted. The lazy tscdeadline
>>>> works as follows:
>>>>
>>>>           v v v v v        |       |
>>>> --------------------------------------->  Time
>>>>                            '- arm -'
>>>>
>>>
>>> Interesting patch.
>>>
>>> I am a little bit confused by the chart above. It seems the MSR writes,
>>> which are said to cause vm-exits, are not reduced in the lazy
>>> tscdeadline chart; only the number of arms goes down. Yet the benefit of
>>> lazy tscdeadline is said to come from "fewer vm-exits". Maybe it would be
>>> better to improve the chart a little to help people grasp the idea
>>> easily?
>>
>> Thanks so much for your comment, and sorry for my poor chart.
>>
>> Let me try to rework the chart.
>>
>> Before this patch, every time the guest starts or modifies an hrtimer, it needs to write the tsc deadline msr;
>> a vm-exit occurs and the host arms an hv or sw timer for it.
>>
>>
>> w: write msr
>> x: vm-exit
>> t: hv or sw timer
>>
>>
>> Guest
>>           w
>> --------------------------------------->  Time
>> Host     x              t
>>  
>> However, in workloads that set up timers frequently, the tscdeadline msr is usually overwritten
>> many times before the timer expires, and every time we modify the tscdeadline, a vm-exit occurs.
>>
>>
>> 1. write to msr with t0
>>
>> Guest
>>           w0
>> ---------------------------------------->  Time
>> Host     x0             t0
>>
>> 2. write to msr with t1
>> Guest
>>               w1
>> ------------------------------------------>  Time
>> Host         x1          t0->t1
>>
>>
>> 3. write to msr with t2
>> Guest
>>                  w2
>> ------------------------------------------>  Time
>> Host            x2          t1->t2
>>  
>> 4. write to msr with t3
>> Guest
>>                      w3
>> ------------------------------------------>  Time
>> Host                x3           t2->t3
>>
>>
>>
>> What this patch wants to do is eliminate the vm-exits x1, x2 and x3, as follows:
>>
>>
>> First, as with other pv features, we have two fields shared between guest and host:
>>   - armed: the tscdeadline value for which the host has armed a timer; only updated by the __host__ side
>>   - pending: the next tscdeadline value; only updated by the __guest__ side
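For illustration only, the shared area could be modelled by the C struct below. The struct name `lazy_tscdeadline_area` and its layout are hypothetical; the real definitions are whatever patch 1/3 adds to kvm_para.h. This is a userspace model, so plain uint64_t stands in for the kernel's u64 types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical per-vCPU area shared by guest and host (names and layout
 * are illustrative only).
 */
struct lazy_tscdeadline_area {
	uint64_t armed;   /* deadline the host has a timer for; host writes only */
	uint64_t pending; /* latest deadline the guest wants; guest writes only */
};

/* Two naturally aligned 64-bit fields: no padding, same size on both sides. */
_Static_assert(sizeof(struct lazy_tscdeadline_area) == 16,
	       "shared area must be exactly two 64-bit fields");

/* Both fields start at zero: nothing armed, nothing pending. */
static void lazy_tscdeadline_area_init(struct lazy_tscdeadline_area *area)
{
	assert(area != NULL);
	memset(area, 0, sizeof(*area));
}
```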
>>
>>
>> 1. write to msr with t0
>>
>>               armed   : t0
>>               pending : t0
>> Guest
>>           w0
>> ---------------------------------------->  Time
>> Host     x0             t0
>>
>> A vm-exit occurs and the host arms a timer for t0.
>
> What are the initial values of @armed and @pending?

Both of them are zero.

@armed is only updated by host
@pending is updated by guest

The guest side checks @armed; if it is zero, the guest jumps to wrmsrl.

>
>>   2. write to msr with t1
>>
>>               armed   : t0
>>               pending : t1
>>
>> Guest
>>               w1
>> ------------------------------------------>  Time
>> Host                     t0
>>
>> The armed tsc deadline value, t0, is smaller than t1, so there is no need to write
>> the msr; the guest just updates pending.
>
> If t1 < t0, then it triggers the vm-exit, right?

Yes. If the new tsc deadline value (t1 here) is smaller than @armed, the guest jumps to wrmsrl.

> And in this case, I think @armed will be updated to t1. What about pending? Will it get updated to t1 or not?

Yes. The guest jumps to wrmsrl, which causes a vm-exit; the host side then updates @armed and re-arms the timer.


Thanks
Jianchao

>
>>
>> 3. write to msr with t2
>>
>>               armed   : t0
>>               pending : t2
>> Guest
>>                  w2
>> ------------------------------------------>  Time
>> Host                      t0
>> Similar to step 2: just update the pending field with t2, no vm-exit.
>>
>>
>> 4. write to msr with t3
>>
>>               armed   : t0
>>               pending : t3
>>
>> Guest
>>                      w3
>> ------------------------------------------>  Time
>> Host                       t0
>> Similar to step 2: just update the pending field with t3, no vm-exit.
>>
>>
>> 5. t0 expires, arm t3
>>
>>               armed   : t3
>>               pending : t3
>>
>>
>> Guest
>>                              ------------------------------------------>  Time
>> Host                       t0  ------> t3
>>
>> When t0 fires, the host checks the pending field and re-arms a timer based on it.
>>
>>
>> Here is the core idea of this patch ;)
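To make the exit savings concrete, here is a toy model that only counts traps for a sequence of guest writes. The function names are hypothetical, and the model assumes every write lands before the armed timer fires, so the expiry path never runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without the feature, every MSR_IA32_TSC_DEADLINE write is a vm-exit. */
static size_t exits_baseline(const uint64_t *writes, size_t n)
{
	assert(writes != NULL || n == 0);
	return n;
}

/*
 * With lazy tscdeadline, a write traps only if nothing is armed yet or
 * the new deadline is earlier than the armed one.
 */
static size_t exits_lazy(const uint64_t *writes, size_t n)
{
	uint64_t armed = 0;
	size_t exits = 0;

	assert(writes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (armed == 0 || writes[i] < armed) {
			exits++;           /* wrmsrl traps ... */
			armed = writes[i]; /* ... and the host re-arms */
		}
	}
	return exits;
}
```

For an increasing sequence like t0 < t1 < t2 < t3, the lazy model takes one exit against four; a sequence of ever-earlier deadlines still exits on every write.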
>>
>>
>> Thanks
>> Jianchao
>>
>>>
>>>> The 1st timer is responsible for arming the next timer. When the armed
>>>> timer expires, it checks pending and arms a new timer.
>>>>
>>>> In a netperf TCP_RR test over loopback, lazy tscdeadline noticeably
>>>> reduces vm-exits:
>>>>
>>>>                           Disabled            Enabled
>>>> --------------------------------------------------------
>>>> VM-Exit
>>>>               sum         12617503            5815737
>>>>              intr      0% 37023            0% 33002
>>>>             cpuid      0% 1                0% 0
>>>>              halt     19% 2503932         47% 2780683
>>>>         msr-write     79% 10046340        51% 2966824
>>>>             pause      0% 90               0% 84
>>>>     ept-violation      0% 584              0% 336
>>>>     ept-misconfig      0% 0                0% 2
>>>> preemption-timer      0% 29518            0% 34800
>>>> -------------------------------------------------------
>>>> MSR-Write
>>>>              sum          10046455            2966864
>>>>          apic-icr     25% 2533498         93% 2781235
>>>>      tsc-deadline     74% 7512945          6% 185629
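The relative reductions in the table can be checked with a small helper (the name `reduction_pct` is hypothetical):

```c
#include <assert.h>

/* Percentage by which a count dropped from before to after. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

Plugging in the table's numbers gives roughly 53.9% fewer total vm-exits (12617503 to 5815737), 70.5% fewer msr-write exits (10046340 to 2966824) and 97.5% fewer tsc-deadline writes (7512945 to 185629). Halt exits and apic-icr writes go up somewhat (2503932 to 2780683 and 2533498 to 2781235), so the net saving is smaller than the tsc-deadline drop alone.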
>>>>
>>>> This patchset is based on and tested with 6.4.0, and includes 3 patches:
>>>>
>>>> The 1st one adds the necessary data structures for this feature.
>>>> The 2nd one adds the specific msr operations between guest and host.
>>>> The 3rd one makes this feature work.
>>>>
>>>> Any comments are welcome.
>>>>
>>>> Thanks
>>>> Jianchao
>>>>
>>>> Wang Jianchao (3):
>>>>     KVM: x86: add msr register and data structure for lazy tscdeadline
>>>>     KVM: x86: exchange info about lazy_tscdeadline with msr
>>>>     KVM: X86: add lazy tscdeadline support to reduce vm-exit of msr-write
>>>>
>>>>
>>>>   arch/x86/include/asm/kvm_host.h      |  10 ++++++++
>>>>   arch/x86/include/uapi/asm/kvm_para.h |   9 +++++++
>>>>   arch/x86/kernel/apic/apic.c          |  47 ++++++++++++++++++++++++++++++++++-
>>>>   arch/x86/kernel/kvm.c                |  13 ++++++++++
>>>>   arch/x86/kvm/cpuid.c                 |   1 +
>>>>   arch/x86/kvm/lapic.c                 | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>>   arch/x86/kvm/lapic.h                 |   4 +++
>>>>   arch/x86/kvm/x86.c                   |  26 ++++++++++++++++++++
>>>>   8 files changed, 229 insertions(+), 9 deletions(-)
>>>
>