Re: [RFC] KVM: x86: Allow userspace exit on HLT and MWAIT, else yield on MWAIT

From: David Woodhouse
Date: Tue Sep 26 2023 - 13:29:08 EST

On 26 September 2023 19:20:24 CEST, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>On Sat, Sep 23, 2023 at 6:44 PM Alexander Graf <graf@xxxxxxxxx> wrote:
>> On 23.09.23 11:24, Paolo Bonzini wrote:
>> > Why do you need it? You can just use KVM_RUN to go to sleep, and if you
>> > get another job you kick out the vCPU with pthread_kill. (I also didn't
>> > get the VSM reference).
>>
>> With the original VSM patches, we used to make a vCPU aware of the fact
>> that it can morph into one of many VTLs. That approach turned out to be
>> insanely intrusive and fragile and so we're currently reimplementing
>> everything as VTLs as vCPUs. That allows us to move the majority of VSM
>> functionality to user space. Everything we've seen so far looks as if
>> there is no real performance loss with that approach.
>
>Yes, that is also what I remember: sharing the FPU somehow while
>having separate vCPU file descriptors.
>
>> One small problem with that is that now user space is responsible for
>> switching between VTLs: It determines which VTL is currently running and
>> leaves all others (read: all other vCPUs) stopped. That means if you
>> are running happily in KVM_RUN in VTL0 and VTL1 gets an interrupt, user
>> space needs to stop VTL0 and unpause VTL1 until it triggers VTL_RETURN,
>> at which point VTL1 stops execution and VTL0 runs again.
>
>That's with IPIs in VTL1, right? I understand now. My idea was, since
>we need a link from VTL1 to VTL0 for the FPU, to use the same link to
>trigger a vmexit to userspace if source VTL > destination VTL. I am
>not sure how you would handle the case where the destination vCPU is
>not running; probably by detecting the IPI when VTL0 restarts on the
>destination vCPU?
>
>In any case, making vCPUs poll()-able is sensible.
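
The kick Paolo describes relies on ordinary signal semantics: when a signal is delivered to a thread blocked in KVM_RUN, the ioctl returns -1 with errno set to EINTR (and exit_reason set to KVM_EXIT_INTR). The sketch below is a minimal illustration under stated assumptions: it uses a plain read() on a pipe as a stand-in for ioctl(vcpu_fd, KVM_RUN, 0) so it runs without /dev/kvm, and KICK_SIGNAL, struct vcpu_thread, vcpu_loop() and kick_blocked_thread() are hypothetical names. Real VMMs close the race where the signal lands just before the thread enters KVM_RUN (for example with KVM_SET_SIGNAL_MASK or the immediate_exit field of struct kvm_run); this sketch simply re-sends the signal until the thread reports that it has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical kick signal; a real VMM reserves its own (QEMU calls it SIG_IPI). */
#define KICK_SIGNAL SIGUSR1

static void kick_handler(int sig)
{
    (void)sig; /* Deliberately empty: the signal only has to interrupt the syscall. */
}

struct vcpu_thread {
    int fd;           /* read end of a pipe nobody writes to */
    int result_errno; /* errno seen when the blocking call returned */
    atomic_int done;  /* set once the thread has left the blocking call */
};

/* Stand-in (assumption) for a vCPU thread blocked in ioctl(vcpu_fd, KVM_RUN, 0). */
static void *vcpu_loop(void *arg)
{
    struct vcpu_thread *t = arg;
    char c;
    ssize_t n = read(t->fd, &c, 1); /* blocks until a signal interrupts it */
    t->result_errno = (n < 0) ? errno : 0;
    atomic_store(&t->done, 1);
    return NULL;
}

/* Kick the "vCPU" thread out of its blocking call.
 * Returns the errno it observed (EINTR when the kick worked), or -1 on setup failure. */
static int kick_blocked_thread(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: read() must fail with EINTR, not resume */
    if (sigaction(KICK_SIGNAL, &sa, NULL) != 0)
        return -1;

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct vcpu_thread t = { .fd = fds[0], .result_errno = -1 }; /* done starts at 0 */
    pthread_t tid;
    if (pthread_create(&tid, NULL, vcpu_loop, &t) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The thread may not have reached read() yet, and a signal that arrives
     * before the call is simply absorbed by the handler; keep kicking until
     * the thread reports that it has returned. */
    const struct timespec pause_1ms = { .tv_sec = 0, .tv_nsec = 1000000 };
    while (!atomic_load(&t.done)) {
        pthread_kill(tid, KICK_SIGNAL);
        nanosleep(&pause_1ms, NULL);
    }

    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return t.result_errno;
}
```

Calling kick_blocked_thread() (built with -pthread) is expected to return EINTR: the handler is installed without SA_RESTART, so the interrupted read() is not restarted and the thread returns to its caller, which is the point where a VMM would decide which VTL's vCPU to run next.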

Thinking about this a bit more, even for HLT it probably isn't as simple as checking for mp_state changes. If there's a REQ_EVENT outstanding for something like a timer delivery, that won't get handled, and the IRQ won't actually be delivered to the local APIC, until the vCPU is actually *run*, will it?
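
The concern can be illustrated with a toy model. Everything below is hypothetical: struct toy_vcpu, naive_poll_ready() and toy_process_requests() are invented names, and the logic only mimics the ordering problem raised above (a pending request whose handling, and the resulting interrupt delivery to the local APIC, happens on the run path). It is not KVM's actual code. A readiness check that only inspects mp_state reports a halted vCPU as not ready even though a timer interrupt is already queued for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names echo KVM concepts, but this is not kernel code. */
enum toy_mp_state { TOY_MP_RUNNABLE, TOY_MP_HALTED };

struct toy_vcpu {
    enum toy_mp_state mp_state;
    bool req_event;        /* stands in for an outstanding KVM_REQ_EVENT */
    bool timer_fired;      /* timer expired, interrupt not yet injected */
    bool apic_irq_pending; /* interrupt actually delivered to the local APIC */
};

/* Hypothetical poll() readiness check that only looks at mp_state. */
static bool naive_poll_ready(const struct toy_vcpu *v)
{
    return v->mp_state == TOY_MP_RUNNABLE;
}

/* Hypothetical stand-in for work that only happens when the vCPU is run:
 * handle the pending request, deliver the timer interrupt to the local APIC,
 * and only then wake the vCPU from HLT. */
static void toy_process_requests(struct toy_vcpu *v)
{
    if (v->req_event) {
        v->req_event = false;
        if (v->timer_fired) {
            v->timer_fired = false;
            v->apic_irq_pending = true;
        }
    }
    if (v->mp_state == TOY_MP_HALTED && v->apic_irq_pending)
        v->mp_state = TOY_MP_RUNNABLE;
}
```

In this model, a halted vCPU with a pending request and a fired timer stays "not ready" to naive_poll_ready() until toy_process_requests() runs, which is exactly the step a poll()-based readiness check would never trigger on its own.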