Re: [PATCH] x86: kvm: reset the bootstrap processor when it getsan INIT

From: Paolo Bonzini
Date: Mon Mar 11 2013 - 13:40:02 EST


Il 11/03/2013 18:20, Gleb Natapov ha scritto:
> On Mon, Mar 11, 2013 at 03:28:03PM +0100, Paolo Bonzini wrote:
>> Il 11/03/2013 14:54, Gleb Natapov ha scritto:
>>>> Setting the mp_state to INIT_RECEIVED is that interface, and it already
>>>> works, for APs at least. This patch extends it to work for the BSP as well.
>>>
>>> It does not for AP either. If AP has vmx on mp_sate should not be set to
>>> INIT_RECEIVED. mp_sate is a state as you can see from its name and we
>>> already had a discussion on the generic device API about importance of
>>> separating sending commands from setting state. There is a difference
>>> between setting mp_state during migration and signaling INIT#.
>>
>> What does migration have to do with this?
>
> get|set_mpstate is used by migration. Actually this is primary reason
> for this interface existence.

Does it have to be the only one?

>>>> In the corresponding userspace patch, I don't need to touch the CPU
>>>> state at all. I can just signal the kernel. If I touch the CPU, I'll
>>>> break the nested case, no matter how it is implemented. So far, the
>>>> userspace did not have to worry about nested, and that's something that
>>>> should be kept that way.
>>> We are discussing two different things here. I'll try to separate them.
>>> 1. BSP is broken WRT #INIT
>>> 2. nested is broken WRT #INIT
>>>
>>> You are fixing 1 with your patches, for that I proposed much easier
>>> solution (at last from kernel point of view): if BSP reset it in
>>> userspace and make it runnable. Nested virt is still broken, but this is
>>> not what you are fixing.
>>
>> It's not what I'm fixing, but I don't want to make the fix for nested
>
> What are you fixing then?

Nested virt is not what I am fixing, but I'm trying to keep an eye on
that (and the other INIT race) while doing these patches.

>> virt unnecessarily more complicated. Nested virt needs to know about
>> INIT and SIPI; redefining the meaning of INIT_RECEIVED and SIPI_RECEIVED
>> makes it more complicated to reflect these events to L1.
>>
>>> For 2 much more involved fix is needed. Jan fixes it and it will require
>>> signaling INIT# from userspace by other means than mp_sate because
>>> signaling INIT# does not automatically means that mp_sate changes to
>>> INIT_RECEIVED.
>>
>> In your interpretation of INIT_RECEIVED, no. In mine, yes...
>
> Your code shows different. With your patch setting mp_state to
> INIT_RECEIVED makes vcpu non tunable. This is incorrect if INIT_RECEIVED
> is "INIT# is triggered" interface.

What do you mean by "non tunable"? In non-nested mode, the VCPU will
reset immediately, as soon as it is re-entered. In nested mode, the
VCPU will eat the INIT_RECEIVED and turn it into a vmexit.

At least according to AMD's docs, the VMM has to reassert INIT if it
wants the processor to actually process it [15.20.8 INIT support].
Intel's does not say it explicitly, but it doesn't say the opposite
either. It seems to be the only that makes sense.

>>>> If we move away from the INIT_RECEIVED and SIPI_RECEIVED states for
>>>> in-kernel APIC -> VCPU communication, then the KVM_SET_MP_STATE ioctl
>>>> will have to convert them to the right bits in the requests field or in
>>>> the APIC state. But I'm starting to see less benefit from moving away
>>>> from mp_state.
>>>>
>>> We are not moving away from mp_state, we are moving away from using
>>> mp_state for signaling
>>
>> That's what I meant; sorry for the unclear abbreviation.
>
> Then we disagree.

We do. Let's see _where_ exactly we disagree.

>>> because with nested virt INIT does not always
>>> change mp_state
>>
>> Why not?
>
> Because mp_state is the current state the vcpu is in. It can be
> uninitialized, runnable, halted or wait for sipi. SDM says that
> if nested virt is enabled vcpu does not enter wait for sipi state
> on INIT#.

Yes, but it still has to do something (a vmexit) and go back to RUNNING.
So it needs signaling from userspace to the kernel.

>> Which is why it's good to have the reset done in kernel space,
>> not in user space.
>
> Without nested virt it does not really matter and if it is does not
> really matter you do not add code to the kernel just because it is good.
> With nested virt INIT# processing needs to go to the kernel. In some
> cases INIT will cause reset, but you do not "do reset in kernel space",
> you do "INIT# handling in kernel space".

We agree on this. What I add is: let's define the API so that it is
nested-friendly. This means having a signaling mechanism for userspace.
I think you do not want mp_state to be this signaling mechanism. Why
not? Can an existing ioctl be the alternative or do we need to invent a
new one?

Paolo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/