Re: [PATCH v2 07/16] arm/arm64: KVM: Add PSCI version selection API

From: Andrew Jones
Date: Wed Jan 31 2018 - 14:15:31 EST


On Wed, Jan 31, 2018 at 06:36:38PM +0000, Marc Zyngier wrote:
> On 31/01/18 18:03, Andrew Jones wrote:
> > On Wed, Jan 31, 2018 at 05:45:56PM +0000, Marc Zyngier wrote:
> >> On 31/01/18 17:38, Andrew Jones wrote:
> >>> On Mon, Jan 29, 2018 at 05:45:50PM +0000, Marc Zyngier wrote:
> >>>> Although we've implemented PSCI 1.0 and 1.1, nothing can select them
> >>>> Since all the new PSCI versions are backward compatible, we decide to
> >>>> default to the latest version of the PSCI implementation. This is no
> >>>> different from doing a firmware upgrade on KVM.
> >>>>
> >>>> But in order to give a chance to hypothetical badly implemented guests
> >>>> that would have a fit by discovering something other than PSCI 0.2,
> >>>> let's provide a new API that allows userspace to pick one particular
> >>>> version of the API.
> >>>>
> >>>> This is implemented as a new class of "firmware" registers, where
> >>>> we expose the PSCI version. This allows the PSCI version to be
> >>>> save/restored as part of a guest migration, and also set to
> >>>> any supported version if the guest requires it.
> >>>>
> >>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>>> ---
> >>>> Documentation/virtual/kvm/api.txt | 3 +-
> >>>> Documentation/virtual/kvm/arm/psci.txt | 28 ++++++++++++++
> >>>> arch/arm/include/asm/kvm_host.h | 3 ++
> >>>> arch/arm/include/uapi/asm/kvm.h | 6 +++
> >>>> arch/arm/kvm/guest.c | 13 +++++++
> >>>> arch/arm64/include/asm/kvm_host.h | 3 ++
> >>>> arch/arm64/include/uapi/asm/kvm.h | 6 +++
> >>>> arch/arm64/kvm/guest.c | 14 ++++++-
> >>>> include/kvm/arm_psci.h | 9 +++++
> >>>> virt/kvm/arm/psci.c | 68 +++++++++++++++++++++++++++++++++-
> >>>> 10 files changed, 149 insertions(+), 4 deletions(-)
> >>>> create mode 100644 Documentation/virtual/kvm/arm/psci.txt
> >>>>
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 57d3ee9e4bde..334905202141 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -2493,7 +2493,8 @@ Possible features:
> >>>> and execute guest code when KVM_RUN is called.
> >>>> - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
> >>>> Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
> >>>> - - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
> >>>> + - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
> >>>> + backward compatible with v0.2) for the CPU.
> >>>> Depends on KVM_CAP_ARM_PSCI_0_2.
> >>>> - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
> >>>> Depends on KVM_CAP_ARM_PMU_V3.
> >>>> diff --git a/Documentation/virtual/kvm/arm/psci.txt b/Documentation/virtual/kvm/arm/psci.txt
> >>>> new file mode 100644
> >>>> index 000000000000..2e49a4e9f084
> >>>> --- /dev/null
> >>>> +++ b/Documentation/virtual/kvm/arm/psci.txt
> >>>> @@ -0,0 +1,28 @@
> >>>> +KVM implements the PSCI (Power State Coordination Interface)
> >>>> +specification in order to provide services such as CPU on/off, reset
> >>>> +and power-off to the guest.
> >>>> +
> >>>> +The PSCI specification is regularly updated to provide new features,
> >>>> +and KVM implements these updates if they make sense from a virtualization
> >>>> +point of view.
> >>>> +
> >>>> +This means that a guest booted on two different versions of KVM can
> >>>> +observe two different "firmware" revisions. This could cause issues if
> >>>> +a given guest is tied to a particular PSCI revision (unlikely), or if
> >>>> +a migration causes a different PSCI version to be exposed out of the
> >>>> +blue to an unsuspecting guest.
> >>>> +
> >>>> +In order to remedy this situation, KVM exposes a set of "firmware
> >>>> +pseuodo-registers" that can be manipulated using the GET/SET_ONE_REG
> >>>> +interface. These registers can be saved/restored by userspace, and set
> >>>> +to a convenient value if required.
> >>>
> >>> Userspace can save/restore any state it deems necessary.
> >>
> >> Only if it has access to it. Until now, that wasn't an option.
> >>
> >>> Is there another
> >>> reason to invent a pseudo register? Considering the PSCI version is VM
> >>> state, then maybe a VM "device" attribute[*], like s390 use, would fit
> >>> better.
> >>
> >> Possibly. But that means current userspace won't engage the mitigation
> >> by default, and that's pretty bad.
> >>
> >>> Or, if keeping it VCPU state has some benefit, then we already
> >>> have VCPU device attribute support for ARM that we could extend with
> >>> another attribute. An advantage of using the device-attr API is that it
> >>> has KVM_HAS_DEVICE_ATTR, allowing the new attribute support to be probed.
> >>> How should userspace check if the pseudo register is supported? Maybe
> >>> by just failing with NOT_SUPPORTED to get/set it?
> >>>
> >>> [*] Documentation/virtual/kvm/devices/vm.txt
> >>
> >> Frankly, I have no opinion on the way to implement it. My only
> >> requirement is that it becomes enabled by default, without any userspace
> >> change.
> >
> > I think we can turn it on by default in KVM, no matter what API we choose,
> > and then enable userspace (with whatever API) to turn it off. Newer
> > machine types certainly wouldn't do that, and, as later PSCI is compatible
> > with older PSCI, we're probably safe (safer regarding the BTB) not using
> > the older PSCI with older machine types either - although that would be up
> > for debate and the whole point of adding the API.
>
> OK. At least we're aligned on this. As for your question about finding
> out about the register, here's what I have in kvmtool:

Actually I was starting to have some second thoughts about that default,
but...

>
> + reg.addr = (u64)&data;
> + reg.id = KVM_REG_ARM_PSCI_VERSION;
> + err = ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg);
> + if (err < 0)
> + pr_info("Unable to get PSCI version\n");
> + else
> + pr_info("PSCI version %llx\n", data);
>
> If you get an error, the feature doesn't exist. One of the reasons for
> using a register is that migration works with today's QEMU. And if you
> try to restore a VM that's aware of the PSCI version (and possibly of
> the BTB invalidation) on a host that doesn't support it, you get a fail
> without any change of userspace.
>
> To me, that's a pretty good argument in favour of this method. But if

...I agree that does sound like a good argument.

> you have strong feelings about it and would like me to implement
> something else, please let me know ASAP. We'd like this series to land
> in -rc1, so we only have a few days left...

I'll do some experiments tomorrow and let you know if my second thoughts
are valid concerns right away.

>
> >>
> >>>
> >>>> +
> >>>> +The following register is defined:
> >>>> +
> >>>> +* KVM_REG_ARM_PSCI_VERSION:
> >>>> +
> >>>> + - Only valid if the vcpu has KVM_ARM_VCPU_PSCI_0_2 feature set
> >>>> + - Returns the current PSCI version on GET_ONE_REG
> >>>> + - Allows any supported PSCI version compatible with v0.2 to be set
> >>>> + with SET_ONE_REG
> >>>> + - Affects the whole VM (even if the register view is per-vcpu)
> >>>
> >>>
> >>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>>> index acbf9ec7b396..e9d57060d88c 100644
> >>>> --- a/arch/arm/include/asm/kvm_host.h
> >>>> +++ b/arch/arm/include/asm/kvm_host.h
> >>>> @@ -75,6 +75,9 @@ struct kvm_arch {
> >>>> /* Interrupt controller */
> >>>> struct vgic_dist vgic;
> >>>> int max_vcpus;
> >>>> +
> >>>> + /* Mandated version of PSCI */
> >>>> + u32 psci_version;
> >>>> };
> >>>>
> >>>> #define KVM_NR_MEM_OBJS 40
> >>>> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> >>>> index 6edd177bb1c7..47dfc99f5cd0 100644
> >>>> --- a/arch/arm/include/uapi/asm/kvm.h
> >>>> +++ b/arch/arm/include/uapi/asm/kvm.h
> >>>> @@ -186,6 +186,12 @@ struct kvm_arch_memory_slot {
> >>>> #define KVM_REG_ARM_VFP_FPINST 0x1009
> >>>> #define KVM_REG_ARM_VFP_FPINST2 0x100A
> >>>>
> >>>> +/* KVM-as-firmware specific pseudo-registers */
> >>>> +#define KVM_REG_ARM_FW (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
> >>>
> >>> Is this 0x14 documented somewhere? I'm just curious how the space for
> >>> pseudo registers is reserved. Is there a common place that we should
> >>> document that 0x14 is for the firmware (if it's not documented there
> >>> already)?
> >>
> >> No. First come, first served.
> >
> > Should we document it in Documentation/ somewhere?
> We can. I tend to see the uapi/asm/kvm.h file as the reference (and I
> feel that a text file would simply be an English translation of that
> file), but I'm not opposed to it either. I'm not sure I have the
> bandwidth for that at the moment though.

Yeah, docs can come later. But now that I look, I think we just need a
couple lines in Documentation/virtual/kvm/api.txt like

ARM 32-bit firmware registers have the following id bit patterns:
0x4030 0000 0014 0 <regno:16>

ARM 64-bit firmware registers have the following id bit patterns:
0x6030 0000 0014 0 <regno:16>

(I'm not sure those are correct. I'd expect both to start with 4 like
the other entries in that file, but maybe not, or the other 64-bit
entries in that file aren't correct?)

Thanks,
drew