Re: [RFC PATCH 1/2] kvm/arm: add capability to forward hypercall to user space

From: Guoheyi
Date: Thu Oct 31 2019 - 09:07:39 EST

Next message: Krzysztof Kozlowski: "Re: [PATCH v2 11/11] ARM: dts: imx6ul-kontron-n6310-s-43: Add missing includes for GPIOs and IRQs"
Previous message: Konstantin Khlebnikov: "[PATCH] x86/MCE/AMD: fix warning about sleep-in-atomic at early boot"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Sorry for the late, for it took some time for me to think it over...

On 2019/10/22 0:42, James Morse wrote:

Hi Heyi,

On 09/10/2019 13:33, Guoheyi wrote:

On 2019/10/2 1:19, James Morse wrote:

On 24/09/2019 16:20, Heyi Guo wrote:

As more SMC/HVC usages emerge on arm64 platforms, like SDEI, it makes
sense for kvm to have the capability of forwarding such calls to user
space for further emulation.

(what do you mean by further? Doesn't user-space have to do all of it?)

For kvm will always handle hvc/smc guest exit for the first step, even if it is only a
simple forwarding, I called the user-space processing as "further emulation".

We reuse the existing term "hypercall" for SMC/HVC, as well as the
hypercall structure in kvm_run to exchange arguments and return
values. The definition on arm64 is as below:

exit_reason: KVM_EXIT_HYPERCALL

Input:
nr: the immediate value of SMC/HVC calls; not really used today.
args[6]: x0..x5 (This is not fully conform with SMCCC which requires
x6 as argument as well, but use space can use GET_ONE_REG
ioctl for such rare case).

If this structure isn't right for us, we could define a different one for arm/arm64.
(we did this for kvm_vcpu_events)

Do you mean that we can move the hypercall struct definition to arch specific kvm_host.h?
For it is in the common kvm_run structure, we'll need to change every kvm supported
architectures, including x86, mips, powerpc, s390. Is it acceptable?

Ah! Sorry, I'd missed this was in the kvm_run structure. The get-events example doesn't
apply here as that was a separate ioctl().

Return:
args[0..3]: x0..x3 as defined in SMCCC. We need to extract
args[0..3] and write them to x0..x3 when hypercall exit
returns.

Are we saying that KVM_EXIT_HYPERCALL expects to be used with SMC-CC?
(if so, we should state that).

Yes I followed SMC-CC when writing this.

I'm not certain we should tie this to SMC-CC.

If we don't tie it to SMC-CC this selection of in/out registers looks odd, there is
nothing about HVC/SMC that uses these registers, its just the SMC convention.

Maybe we don't need to tie it to SMC-CC, and simply load all values in args[6] to GP
registers...
And then there is either no strong reason to extend hypercall structure for ARM.

Flag hypercall_forward is added to turn on/off hypercall forwarding
and the default is false. Another flag hypercall_excl_psci is to
exclude PSCI from forwarding for backward compatible, and it only
makes sense to check its value when hypercall_forward is enabled.

Calling out PSCI like this is something we shouldn't do. There will be, (are!) other
SMC-CC calls that the kernel provides emulation for, we can't easily add to this list.

Yes; I didn't figure out good way to keep compatibility and future extension...

I think the best trick is not to interpret the SMC/HVC calls from the guest. The kernel
obviously does, but the API shouldn't force us to.

I think the best way to avoid this, is to say the hypercall mechanism forwards 'unhandled
SMC/HVC' to user-space. Which things the kernel chooses to handle can change.

We need a way for user-space to know which SMC/HVC calls the kernel will handle, and will
not forward. A suggestion is to add a co-processor that lists these by #imm and r0/x0
value. User-space can then query any call to find out if it would be exported if the guest
made that call. Something like kvm_arm_get_fw_reg().

Do you mean we add only one co-processor to list all SMC/HVC calls kernel will handle?

Yes, some way of listing them.
e.g. user-space wants to handle HVC's with #imm==0 and w0==0x84000000, this co-processor
would list that as one of the things that the kernel will handle.

If we can find a way of describing 64bit register values that would save them from being a
problem in the future, but it may be too complicated to describe a 64bit register space
and 16 bits of immediate.

I think its okay for this co-processor to be SMC-CC specific, as its describing what the
kernel supports. The KVM-api in contrast should be flexible enough to describe anything
any guest may wish to do.

So
the reg size should be large enough to hold the list, each entry of which contains a #imm
and r0/x0 pair? Is the reg size fixed by definition or it can be queried by user-space? If
it is fixed, what's the size should we choose?

(fixed/not-fixed - its a trade-off for complexity now, but no-one may ever use the full
flexibility).

I think we can assume the kernel will only offer things that look like SMC-CC to the
guest. If the guest does something outside this space, its up to user-space to handle. (so
the KVM-API must support non-SMC-CC stuff). I think we should define a co-processor for
SMC/HVC where the #imm is 0. This then gives us 32bits of space we can map directly onto
the w0 values.

Shall we setup a new class of co-processor and use the following id bit patterns (assuming the type index to be 0x0016)?

0x6030 <high 16 bits of SMC-CC function ID> 0016 <low 16 bits of SMC-CC function ID>

And the value of the co-processor returned to user space can be 0 (KVM will not handle) or 1 (KVM will handle)?

Does it make sense to extend the entry to hold the function ID base and limit, so that it
can describe the whole range for each function group, like PSCI, SDEI, etc?

This may be over-complex, user-space would always need to enumerate the whole thing. I
think commonly user-space would only want to know about one entry: For cases where we know
the structure, user-space can just query the '_VERSION' call. If that isn't supported,
user-space can assume the rest of that space is unimplemented. (the kernel shouldn't
provide an incomplete emulation of these APIs)

For 32bit, are we going to export SMC/HVC calls that failed their condition-code checks?

I'm not familiar with 32bit, either we don't have 32bit platforms to test the code. So my
preference is not to make many changes to 32bit...

I'm not that familiar with it either ... You don't have anything with aarch32 support at
EL1? I don't think we should add an API that only works with Aarch64 guests.

We have some D05 which is based on cortex A72 and should support aarch32 guest. I can take a try.

Our object is to support aarch32 guest on an aarch64 hypervisor, but not on an aarch32 hypervisor, isn't it?

For 32bit, we either need to expose these condition-code bits, and say user-space should
work out if it needs to do anything. Or, handle this in the kernel, in which case we don't
need to expose the condition-code bits, but we should document that the kernel will do the
check.

Nested-virt may cause some 'fun' here. If user-space starts an aarch64 guest at EL2, it
may start its own aarch32 guest at EL1. If the aarch32 guest makes an SMC, who handles it?
If user-space's aarch64 guest didn't set the traps for SMC, I think this should be
delivered to user-space, which may be surprised by the request from an aarch32 guest.

(its also possible nested-virt has me confused, it is pretty mind bending!)

The hypercall structure should probably indicate whether the SMC/HVC call came from
aarch32 or aarch64, as the behaviour may be different.

How about to use the longmode field in hypercall structure? Standard service calls will
indicate this in function ID, but we may need to know before parsing the function ID,
isn't it?

Sure, as its a __u32, we could dump the guest PSTATE from SPSR in there.

I think the last thing is 'ret', and whether we should provide a way of passing 'x0' back
to the guest, or expect user-space to use set-one-reg. Most of the time user-space will
only want to set x0, and doing this would let us initialise it to all-ones in the kernel,
which means the guest gets the unknown-smc value back if user-space ignores the exit.

The current RFC is not expecting user-space to use set-one-reg to set GP registers for returning, to reduce ioctl() invocations for better performance.
I didn't use "ret" for guest to hold the returned x0, but still used "args[6]" to exchange x0~x5. I agree to set quick path for x0 only, and kvm doesn't bother to set the other 5 GP registers.

Thanks a lot,

Heyi

Thanks,

James

.

Next message: Krzysztof Kozlowski: "Re: [PATCH v2 11/11] ARM: dts: imx6ul-kontron-n6310-s-43: Add missing includes for GPIOs and IRQs"
Previous message: Konstantin Khlebnikov: "[PATCH] x86/MCE/AMD: fix warning about sleep-in-atomic at early boot"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]