Re: [PATCH v5 18/20] RISC-V: KVM: Add SBI v0.1 support

From: Graf (AWS), Alexander
Date: Fri Aug 23 2019 - 07:39:03 EST




> Am 23.08.2019 um 13:18 schrieb Anup Patel <anup@xxxxxxxxxxxxxx>:
>
>> On Fri, Aug 23, 2019 at 1:34 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
>>
>>> On 22.08.19 10:46, Anup Patel wrote:
>>> From: Atish Patra <atish.patra@xxxxxxx>
>>>
>>> The KVM host kernel running in HS-mode needs to handle SBI calls coming
>>> from guest kernel running in VS-mode.
>>>
>>> This patch adds SBI v0.1 support in KVM RISC-V. All the SBI calls are
>>> implemented correctly except remote tlb flushes. For remote TLB flushes,
>>> we are doing full TLB flush and this will be optimized in future.
>>>
>>> Signed-off-by: Atish Patra <atish.patra@xxxxxxx>
>>> Signed-off-by: Anup Patel <anup.patel@xxxxxxx>
>>> Acked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> ---
>>> arch/riscv/include/asm/kvm_host.h | 2 +
>>> arch/riscv/kvm/Makefile | 2 +-
>>> arch/riscv/kvm/vcpu_exit.c | 3 +
>>> arch/riscv/kvm/vcpu_sbi.c | 119 ++++++++++++++++++++++++++++++
>>> 4 files changed, 125 insertions(+), 1 deletion(-)
>>> create mode 100644 arch/riscv/kvm/vcpu_sbi.c
>>>
>>> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
>>> index 2af3a179c08e..0b1eceaef59f 100644
>>> --- a/arch/riscv/include/asm/kvm_host.h
>>> +++ b/arch/riscv/include/asm/kvm_host.h
>>> @@ -241,4 +241,6 @@ bool kvm_riscv_vcpu_has_interrupt(struct kvm_vcpu *vcpu);
>>> void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
>>> void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
>>>
>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu);
>>> +
>>> #endif /* __RISCV_KVM_HOST_H__ */
>>> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
>>> index 3e0c7558320d..b56dc1650d2c 100644
>>> --- a/arch/riscv/kvm/Makefile
>>> +++ b/arch/riscv/kvm/Makefile
>>> @@ -9,6 +9,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
>>> kvm-objs := $(common-objs-y)
>>>
>>> kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
>>> -kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
>>> +kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
>>>
>>> obj-$(CONFIG_KVM) += kvm.o
>>> diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
>>> index fbc04fe335ad..87b83fcf9a14 100644
>>> --- a/arch/riscv/kvm/vcpu_exit.c
>>> +++ b/arch/riscv/kvm/vcpu_exit.c
>>> @@ -534,6 +534,9 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>> (vcpu->arch.guest_context.hstatus & HSTATUS_STL))
>>> ret = stage2_page_fault(vcpu, run, scause, stval);
>>> break;
>>> + case EXC_SUPERVISOR_SYSCALL:
>>> + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
>>> + ret = kvm_riscv_vcpu_sbi_ecall(vcpu);
>>> default:
>>> break;
>>> };
>>> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
>>> new file mode 100644
>>> index 000000000000..5793202eb514
>>> --- /dev/null
>>> +++ b/arch/riscv/kvm/vcpu_sbi.c
>>> @@ -0,0 +1,119 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/**
>>> + * Copyright (c) 2019 Western Digital Corporation or its affiliates.
>>> + *
>>> + * Authors:
>>> + * Atish Patra <atish.patra@xxxxxxx>
>>> + */
>>> +
>>> +#include <linux/errno.h>
>>> +#include <linux/err.h>
>>> +#include <linux/kvm_host.h>
>>> +#include <asm/csr.h>
>>> +#include <asm/kvm_vcpu_timer.h>
>>> +
>>> +#define SBI_VERSION_MAJOR 0
>>> +#define SBI_VERSION_MINOR 1
>>> +
>>> +/* TODO: Handle traps due to unpriv load and redirect it back to VS-mode */
>>
>> Ugh, another one of those? Can't you just figure out a way to recover
>> from the page fault? Also, you want to combine this with the instruction
>> load logic, so that we have a single place that guest address space
>> reads go through.
>
> Walking Guest page table would be more expensive compared to implementing
> a trap handling mechanism.
>
> We will be adding trap handling mechanism for reading instruction and reading
> load.
>
> Both these operations are different in following ways:
> 1. RISC-V instructions are variable length. We get to know exact instruction
> length only after reading first 16bits
> 2. We need to set VSSTATUS.MXR bit when reading instruction for
> execute-only Guest pages.

Yup, sounds like you could solve that with a trivial if() based on "read instruction" or not, no? If you want to, feel free to provide short versions that do only read ins/data, but I would really like to see the whole "data reads become guest reads" magic to be funneled through a single function (in C, can be inline unrolled in asm of course)

>
>>
>>> +static unsigned long kvm_sbi_unpriv_load(const unsigned long *addr,
>>> + struct kvm_vcpu *vcpu)
>>> +{
>>> + unsigned long flags, val;
>>> + unsigned long __hstatus, __sstatus;
>>> +
>>> + local_irq_save(flags);
>>> + __hstatus = csr_read(CSR_HSTATUS);
>>> + __sstatus = csr_read(CSR_SSTATUS);
>>> + csr_write(CSR_HSTATUS, vcpu->arch.guest_context.hstatus | HSTATUS_SPRV);
>>> + csr_write(CSR_SSTATUS, vcpu->arch.guest_context.sstatus);
>>> + val = *addr;
>>> + csr_write(CSR_HSTATUS, __hstatus);
>>> + csr_write(CSR_SSTATUS, __sstatus);
>>> + local_irq_restore(flags);
>>> +
>>> + return val;
>>> +}
>>> +
>>> +static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu, u32 type)
>>> +{
>>> + int i;
>>> + struct kvm_vcpu *tmp;
>>> +
>>> + kvm_for_each_vcpu(i, tmp, vcpu->kvm)
>>> + tmp->arch.power_off = true;
>>> + kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>>> +
>>> + memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
>>> + vcpu->run->system_event.type = type;
>>> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>>> +}
>>> +
>>> +int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu)
>>> +{
>>> + int ret = 1;
>>> + u64 next_cycle;
>>> + int vcpuid;
>>> + struct kvm_vcpu *remote_vcpu;
>>> + ulong dhart_mask;
>>> + struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
>>> +
>>> + if (!cp)
>>> + return -EINVAL;
>>> + switch (cp->a7) {
>>> + case SBI_SET_TIMER:
>>> +#if __riscv_xlen == 32
>>> + next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
>>> +#else
>>> + next_cycle = (u64)cp->a0;
>>> +#endif
>>> + kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
>>
>> Ah, this is where the timer set happens. I still don't understand how
>> this takes the frequency bit into account?
>
> Explained it in PATCH17 comments.
>
>>
>>> + break;
>>> + case SBI_CONSOLE_PUTCHAR:
>>> + /* Not implemented */
>>> + cp->a0 = -ENOTSUPP;
>>> + break;
>>> + case SBI_CONSOLE_GETCHAR:
>>> + /* Not implemented */
>>> + cp->a0 = -ENOTSUPP;
>>> + break;
>>
>> These two should be covered by the default case.
>
> Sure, I will update.
>
>>
>>> + case SBI_CLEAR_IPI:
>>> + kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_SOFT);
>>> + break;
>>> + case SBI_SEND_IPI:
>>> + dhart_mask = kvm_sbi_unpriv_load((unsigned long *)cp->a0, vcpu);
>>> + for_each_set_bit(vcpuid, &dhart_mask, BITS_PER_LONG) {
>>> + remote_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, vcpuid);
>>> + kvm_riscv_vcpu_set_interrupt(remote_vcpu, IRQ_S_SOFT);
>>> + }
>>> + break;
>>> + case SBI_SHUTDOWN:
>>> + kvm_sbi_system_shutdown(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN);
>>> + ret = 0;
>>> + break;
>>> + case SBI_REMOTE_FENCE_I:
>>> + sbi_remote_fence_i(NULL);
>>> + break;
>>> + /*
>>> + * TODO: There should be a way to call remote hfence.bvma.
>>> + * Preferred method is now a SBI call. Until then, just flush
>>> + * all tlbs.
>>> + */
>>> + case SBI_REMOTE_SFENCE_VMA:
>>> + /*TODO: Parse vma range.*/
>>> + sbi_remote_sfence_vma(NULL, 0, 0);
>>> + break;
>>> + case SBI_REMOTE_SFENCE_VMA_ASID:
>>> + /*TODO: Parse vma range for given ASID */
>>> + sbi_remote_sfence_vma(NULL, 0, 0);
>>> + break;
>>> + default:
>>> + cp->a0 = ENOTSUPP;
>>> + break;
>>
>> Please just send unsupported SBI events into user space.
>
> For unsupported SBI calls, we should be returning error to the
> Guest Linux so that do something about it. This is in accordance
> with the SBI spec.

That's up to user space (QEMU / kvmtool) to decide. If user space wants to implement the console functions (like we do on s390), it should have the chance to do so.

Alex