Re: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm

From: Sean Christopherson
Date: Thu Dec 14 2023 - 15:13:58 EST


On Thu, Dec 14, 2023, Vineeth Remanan Pillai wrote:
> On Thu, Dec 14, 2023 at 11:38 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> Now when I think about it, the implementation seems to
> suggest that we are putting policies in kvm. Ideally, the goal is:
> - guest scheduler communicates the priority requirements of the workload
> - kvm applies the priority to the vcpu task.

Why? Tasks are tasks, why does KVM need to get involved? E.g. if the problem
is that userspace doesn't have the right knobs to adjust the priority of a task
quickly and efficiently, then wouldn't it be better to solve that problem in a
generic way?
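
For illustration only, a minimal sketch of the existing knob host userspace can
already use on a vCPU task, assuming the VMM knows the vCPU thread's TID. The
helper names (vcpu_prio_from_hint, vcpu_apply_prio) and the mapping from a
guest "boost hint" to a SCHED_FIFO priority are hypothetical and not part of
this series; only sched_setscheduler() and sched_get_priority_{min,max}() are
real interfaces.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: host-side view of the priority a vCPU thread should run at. */
struct vcpu_prio {
	int policy;      /* SCHED_OTHER or SCHED_FIFO */
	int rt_priority; /* 0 for SCHED_OTHER, otherwise a valid FIFO priority */
};

/*
 * Hypothetical policy: a hint of 0 (or less) means "no boost"; any positive
 * hint selects SCHED_FIFO, clamped to the range the host reports as valid.
 */
struct vcpu_prio vcpu_prio_from_hint(int boost_hint)
{
	struct vcpu_prio p = { .policy = SCHED_OTHER, .rt_priority = 0 };

	if (boost_hint > 0) {
		int min = sched_get_priority_min(SCHED_FIFO);
		int max = sched_get_priority_max(SCHED_FIFO);

		p.policy = SCHED_FIFO;
		p.rt_priority = boost_hint;
		if (p.rt_priority < min)
			p.rt_priority = min;
		if (p.rt_priority > max)
			p.rt_priority = max;
	}
	return p;
}

/* Apply the priority to the vCPU thread 'tid'. Returns 0 or -errno. */
int vcpu_apply_prio(pid_t tid, struct vcpu_prio p)
{
	struct sched_param sp;

	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = p.rt_priority;

	if (sched_setscheduler(tid, p.policy, &sp) == -1)
		return -errno;
	return 0;
}
```

Switching to SCHED_FIFO needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, and
each change costs a syscall; whether that is quick and efficient enough is
exactly the question above.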

> - Now that vcpu is appropriately prioritized, host scheduler can make
> the right choice of picking the next best task.
>
> We have an exception of proactive boosting for interrupts/NMIs. I
> don't expect these proactive boosting cases to grow. And I think this
> should also be controlled by the guest, where the guest can say in which
> scenarios it would like to be proactively boosted.
>
> That would make kvm just a medium to communicate the scheduler
> requirements from guest to host and not house any policies. What do
> you think?

...

> > Pushing the scheduling policies to host userspace would allow for far more control
> > and flexibility. E.g. a heavily paravirtualized environment where host userspace
> > knows *exactly* what workloads are being run could have wildly different policies
> > than an environment where the guest is a fairly vanilla Linux VM that has received
> > a small amount of enlightenment.
> >
> > Lastly, if the concern/argument is that userspace doesn't have the right knobs
> > to (quickly) boost vCPU tasks, then the proposed sched_ext functionality seems
> > tailor made for the problems you are trying to solve.
> >
> > https://lkml.kernel.org/r/20231111024835.2164816-1-tj%40kernel.org
> >
> You are right, sched_ext is a good choice to have policies
> implemented. In our case, we would need a communication mechanism as
> well and hence we thought kvm would work best to be a medium between
> the guest and the host.

Making KVM be the medium may be convenient and the quickest way to get a PoC
out the door, but effectively making KVM a middle-man is going to be a huge net
negative in the long term. Userspace can communicate with the guest just as
easily as KVM, and if you make KVM the middle-man, then you effectively *must*
define a relatively rigid guest/host ABI.

If instead the contract is between host userspace and the guest, the ABI can be
much more fluid, e.g. if you (or any setup) can control at least some amount of
code that runs in the guest, then the contract between the guest and host doesn't
even need to be formally defined, it could simply be a matter of bundling host
and guest code appropriately.

If you want to land support for a given contract in upstream repositories, e.g.
to broadly enable paravirt scheduling support across a variety of userspace VMMs
and/or guests, then yeah, you'll need a formal ABI. But that's still not a good
reason to have KVM define the ABI. Doing it in KVM might be a wee bit easier because
it's largely just a matter of writing code, and LKML provides a centralized channel
for getting buy-in from all parties. But defining an ABI that's independent of the
kernel is absolutely doable, e.g. see the many virtio specs.

I'm not saying KVM can't help, e.g. if there is information that is known only
to KVM, but the vast majority of the contract doesn't need to be defined by KVM.