To clarify this point:
It also pulls parts of the device model into the host kernel. That is the point. Most of it needs to be there for performance.
Designing high-performance virtual devices involves several concerns,
such as providing the shortest possible paths between the physical
resources and the consumers. At the same time, we need to ensure that
we meet proper isolation/protection guarantees. This means that
several parts of any high-performance PV design must be placed
in-kernel to maximize performance while still properly isolating the
guest.
For instance, you need to have your signal-path (interrupts and
hypercalls), your memory-path (gpa translation), and your
addressing/isolation model in-kernel to maximize performance.
Vbus accomplishes its in-kernel isolation model by providing a
"container" concept, where objects are placed into this container by
userspace. The host kernel enforces isolation/protection by
identifying objects through a namespace (namely, a "u32 dev-id") that
is only relevant within a specific container's context. The guest
addresses each object by its dev-id, and the kernel ensures that the
guest can't access objects outside of its dev-id namespace.
All that is required is a way to transport a message with a "devid"
attribute as an address (such as DEVCALL(devid)) and the framework
provides the rest of the decode+execute function.
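
As a rough sketch of the container idea described above (this is not
actual vbus code; the struct names, MAX_DEVS, container_attach(),
devcall() and the sample device are all made up for illustration), the
decode+execute step can be as small as a bounds-checked table lookup
inside the caller's own container:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 8 /* hypothetical per-container limit */

/* Hypothetical device entry point, invoked on behalf of a guest. */
typedef int (*dev_call_fn)(void *priv, uint32_t func, void *data, size_t len);

struct device_obj {
    dev_call_fn call;
    void *priv;
};

/* One container per guest; a dev-id only has meaning inside it. */
struct container {
    struct device_obj *devs[MAX_DEVS];
};

/* Control path: userspace places an object into the container. */
static int container_attach(struct container *c, struct device_obj *dev,
                            uint32_t *devid)
{
    for (uint32_t i = 0; i < MAX_DEVS; i++) {
        if (!c->devs[i]) {
            c->devs[i] = dev;
            *devid = i;
            return 0;
        }
    }
    return -1; /* container is full */
}

/* Fast path: decode the dev-id in this container only, then execute. */
static int devcall(struct container *c, uint32_t devid, uint32_t func,
                   void *data, size_t len)
{
    if (devid >= MAX_DEVS || !c->devs[devid])
        return -1; /* not in this guest's namespace */
    return c->devs[devid]->call(c->devs[devid]->priv, func, data, len);
}

/* Trivial stand-in device: returns func + 1. */
static int sample_call(void *priv, uint32_t func, void *data, size_t len)
{
    (void)priv; (void)data; (void)len;
    return (int)func + 1;
}

/* Usage: the same dev-id resolves in guest A but not in guest B. */
int isolation_demo(void)
{
    struct container guest_a = {0}, guest_b = {0};
    struct device_obj dev = { sample_call, NULL };
    uint32_t id;

    assert(container_attach(&guest_a, &dev, &id) == 0);
    assert(devcall(&guest_a, id, 41, NULL, 0) == 42);
    assert(devcall(&guest_b, id, 41, NULL, 0) == -1);
    return 0;
}
```

The only thing the transport has to carry is the dev-id; the isolation
falls out of the fact that the lookup never leaves the caller's
container.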
Contrast this to vhost+virtio-pci (called simply "vhost" from here).
It is not immune to requiring in-kernel addressing support either, but
rather it just does it differently (and it's not via qemu, as you
might expect).
Vhost relies on QEMU to render PCI objects to the guest, to which the
guest assigns resources (such as BARs, interrupts, etc).
A PCI-BAR in this
example may represent a PIO address for triggering some operation in the
device-model's fast-path. For it to have meaning in the fast-path, KVM
has to have in-kernel knowledge of what a PIO-exit is, and what to do
with it (this is where pio-bus and ioeventfd come in). The programming
of the PIO-exit and the ioeventfd are likewise controlled by some
userspace management entity (i.e. qemu). The PIO address and value
tuple form the address, and the ioeventfd framework within KVM provides
the decode+execute function.
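
To make the "address + value tuple" point concrete, here is a toy
model of that decode step. It is not KVM's implementation: pio_bus,
pio_bus_register() and pio_bus_write() are hypothetical names, and a
plain counter stands in for the eventfd that would be signalled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 4 /* hypothetical table size */

/* One (port, value) tuple programmed by the userspace manager. */
struct pio_binding {
    uint16_t port;     /* PIO address, e.g. taken from a PCI BAR */
    uint16_t value;    /* value the guest must write to match */
    unsigned *counter; /* stands in for the eventfd to signal */
};

struct pio_bus {
    struct pio_binding b[MAX_BINDINGS];
    size_t n;
};

/* Control path: userspace programs a tuple into the in-kernel table. */
static int pio_bus_register(struct pio_bus *bus, uint16_t port,
                            uint16_t value, unsigned *counter)
{
    if (bus->n == MAX_BINDINGS)
        return -1;
    bus->b[bus->n].port = port;
    bus->b[bus->n].value = value;
    bus->b[bus->n].counter = counter;
    bus->n++;
    return 0;
}

/*
 * Fast path: called on a guest PIO write. Returns 1 if the write was
 * decoded and handled here, 0 if it has to fall back to userspace.
 */
static int pio_bus_write(struct pio_bus *bus, uint16_t port, uint16_t value)
{
    for (size_t i = 0; i < bus->n; i++) {
        if (bus->b[i].port == port && bus->b[i].value == value) {
            (*bus->b[i].counter)++; /* "signal" the consumer */
            return 1;
        }
    }
    return 0;
}

/* Usage: only the exact (port, value) tuple is handled in the fast path. */
int pio_demo(void)
{
    struct pio_bus bus = {0};
    unsigned kicks = 0;

    assert(pio_bus_register(&bus, 0xc050, 1, &kicks) == 0);
    assert(pio_bus_write(&bus, 0xc050, 1) == 1);
    assert(pio_bus_write(&bus, 0xc050, 2) == 0);
    assert(kicks == 1);
    return 0;
}
```

Structurally this is the same job as a dev-id lookup: an in-kernel
table, programmed from userspace, that turns a guest-supplied address
into an action.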
This idea seemingly works fine, mind you, but it rides on top of a *lot*
of stuff including but not limited to: the guest's PCI stack, the qemu
PCI emulation, KVM PIO support, and ioeventfd. When you get into
situations where you don't have PCI or even KVM underneath you (e.g. a
userspace container, Ira's rig, etc) trying to recreate all of that PCI
infrastructure for the sake of using PCI is, IMO, a lot of overhead for
little gain.
All you really need is a simple decode+execute mechanism, and a way to
program it from userspace control. vbus tries to do just that:
commoditize it so all you need is the transport of the control messages
(like DEVCALL()), but the decode+execute itself is reusable, even
across various environments (like KVM or Ira's rig).
And your argument, I believe, is that vbus allows both config-space and
fast-path to be implemented in the kernel (though to reiterate, it's
optional) and is therefore a bad design, so let's discuss that.
I believe the assertion is that things like config-space are best left
to userspace, and we should only relegate fast-path duties to the
kernel. The problem is that, in my experience, a good deal of
config-space actually influences the fast-path and thus needs to
interact with the fast-path mechanism eventually anyway.
What's left
over that doesn't fall into this category may cheaply ride on existing
plumbing, so it's not like we created something new or unnatural just to
support this subclass of config-space.
For example: take an attribute like the mac-address assigned to a NIC.
This clearly doesn't need to be in-kernel and could go either way (such
as a PCI config-space register).
As another example: consider an option bit that enables a new feature
that affects the fast-path, like RXBUF merging. If we use the split
model where config space is handled by userspace and fast-path is
in-kernel, the userspace component is only going to act as a proxy.
I.e. it will pass the option down to the kernel eventually. Therefore,
there is little gain in trying to split this type of slow-path out to
userspace. In fact, it's more work.
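
A small sketch of why that is, under made-up names (FEAT_RXBUF_MERGE,
nic_fastpath and the functions below are hypothetical, not taken from
vbus or vhost): once the RX fast-path has to test the option bit, the
userspace config handler in the split model has nothing left to do but
forward it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FEAT_RXBUF_MERGE (1u << 0) /* hypothetical option bit */

/* State owned by the in-kernel fast path. */
struct nic_fastpath {
    uint32_t features;
};

/* In-kernel entry point that the option must reach eventually. */
static int fastpath_set_features(struct nic_fastpath *fp, uint32_t features)
{
    fp->features = features;
    return 0;
}

/* Split model: the userspace config-space handler is a pure proxy. */
static int userspace_config_write(struct nic_fastpath *fp, uint32_t features)
{
    return fastpath_set_features(fp, features);
}

/*
 * RX fast path: how many buffers of buf_len bytes a frame will occupy.
 * Returns 0 when the frame cannot be received at all.
 */
static size_t rx_buffers_needed(const struct nic_fastpath *fp,
                                size_t frame_len, size_t buf_len)
{
    if (fp->features & FEAT_RXBUF_MERGE)
        return (frame_len + buf_len - 1) / buf_len; /* span buffers */
    return frame_len <= buf_len ? 1 : 0; /* must fit in one buffer */
}

/* Usage: the option only takes effect once it reaches the fast path. */
int feature_demo(void)
{
    struct nic_fastpath fp = { 0 };

    assert(rx_buffers_needed(&fp, 9000, 4096) == 0);
    assert(userspace_config_write(&fp, FEAT_RXBUF_MERGE) == 0);
    assert(rx_buffers_needed(&fp, 9000, 4096) == 3);
    return 0;
}
```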