Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

From: Avi Kivity
Date: Thu Sep 17 2009 - 03:51:41 EST


On 09/17/2009 06:11 AM, Gregory Haskins wrote:

>> irqfd/eventfd is the abstraction layer; it doesn't need to be reabstracted.
>
> Not per se, but it needs to be interfaced. How do I register that
> eventfd with the fast path in Ira's rig? How do I signal the eventfd
> (x86->ppc, and ppc->x86)?

You write a userspace or kernel module to do it. It's a few dozen lines of code.
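
A minimal sketch of the kernel-module flavor, assuming Ira's board
raises a PCI interrupt as its doorbell; every ira_* name below is
invented for illustration:

#include <linux/err.h>
#include <linux/eventfd.h>
#include <linux/interrupt.h>

static struct eventfd_ctx *ira_call;    /* ppc -> x86 notification */

/* Userspace hands us the eventfd, e.g. via an ioctl on a char dev. */
static int ira_set_call_fd(int fd)
{
        struct eventfd_ctx *ctx = eventfd_ctx_fdget(fd);

        if (IS_ERR(ctx))
                return PTR_ERR(ctx);
        ira_call = ctx;
        return 0;
}

/* The remote side rang its doorbell: forward it to the eventfd. */
static irqreturn_t ira_doorbell_irq(int irq, void *dev_id)
{
        if (ira_call)
                eventfd_signal(ira_call, 1);
        return IRQ_HANDLED;
}

The other direction (x86 -> ppc) is just a write to a doorbell register
in the board's BAR.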

> To take it to the next level, how do I organize that mechanism so that
> it works for more than one IO stream (e.g. to address the various queues
> within ethernet, or a different device like the console)? KVM has
> IOEVENTFD and IRQFD managed with MSI and PIO. This new rig does not
> have the luxury of an established IO paradigm.
>
> Is vbus the only way to implement a solution? No. But it is _a_ way,
> and it's one that was specifically designed to solve this very problem
> (as well as others).

virtio assumes that the number of transports will be limited and that the interesting growth is in the number of device classes and drivers. So we have support for just three transports, but six device classes (9p, rng, balloon, console, blk, net) and eight drivers (the preceding six for Linux, plus blk/net for Windows). It would have been nice to be able to write a new binding in Visual Basic, but it's hardly a killer feature.


>>> Since vbus was designed to do exactly that, this is
>>> what I would advocate. You could also reinvent these concepts and put
>>> your own mux and mapping code in place, in addition to all the other
>>> stuff that vbus does. But I am not clear why anyone would want to.

>> Maybe they like their backward compatibility and Windows support.
>
> This is really not relevant to this thread, since we are talking about
> Ira's hardware. But if you must bring this up, then I will reiterate
> that you just design the connector to interface with QEMU+PCI, and you
> have that too, if that is important to you.

Well, for Ira the major issue is probably inclusion in the upstream kernel.

> But on that topic: since you could consider KVM a "motherboard
> manufacturer" of sorts (it just happens to be virtual hardware), I don't
> know why KVM seems to consider itself the only motherboard manufacturer
> in the world that has to make everything look legacy. If a company like
> ASUS wants to add some cutting-edge IO controller/bus, they simply do
> it.

No, they don't. New buses are added through industry consortiums these days. No one adds a bus that is only available with their machine, not even Apple.

> Pretty much every product release may contain a different array of
> devices, many of which are not backwards compatible with any prior
> silicon. The guy/gal installing Windows on that system may see a "?" in
> Device Manager until they load a driver that supports the new chip, and
> subsequently it works. It is certainly not a requirement to make said
> chip somehow work with existing drivers/facilities on bare metal, per
> se. Why should virtual systems be different?

Devices/drivers are a different matter, and if you have a virtio-net device you'll get the same "?" until you load the driver. That's how people and the OS vendors expect things to work.

> What I was getting at is that you can't just hand-wave the datapath
> stuff. We do fast path in KVM with IRQFD/IOEVENTFD+PIO, and we do
> device discovery/addressing with PCI.

That's not datapath stuff.
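
For comparison, hooking up that KVM fast path is two vm ioctls against
the 2009-era ABI; a sketch, where vmfd, the eventfds, port 0x1000 and
GSI 5 are all illustrative:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Tie a guest PIO doorbell to kick_fd, and call_fd to a guest irq. */
static void wire_kvm_fast_path(int vmfd, int kick_fd, int call_fd)
{
        struct kvm_ioeventfd io = {
                .addr  = 0x1000,                /* guest PIO doorbell */
                .len   = 2,
                .fd    = kick_fd,
                .flags = KVM_IOEVENTFD_FLAG_PIO,
        };
        struct kvm_irqfd irq = { .fd = call_fd, .gsi = 5 };

        ioctl(vmfd, KVM_IOEVENTFD, &io);        /* guest write -> eventfd */
        ioctl(vmfd, KVM_IRQFD, &irq);           /* eventfd -> guest irq */
}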

> Neither of those is available
> here in Ira's case, yet the general concepts are needed. Therefore, we
> have to come up with something else.

Ira has to implement virtio's ->kick() function and come up with something for discovery. It's a lot fewer lines of code than there are messages in this thread.
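
Concretely, the kick side is a one-liner once the transport is mapped;
a sketch, assuming the 2.6.31-era virtqueue API, with the ira_*
structure and doorbell register invented:

#include <linux/io.h>
#include <linux/virtio.h>

struct ira_vq_priv {
        u32 vq_index;                   /* which queue this is */
        void __iomem *doorbell;         /* mapped from the board's PCI BAR */
};

/* notify callback handed to vring_new_virtqueue(): the x86 -> ppc kick */
static void ira_vq_notify(struct virtqueue *vq)
{
        struct ira_vq_priv *priv = vq->priv;

        iowrite32(priv->vq_index, priv->doorbell);
}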

>> Yes. I'm all for reusing virtio, but I'm not going to switch to vbus or
>> support both for this esoteric use case.
>
> With all due respect, no one asked you to. This sub-thread was
> originally about using vhost in Ira's rig. When problems surfaced in
> that proposed model, I highlighted that I had already addressed that
> problem in vbus, and here we are.

Ah, okay. I have no interest in Ira choosing either virtio or vbus.



>> vhost-net somehow manages to work without the config stuff in the kernel.
>
> I was referring to data-path stuff, like signal and memory
> configuration/routing.

signal and memory configuration/routing are not data-path stuff.

>> Well, virtio has a similar abstraction on the guest side. The host-side
>> abstraction is limited to signalling, since all configuration is in
>> userspace. vhost-net ought to work for lguest and s390 without change.
>
> But IIUC that is primarily because the revectoring work is already in
> QEMU for virtio-u and it rides on that, right? Not knocking that; that's
> nice and a distinct advantage. It should just be noted that it's based
> on sunk cost, and not truly free. It's just already paid for, which is
> different. It also means it only works in environments based on QEMU,
> which not all are (as evident by this sub-thread).

No. We expose a mix of emulated-in-userspace and emulated-in-the-kernel devices on one bus. Devices emulated in userspace only lose by having the bus emulated in the kernel. Devices in the kernel gain nothing from having the bus emulated in the kernel. It's a complete slow path so it belongs in userspace where state is easy to get at, development is faster, and bugs are cheaper to fix.
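
That split is visible from userspace: configuration stays up there, and
the kernel is handed only the per-ring eventfds and a backend. A sketch
against the ioctls in this series (fd values illustrative, memory-table
setup elided):

#include <linux/vhost.h>
#include <sys/ioctl.h>

static void wire_vhost_net(int vhostfd, int tapfd, int kickfd, int callfd)
{
        struct vhost_vring_file kick = { .index = 0, .fd = kickfd };
        struct vhost_vring_file call = { .index = 0, .fd = callfd };
        struct vhost_vring_file back = { .index = 0, .fd = tapfd };

        ioctl(vhostfd, VHOST_SET_OWNER, NULL);
        /* VHOST_SET_MEM_TABLE / vring address setup elided */
        ioctl(vhostfd, VHOST_SET_VRING_KICK, &kick);    /* guest -> host */
        ioctl(vhostfd, VHOST_SET_VRING_CALL, &call);    /* host -> guest */
        ioctl(vhostfd, VHOST_NET_SET_BACKEND, &back);
}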

--
error compiling committee.c: too many arguments to function
