Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

From: Gregory Haskins
Date: Wed Sep 16 2009 - 10:11:11 EST

Next message: Masami Hiramatsu: "Re: [PATCH tracing/kprobes 2/6] ftrace: Fix trace_add_event_call()to initialize list"
Previous message: Jan Kara: "Re: [PATCH 2/2] Ext3: data=guarded mode"
In reply to: Avi Kivity: "Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server"
Next in thread: Avi Kivity: "Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server"

Avi Kivity wrote:
> On 09/16/2009 02:44 PM, Gregory Haskins wrote:
>> The problem isn't where to find the models...the problem is how to
>> aggregate multiple models to the guest.
>>
>
> You mean configuration?
>
>>> You instantiate multiple vhost-nets. Multiple ethernet NICs is a
>>> supported configuration for kvm.
>>>
>> But this is not KVM.
>>
>>
>
> If kvm can do it, others can.

The problem is that you seem to either hand-wave over details like this,
or you give details that are pretty much exactly what vbus does already.
My point is that I've already sat down and thought about these issues
and solved them in a freely available GPL'ed software package.

So the question is: is your position that vbus is all wrong and you wish
to create a new bus-like thing to solve the problem? If so, how is it
different from what I've already done? More importantly, what specific
objections do you have to what I've done, as perhaps they can be fixed
instead of starting over?

>
>>>> His slave boards surface themselves as PCI devices to the x86
>>>> host. So how do you use that to make multiple vhost-based devices (say
>>>> two virtio-nets, and a virtio-console) communicate across the
>>>> transport?
>>>>
>>>>
>>> I don't really see the difference between 1 and N here.
>>>
>> KVM surfaces N virtio-devices as N pci-devices to the guest. What do
>> we do in Ira's case where the entire guest represents itself as a PCI
>> device to the host, and nothing the other way around?
>>
>
> There is no guest and host in this scenario. There's a device side
> (ppc) and a driver side (x86). The driver side can access configuration
> information on the device side. How to multiplex multiple devices is an
> interesting exercise for whoever writes the virtio binding for that setup.

Bingo. So now it's a question of whether you want to write this layer
from scratch or reuse my framework.

>
>>>> There are multiple ways to do this, but what I am saying is that
>>>> whatever is conceived will start to look eerily like a vbus-connector,
>>>> since this is one of its primary purposes ;)
>>>>
>>>>
>>> I'm not sure if you're talking about the configuration interface or data
>>> path here.
>>>
>> I am talking about how we would tunnel the config space for N devices
>> across his transport.
>>
>
> Sounds trivial.

No one said it was rocket science. But it does need to be designed and
implemented end-to-end, much of which I've already done in what I hope is
an extensible way.

> Write an address containing the device number and
> register number to one location, read or write data from another.

You mean like the "u64 devh", and "u32 func" fields I have here for the
vbus-kvm connector?

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=include/linux/vbus_pci.h;h=fe337590e644017392e4c9d9236150adb2333729;hb=ded8ce2005a85c174ba93ee26f8d67049ef11025#l64
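
To make the idea concrete, here is a minimal sketch of routing a call by
device handle and function number. Only the field names "devh" and "func"
come from the header linked above; the struct layout, the "arg" field, the
handler table, and every function name below are hypothetical and are not
the actual vbus-kvm connector code.

```c
#include <stddef.h>
#include <stdint.h>

#define CALL_MAX_DEVS 4   /* hypothetical limit for this sketch */

/* Hypothetical call descriptor: devh picks the device, func the operation. */
struct call_desc {
    uint64_t devh;   /* device handle */
    uint32_t func;   /* function number within that device */
    uint32_t arg;    /* illustrative payload, not from the original header */
};

typedef int (*call_handler)(uint32_t func, uint32_t arg);

static call_handler call_table[CALL_MAX_DEVS];

/* Bind a handler to a device handle; returns 0 on success, -1 if out of range. */
int call_register(uint64_t devh, call_handler fn)
{
    if (devh >= CALL_MAX_DEVS)
        return -1;
    call_table[devh] = fn;
    return 0;
}

/* Route one call to the device named by devh; -1 if no such device. */
int call_dispatch(const struct call_desc *c)
{
    if (c->devh >= CALL_MAX_DEVS || call_table[c->devh] == NULL)
        return -1;
    return call_table[c->devh](c->func, c->arg);
}

/* Two toy devices sharing the one transport. */
int net_call(uint32_t func, uint32_t arg)     { (void)arg; return 100 + (int)func; }
int console_call(uint32_t func, uint32_t arg) { (void)arg; return 200 + (int)func; }
```

With both toy devices registered, a descriptor with devh=1, func=2 reaches
console_call() and never touches net_call(), which is the whole of the
aggregation problem in miniature.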

> Just
> like the PCI cf8/cfc interface.
>
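
For reference, the address/data scheme described above can be sketched in a
few lines. This is an illustrative model only: the address layout (device
number in bits 15:8, register number in bits 7:0), the limits, and all names
are assumptions made for the example, and the layout is deliberately simpler
than the real PCI 0xcf8 encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define TUN_MAX_DEVS 4   /* hypothetical limits for this sketch */
#define TUN_MAX_REGS 8

struct cfg_tunnel {
    uint32_t addr;                             /* latched address register */
    uint32_t cfg[TUN_MAX_DEVS][TUN_MAX_REGS];  /* per-device config space */
};

/* Write to the "address" location: select a device and a register. */
void tunnel_set_addr(struct cfg_tunnel *t, uint32_t dev, uint32_t reg)
{
    t->addr = ((dev & 0xff) << 8) | (reg & 0xff);
}

/* Resolve the latched address to a register slot, or NULL if out of range. */
static uint32_t *tunnel_slot(struct cfg_tunnel *t)
{
    uint32_t dev = (t->addr >> 8) & 0xff;
    uint32_t reg = t->addr & 0xff;

    if (dev >= TUN_MAX_DEVS || reg >= TUN_MAX_REGS)
        return NULL;
    return &t->cfg[dev][reg];
}

/* Read from the "data" location; absent devices read as all-ones. */
uint32_t tunnel_read_data(struct cfg_tunnel *t)
{
    uint32_t *slot = tunnel_slot(t);
    return slot ? *slot : 0xffffffffu;
}

/* Write to the "data" location; writes to absent devices are dropped. */
void tunnel_write_data(struct cfg_tunnel *t, uint32_t val)
{
    uint32_t *slot = tunnel_slot(t);
    if (slot)
        *slot = val;
}
```

Two locations are enough to reach the config space of every device, at the
cost of making each access a two-step, non-atomic sequence that the real
binding would have to serialize.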
>>> They aren't in the "guest". The best way to look at it is
>>>
>>> - a device side, with a dma engine: vhost-net
>>> - a driver side, only accessing its own memory: virtio-net
>>>
>>> Given that Ira's config has the dma engine in the ppc boards, that's
>>> where vhost-net would live (the ppc boards acting as NICs to the x86
>>> board, essentially).
>>>
>> That sounds convenient given his hardware, but it has its own set of
>> problems. For one, the configuration/inventory of these boards is now
>> driven by the wrong side and has to be addressed.
>
> Why is it the wrong side?

"Wrong" is probably too harsh a word when looking at ethernet. It's
certainly "odd", and possibly inconvenient. It would be like having
vhost in a KVM guest, and virtio-net running on the host. You could do
it, but it's weird and awkward. Where it really falls apart and enters
the "wrong" category is for non-symmetric devices, like disk-io.

>
>> Second, the role
>> reversal will likely not work for many models other than ethernet (e.g.
>> virtio-console or virtio-blk drivers running on the x86 board would be
>> naturally consuming services from the slave boards...virtio-net is an
>> exception because 802.x is generally symmetrical).
>>
>
> There is no role reversal.

So if I have virtio-blk driver running on the x86 and vhost-blk device
running on the ppc board, I can use the ppc board as a block-device.
What if I really wanted to go the other way?

> The side doing dma is the device, the side
> accessing its own memory is the driver. Just like the other 1e12
> driver/device pairs out there.
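
The split described in the quoted text can be modelled as follows. This is a
toy sketch, not vhost or virtio code: the ring layout and all names are
hypothetical, and memcpy() stands in for the DMA engine. A real device would
also fetch and update the ring indices by DMA rather than by direct access.

```c
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4    /* hypothetical sizes for this sketch */
#define SLOT_BYTES 64

/* The ring lives entirely in driver-side memory. */
struct ring {
    uint8_t  buf[RING_SLOTS][SLOT_BYTES];
    uint32_t len[RING_SLOTS];
    uint32_t avail;   /* advanced by the driver after filling a slot */
    uint32_t used;    /* advanced by the device after consuming a slot */
};

/* Driver side: touches only its own memory. Returns 0, or -1 if full/too big. */
int driver_post(struct ring *r, const void *data, uint32_t len)
{
    uint32_t slot;

    if (len > SLOT_BYTES || r->avail - r->used == RING_SLOTS)
        return -1;
    slot = r->avail % RING_SLOTS;
    memcpy(r->buf[slot], data, len);
    r->len[slot] = len;
    r->avail++;
    return 0;
}

/* Stand-in for the DMA engine: the device's path into driver memory. */
void dma_read(void *local_dst, const void *remote_src, uint32_t len)
{
    memcpy(local_dst, remote_src, len);
}

/* Device side: pulls one buffer across. Returns its length, or -1 if none fits. */
int device_consume(struct ring *remote, void *local, uint32_t local_size)
{
    uint32_t slot, len;

    if (remote->used == remote->avail)
        return -1;                      /* nothing posted */
    slot = remote->used % RING_SLOTS;
    len = remote->len[slot];
    if (len > local_size)
        return -1;
    dma_read(local, remote->buf[slot], len);
    remote->used++;
    return (int)len;
}
```

In this model, whichever side runs device_consume() is the "device",
regardless of which board it physically sits on. That is why the location of
the DMA engine decides where the backend has to live.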

IIUC, his ppc boards really can be seen as "guests" (they are Linux
instances that are utilizing services from the x86, not the other way
around). vhost forces the model to have the ppc boards act as IO-hosts,
whereas vbus would likely work in either direction due to its more
refined abstraction layer.

>
>>> I have no idea, that's for Ira to solve.
>>>
>> Bingo. Thus my statement that the vhost proposal is incomplete. You
>> have the virtio-net and vhost-net pieces covering the fast-path
>> end-points, but nothing in the middle (transport, aggregation,
>> config-space), and nothing on the management-side. vbus provides most
>> of the other pieces, and can even support the same virtio-net protocol
>> on top. The remaining part would be something like a udev script to
>> populate the vbus with devices on board-insert events.
>>
>
> Of course vhost is incomplete, in the same sense that Linux is
> incomplete. Both require userspace.

A vhost-based solution to Ira's design is missing more than userspace.
Many of those gaps are addressed by a vbus-based solution.

>
>>> If he could fake the PCI
>>> config space as seen by the x86 board, he would just show the normal pci
>>> config and use virtio-pci (multiple channels would show up as a
>>> multifunction device). Given he can't, he needs to tunnel the virtio
>>> config space some other way.
>>>
>> Right, and note that vbus was designed to solve this. This tunneling
>> can, of course, be done without vbus using some other design. However,
>> whatever solution is created will look incredibly close to what I've
>> already done, so my point is "why reinvent it"?
>>
>
> virtio requires binding for this tunnelling, so does vbus.

We aren't talking about virtio. Virtio would work with either vbus or
vhost. This is purely a question of what the layers below virtio and
the device backend look like.

> It's the same problem with the same solution.

I disagree.

Kind Regards,
-Greg

Attachment: signature.asc
Description: OpenPGP digital signature
