Re: [PATCH] kvm: remove in_range from kvm_io_device

From: Gregory Haskins
Date: Tue Jun 23 2009 - 11:45:33 EST


Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2009 at 11:21:53AM -0400, Gregory Haskins wrote:
>
>> Michael S. Tsirkin wrote:
>>
>>> Remove in_range from kvm_io_device and ask read/write callbacks, if
>>> supplied, to perform range checks internally. This allows aliasing
>>> (mostly for in-kernel virtio), as well as better error handling by
>>> making it possible to pass errors up to userspace. And it's enough to
>>> look at the diffstat to see that it's a better API anyway.
>>>
>>> While we are at it, document locking rules for kvm_io_device.
>>>
>>>
>> Sorry, not trying to be a PITA, but I liked your last suggestion better. :(
>>
>> I am thinking forward to when we want to use something smarter than a
>> linear search (like rbtree/radix) for scaling the number of "devices"
>> (really, virtio-rings) that we support.
>>
>
> in_range is broken for this anyway: you need more than a boolean
> predicate to implement rbtree/radix
>

Yes, understood. in_range() needs to be (pardon the pun) "addressed"
;). But getting rid of in_range() and moving the match logic into the
read()/write() verbs is potentially a step in the wrong direction if we
ever want to go that route. And I'm pretty sure we do.
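
To make the concern concrete, here is a rough userspace sketch of what
"match logic inside read()" implies for the bus. All names here (io_dev,
io_bus, dev_read, bus_read, the probes counter) are made up for
illustration and are not the actual kvm_io_device definitions; returning
-EOPNOTSUPP for "not mine" is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model, NOT the real kvm_io_device. */
struct io_dev {
    uint64_t base;      /* first address the device claims */
    uint64_t size;      /* number of bytes claimed */
    uint8_t  regs[16];  /* toy register file backing the range */
    /* Returns 0 if handled, -EOPNOTSUPP if the access is not ours. */
    int (*read)(struct io_dev *dev, uint64_t addr, size_t len, void *val);
};

/* The range check lives inside the callback; there is no in_range(). */
static int dev_read(struct io_dev *dev, uint64_t addr, size_t len, void *val)
{
    assert(dev->size <= sizeof dev->regs);
    if (addr < dev->base || len > dev->size ||
        addr - dev->base > dev->size - len)
        return -EOPNOTSUPP;
    memcpy(val, dev->regs + (addr - dev->base), len);
    return 0;
}

struct io_bus {
    struct io_dev *devs[8];
    size_t count;
    size_t probes;      /* callbacks invoked so far, for illustration */
};

/* The bus knows nothing about ranges, so it has to ask every device. */
static int bus_read(struct io_bus *bus, uint64_t addr, size_t len, void *val)
{
    for (size_t i = 0; i < bus->count; i++) {
        int r;

        bus->probes++;
        r = bus->devs[i]->read(bus->devs[i], addr, len, val);
        if (r != -EOPNOTSUPP)
            return r;   /* handled, or a real error to pass up */
    }
    return -EOPNOTSUPP;
}
```

The probes counter grows with the position of the matching device, and
the bus has no key it could sort or index on. That is the part I would
like to avoid baking into the API.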

>
>> The current device-count
>> target is 512, which we will begin to rapidly consume as the in-kernel
>> virtio work progresses.
>>
>
> That's a large number. I had in mind more like 4 virtio devices, for
> starters: 1 for each virtqueue in net and block.
>

That's way too low. For instance, I'll be wanting to do things like
802.1p, which would be 16 virtio-rings per device (8 prio levels tx, 8
levels rx). And that's just for one device. I think Avi came up with an
estimate of supporting 20 devices @ 16 queues = 320, so we rounded it to
512.

>
>> This proposed approach forces us into a
>> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
>> this, not just in-kernel users). How would you address this?
>>
>
> Two ideas that come to mind:
> - add addr/len fields to devices, use these to speed up lookup
>

Yep, that's what I was thinking as well. We can have the top-level
(group) be an rbtree on addr/len, and then walk the list of items at
that address linearly using your read/write() approach.
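
Roughly what I have in mind, as a userspace sketch. A binary search
over a sorted array stands in for the rbtree here to keep it short, and
every name (dev, group, find_group, bus_write, the sample callbacks) is
hypothetical. It also assumes the groups do not overlap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a device that may accept or decline a write. */
struct dev {
    int id;
    uint32_t last;      /* last value written, for illustration */
    /* Returns 0 if handled, -EOPNOTSUPP to decline. */
    int (*write)(struct dev *d, uint64_t addr, uint32_t val);
    struct dev *next;   /* next device sharing the same addr/len */
};

/* Top level: one entry per addr/len, sorted by base, non-overlapping. */
struct group {
    uint64_t base;
    uint64_t len;
    struct dev *head;
};

/* Sample callbacks so the sketch can be exercised. */
static int accept_write(struct dev *d, uint64_t addr, uint32_t val)
{
    (void)addr;
    d->last = val;
    return 0;
}

static int decline_write(struct dev *d, uint64_t addr, uint32_t val)
{
    (void)d; (void)addr; (void)val;
    return -EOPNOTSUPP;
}

/* O(log n) lookup on addr; stand-in for an rbtree walk. */
static struct group *find_group(struct group *g, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(g != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < g[mid].base)
            hi = mid;
        else if (addr - g[mid].base >= g[mid].len)
            lo = mid + 1;
        else
            return &g[mid];
    }
    return NULL;
}

/* Find the group quickly, then ask only its members, in order. */
static int bus_write(struct group *g, size_t n, uint64_t addr, uint32_t val)
{
    struct group *grp = find_group(g, n, addr);

    if (!grp)
        return -EOPNOTSUPP;
    for (struct dev *d = grp->head; d; d = d->next) {
        int r = d->write(d, addr, val);

        if (r != -EOPNOTSUPP)
            return r;
    }
    return -EOPNOTSUPP;
}
```

The linear part is then bounded by the number of items aliased at one
address rather than by the total device count.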


> - add a small cache that can be scanned first
>

Yep, I think we may want to do this anyway, independent of the search
algorithm.

> In both cases, you first do a fast lookup, ask the device whether
> it wants the transaction, then resort to linear scan if not
>
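
For the record, a rough sketch of how I read the "fast lookup, ask the
device, then fall back to a linear scan" flow. The names (cdev, cbus,
CACHE_SLOTS, cbus_access) and the round-robin replacement policy are
assumptions made for illustration only, and the device callback is
reduced to a plain range check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 2   /* hypothetical size of the small cache */

struct cdev {
    uint64_t base;
    uint64_t len;
    unsigned hits;      /* accesses this device accepted */
};

/* Device decides for itself: 0 if it takes the access, else decline. */
static int cdev_access(struct cdev *d, uint64_t addr)
{
    if (addr < d->base || addr - d->base >= d->len)
        return -EOPNOTSUPP;
    d->hits++;
    return 0;
}

struct cbus {
    struct cdev *devs;
    size_t count;
    struct cdev *cache[CACHE_SLOTS];    /* recently matched devices */
    size_t next_slot;                   /* round-robin replacement */
    unsigned long probes;               /* devices asked so far */
};

static int cbus_access(struct cbus *bus, uint64_t addr)
{
    assert(bus->devs != NULL || bus->count == 0);

    /* Fast path: ask the cached devices first. */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!bus->cache[i])
            continue;
        bus->probes++;
        if (cdev_access(bus->cache[i], addr) == 0)
            return 0;
    }

    /* Slow path: linear scan, remembering whoever accepts. */
    for (size_t i = 0; i < bus->count; i++) {
        bus->probes++;
        if (cdev_access(&bus->devs[i], addr) == 0) {
            bus->cache[bus->next_slot] = &bus->devs[i];
            bus->next_slot = (bus->next_slot + 1) % CACHE_SLOTS;
            return 0;
        }
    }
    return -EOPNOTSUPP;
}
```

A repeated access to the same ring costs one probe after the first
miss, but an address that nobody claims still pays for the cache plus
the full scan.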

-Greg
