Re: [PATCH RFC] pass write value to in_range pointers

From: Gregory Haskins
Date: Tue Jun 23 2009 - 00:04:40 EST


Michael S. Tsirkin wrote:
> On Mon, Jun 22, 2009 at 12:29:10PM -0400, Gregory Haskins wrote:
>
>> Michael S. Tsirkin wrote:
>>
>>> On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote:
>>>
>>>
>>>> Michael S. Tsirkin wrote:
>>>>
>>>>
>>>>> It seems that a lot of complexity and trickiness with iosignalfd is
>>>>> handling the group/item relationship, which comes about because kvm does
>>>>> not currently let a device on the bus claim a write transaction based on the
>>>>> value written. This could be greatly simplified if the value written
>>>>> was passed to the in_range check for write operation. We could then
>>>>> simply make each kvm_iosignalfd a device on the bus.
>>>>>
>>>>> What does everyone think of the following lightly tested patch?
>>>>>
>>>>>
>>>>>
>>>> Hi Michael,
>>>> Its interesting, but I am not convinced its necessary. We created the
>>>> group/item layout because iosignalfds are unique in that they are
>>>> probably the only IO device that wants to do some kind of address
>>>> aliasing.
>>>>
>>>>
>>> We actually already have aliasing: is_write flag is used for this
>>> purpose.
>>>
>> Yes, but read/write address aliasing is not the same thing is
>> multi-match data aliasing.
>>
>
> What's the big difference?
>

Well, for one its not very clear what the benefit of the read/write
aliasing even is. ;) Apparently coalesced_mmio uses it, but even so I
doubt that is for the purposes of having one device do reads while
another does writes. I could be wrong, though.
>
>> Besides, your proposal also breaks
>>
>
> s/break/removes limitation/ :)
>
>
>> some of
>> the natural relationship models
>> (e.g. all the aliased iosignal_items
>> always belong to the same underlying device. io_bus entries have an
>> arbitrary topology).
>>
>
> iosignal_item is an artifact, they are not seen by user -
> they are just a work around an API limitation.
>

Well, not really. The "limitation" is a natural attribute of real
hardware too. I don't know of a whole bunch of real hardware that
aliases multiple address decoders to the same value, do you? How would
you handle reads?

The way I think of these items are not as unique devices in the io_bus
sense. We have one address decoder (the group) listening on the bus to
a unique address. That group provides a data-decoding service, where
you can program what event gets triggered for which value (or wildcard =
all values).

It is unnatural to turn it around and say that the io_bus now scans
exhaustively on the address/data tuple, and we support aliasing such
that multiple decoders may coexist. There isn't any other device
(hardware or software) that will actually do this aside from iosignalfd,
to my knowledge. So why distort the entire io_bus subsystem's
addressing scheme to match? I don't get the motivation, as it appears
to me to be already handled ideally as it is.

Perhaps I am contributing to this confusion with my decision to make the
group creation implicit with the first item creation. Would you feel
better about this if we made the distinction with group and item more
explicit? (E.g. IOSIGNALFD_GROUP_ASSIGN and IOSIGNALFD_ITEM_ASSIGN
verbs). I am fine with that change, I suppose.


> And they are only grouped if the same PIO offset is used for all accesses.
> Why is not always the case. If a device uses several PIO offsets
> (as virtio does), you create separate devices for a single guest device too.
>

Thats ok. As I stated above, I view this as a data-match service
against a single address decoder, not as multiple decoders aliased to
the same address/data pair. Its perfectly fine for a logical device to
employ multiple decoders, but its unnatural for there to be multiple
decoders to the same address. Its also unnatural to use the data as
part of the decode criteria for a general IO cycle.

>
>>> Actually, it's possible to remove is_write by passing
>>> a null pointer in write_val for reads. I like this a bit less as
>>> the code generated is less compact ... Avi, what do you think?
>>>
>>>
>>>
>>>> With what you are proposing here, you are adding aliasing
>>>> support to the general infrastructure which I am not (yet) convinced is
>>>> necessary.
>>>>
>>>>
>>> Infrastructure is a big name for something that adds a total of 10 lines to kvm.
>>> And it should at least halve the size of your 450-line patch.
>>>
>>>
>> Your patch isn't complete until some critical missing features are added
>> to io_bus, though, so its not really just 10 lines.
>> For one, it will
>> need to support much more than 6 devices.
>>
>
> Isn't this like a #define change? With the item patch we are still
> limited in the number of groups we can create.
>

Well, no because we want to support ~512 of these things, so the array
is not going to scale. Right now iosignalfd is implemented as a
tree+list, which is a little better for scalability but even that needs
to change soon. We probably ultimately want to do either a rbtree or
radixtree for all of these things. But that is an optimization patch
for a later day ;)

> What we gain is a simple array/list instead of a tree of
> linked lists that makes cheshire cheese out of CPU data cache.
>
>

Yeah, that is a down side, I admit. But the current io_bus array will
probably not scale going forward either so it needs to be addressed on
that side as well. We can always implement a SW cache, if need be, to
account for this.

>> It will also need to support
>> multiple matches.
>>
>
> What, signal many fds on the same address/value pair?
> I see this as a bug. Why is this a good thing to support?
> Just increases the chance of leaking this fd.
>

I believe Avi asked for this feature specifically, so I will defer to him.


>
>> Also you are proposing an general interface change
>> that doesn't make sense to all but one device type. So now every
>> io-device developer that comes along will scratch their head at what to
>> do with that field.
>>
>
> What do they do with is_write now? Ignore it. It's used in a whole
> of 1 place.
>
>
>> None of these are insurmountable hurdles, but my point is that today the
>> complexity is encapsulated in the proper place IMO.
>>
>
> It's better to get rid of complexity than encapsulate it.
>

All you are doing is moving the complexity to a place where I don't see
it serving a benefit to any code other than the one you just took it
away from. Like it or not, io_bus will have to address scale and
data-matching to do what you want. And the address/data match doesn't
make sense to anyone else afaict. In addition, data-matching at that
level is even harder because we can make little simplifications with the
way we do things now (like enforce that all add/lens conform to the
group addr/len). If you go to a general address/data tuple decoder, you
will presumably need to have a much more flexible decoder design
(overlapping regions, etc).

Again, not insurmountable...but I am not seeing the advantage to make
this change worthwhile either.


>
>> E.g. The one and
>> only device that cares to do this "weird" thing handles it behind an
>> interface that makes sense to all parties involved.
>>
>>>
>>>
>>>> If there isn't a use case for other devices to have
>>>> aliasing, I would think the logic is best contained in iosignalfd. Do
>>>> you have something in mind?
>>>>
>>>>
>>> One is enough :)
>>>
>>>
>> I am not convinced yet. ;) It appears to me that we are leaking
>> iosignalfd-isms into the general code. If there is another device that
>> wants to do something similar, ok. But I can't think of any.
>>
>
> You never know. is_write was used by a whole of 1 user: coalesced_mmio,
> then your patch comes along ...
>

Can it wait until that happens, then? :) I'm already at v8 and a v9 is
brewing. It's not like we can't change later if something better comes
along.

>
>
>>> Seriously, do you see that this saves you all of RCU, linked lists and
>>> counters?
>>>
>> Well, also keep in mind we will probably be converting io_bus to RCU
>> very soon, so we are going the opposite direction ;)
>>
>> Kind Regards,
>> -Greg
>>
>>
>
> Same direction. Let's put RCU in iobus, we don't need another one on
> top of it. That's encapsulating complexity.
>

Is this really all that complicated? The iosignalfd code is fairly
trivial even with the group/item nesting IMO. I also think it makes
sense to live where it is.

Thanks Michael,
-Greg


Attachment: signature.asc
Description: OpenPGP digital signature