Re: [PATCH 3/5] vhost: support upto 509 memory regions

From: Paolo Bonzini
Date: Thu Jun 18 2015 - 09:46:33 EST




On 18/06/2015 15:19, Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2015 at 01:50:32PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 18/06/2015 13:41, Michael S. Tsirkin wrote:
>>> On Thu, Jun 18, 2015 at 01:39:12PM +0200, Igor Mammedov wrote:
>>>> Lets leave decision upto users instead of making them live with
>>>> crashing guests.
>>>
>>> Come on, let's fix it in userspace.
>>
>> It's not trivial to fix it in userspace. Since QEMU uses RCU there
>> isn't a single memory map to use for a linear gpa->hva map.
>
> Could you elaborate?
>
> I'm confused by this mention of RCU.
> You use RCU for accesses to the memory map, correct?
> So memory map itself is a write side operation, as such all you need to
> do is take some kind of lock to prevent conflicting with other memory
> maps, do rcu sync under this lock.

You're right, the problem isn't directly related to RCU. RCU would be
easy to handle by using synchronize_rcu instead of call_rcu. While I
identified an RCU-related problem with Igor's patches, the underlying
problem is much more entrenched.

RAM can be used by asynchronous operations while the VM runs, between
address_space_map and address_space_unmap. It is possible and common to
have a quiescent state between the map and unmap, and a memory map
change can happen in the middle of this. Normally this is not a
problem, because changes to the memory map do not make the hva go away
(memory regions are reference counted).
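
To make the reference-counting point concrete, here is a minimal toy
sketch. Everything in it (struct region, toy_map, toy_unmap) is a
hypothetical stand-in, not QEMU's MemoryRegion or address_space_map
code, and it is single-threaded on purpose. It only shows that a
buffer pinned by a map call stays valid when the memory map drops its
own reference.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted RAM region. */
struct region {
    int refcount;
    char *hva;      /* host virtual address of the backing memory */
};

static void region_ref(struct region *r)
{
    r->refcount++;
}

static void region_unref(struct region *r)
{
    /* The backing memory is released only when the last user is gone. */
    if (--r->refcount == 0) {
        free(r->hva);
        r->hva = NULL;
    }
}

/* Toy model of a map call: pin the region and hand out the hva. */
static char *toy_map(struct region *r)
{
    region_ref(r);
    return r->hva;
}

/* Toy model of an unmap call: drop the pin taken by toy_map. */
static void toy_unmap(struct region *r)
{
    region_unref(r);
}

/* Returns 1 if the hva stayed usable after the region left the memory map. */
int hva_survives_removal(void)
{
    struct region r = { .refcount = 1, .hva = malloc(4096) };
    if (r.hva == NULL)
        return -1;

    char *p = toy_map(&r);      /* async operation starts, refcount is 2 */
    p[0] = 'x';

    region_unref(&r);           /* memory map change drops its reference */
    int still_valid = (r.hva != NULL && p[0] == 'x');

    toy_unmap(&r);              /* async operation ends, memory is freed */
    return still_valid && r.hva == NULL;
}
```

In this toy model the removal in the middle is harmless precisely
because the pin taken at map time outlives it.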

However, with Igor's patches a memory_region_del_subregion will cause an
mmap(MAP_NORESERVE), which _does_ have the effect of making the hva go away.
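
A minimal sketch of that effect, under assumptions of mine: the new
mapping is placed over the old address with MAP_FIXED, and it is kept
readable here only so the result can be inspected without faulting.
The function name is hypothetical and this is not the code from Igor's
patches.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the old contents were lost, 0 if they survived, -1 on error. */
int remap_discards_contents(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    char *hva = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hva == MAP_FAILED)
        return -1;
    memset(hva, 0xAB, len);     /* data a pending operation still relies on */

    /*
     * Assumption: the replacement mapping lands on the same address via
     * MAP_FIXED. The kernel drops the old pages and installs a fresh,
     * zero-filled anonymous mapping in their place.
     */
    char *again = mmap(hva, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                       -1, 0);
    if (again == MAP_FAILED) {
        munmap(hva, len);
        return -1;
    }

    /* The pointer value is unchanged, but what it pointed to is gone. */
    int lost = ((unsigned char)hva[0] != 0xAB);

    munmap(hva, len);
    return lost;
}
```

Anyone still holding the old pointer between map and unmap would see
the data vanish, and with a PROT_NONE mapping would fault instead.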

I guess one way to do it would be to alias the same page in two places,
one for use by vhost and one for use by everything else. However, the
kernel does not provide the means to do this kind of aliasing for
anonymous mmaps.
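
For contrast, here is a minimal sketch of the kind of aliasing I mean,
done with a file-backed MAP_SHARED mapping, where it does work. The
temporary file and the function name are assumptions for illustration
only; nothing here says guest RAM is or should be backed this way.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if two distinct addresses alias the same page, -1 on error. */
int shared_file_aliases(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    FILE *f = tmpfile();                /* hypothetical backing file */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }

    /* Map the same file offset twice; each call returns its own address. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (b == MAP_FAILED) {
        munmap(a, len);
        fclose(f);
        return -1;
    }

    a[0] = 'v';                         /* write through one alias */
    int aliased = (a != b && b[0] == 'v');   /* read through the other */

    munmap(a, len);                     /* dropping one alias keeps the other */
    munmap(b, len);
    fclose(f);
    return aliased;
}
```

With two such aliases, one of them could be mapped over without
affecting users of the other.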

Paolo