Re: [PATCH 1/6] xenbus: prepare data structures and parameter for xenwatch multithreading

From: Dongli Zhang
Date: Tue Sep 25 2018 - 01:13:46 EST


Hi Boris,

On 09/18/2018 03:08 AM, Boris Ostrovsky wrote:
> On 9/16/18 9:20 PM, Dongli Zhang wrote:
>> Hi Boris,
>>
>> On 09/17/2018 04:17 AM, Boris Ostrovsky wrote:
>>>
>>> On 9/14/18 3:34 AM, Dongli Zhang wrote:
>>>
>>>> +
>>>> +struct mtwatch_info {
>>>> + /*
>>>> + * The mtwatch_domain is put on both a hash table and a list.
>>>> + * domain_list is used to optimize xenbus_watch un-registration.
>>>> + *
>>>> + * The mtwatch_domain is removed from domain_hash (with state set
>>>> + * to MTWATCH_DOMAIN_DOWN) when its refcnt is zero. However, it is
>>>> + * left on domain_list until all events belong to such
>>>> + * mtwatch_domain are processed in mtwatch_thread().
>>>
>>> Do we really need to keep mwatch_domain on both lists? Why is keeping it on,
>>> say, only the hash not sufficient?
>> In the state of the art xenbus, when a watch is unregistered (e.g.,
>> unregister_xenbus_watch()), we need to traverse the list 'watch_events' to
>> remove all inflight/pending events (for such watch) from 'watch_events'.
>>
>> About this patch set, as each domain would have its own event list, we need to
>> traverse the list of each domain to remove the pending events for the watch to
>> be unregistered.
>>
>> E.g.,
>> unregister_xenbus_watch()-->unregister_mtwatch()-->unregister_all_mtwatch() in
>> [PATCH 2/6] xenbus: implement the xenwatch multithreading framework.
>>
>> To traverse a hash table is not as efficient as traversing a list. That's why a
>> domain is kept on both the hash table and list.
>
>
> Keeping the same object on two lists also has costs. More importantly
> IMO, it increases chances on introducing a bug when people update one
> instance but not the other.
>
>
>>
>>>> + *
>>>> + * While there may exist two mtwatch_domain with the same domid on
>>>> + * domain_list simultaneously,
>>>
>>> How is it possible to have two domids on the list at the same time? Don't you
>>> want to remove it (which IIUIC means moving it to the purge list) when domain is
>>> destroyed?
>> Here is one case (suppose the domid/frontend-id is 9):
>>
>> 1. Suppose the last pv driver device is removed from domid=9, and therefore the
>> reference count of per-domU xenwatch thread for domid=9 (which we call as old
>> thread below) should be removed. We remove it from hash table (it is left in the
>> list).
>>
>> Here we cannot remove the domain from the list immediately because there might
>> be pending events being processed by the corresponding per-domU xenwatch thread.
>> If we remove it from the list while there is related watch being unregistered as
>> mentioned for last question, we may hit page fault when processing watch event.
>
>
> Don't you need to grab domain->domain_mutex to remove the driver?
> Meaning that events for that mtwatch thread cannot be processed?

I think the domain_mutex is not required. Most domain_mutex related code are
copied from xenwatch_thread().

The prerequisite to remove driver is that all pv backend devices belong to this
driver are removed. Once the pv backend device is removed, I think we can assume
there should be no xenwatch event relying on such pv backend device anymore.

>
> In any case, I think that having two mtwatch_domains for the same domain
> should be avoided. (and if you keep the hash list only then this issue
> gets resolved automatically ;-))


So far we have: (1) domain hash table, (2) domain list (where duplicate entries
may exist) and (3) purge list.

Can I assume you would like to discard the domain list and only keep domain hash
table and purge list?

The purpose of the domain list is to facilitate the unregister_xenbus_watch() to
scan the pending events for all otherend id. Should we remove it? Xen hypervisor
used both a hash table and a list to manage struct domain.


About the duplicate entries in the domain list, can we change the flow like below:

1. Suppose the thread status is DOWN. To avoid having duplicate entries on the
domain list, instead of keeping the deprecated thread on domain list (until all
its events get processed), we move it to the purge list immediately.

We will tag this deprecated thread so that the purge list would not purge it
unless all events belong to such thread get processed. We cannot purge it
immediately because the worker thread (to purge the purge list) would hang if
the deprecated thread is already stuck.

In this flow, we may have duplicate entries on purge list, but not domain list.

2. During unregister_xenbus_watch(), we need to scan both the domain list and
purge list to remove pending events for the watch. In previous design, we only
scan domain lit.


One option is we allow the deprecated thread to resurrect and we would not move
the thread to purge list immediately when the thread is deprecated.

Suppose when thread for domid=9 is needed, we would not create new one if the
deprecated one for domid=9 is still in domain list. Instead, we would resurrect
it, change its status and reuse it again. In this way, we would not have
duplicate entries on the domain list.

I like the 1st option. I do not like to resurrect a deprecated thread again.
Would you please let me know how you think about it?

Thank you very much!

Dongli Zhang




>
>
> -boris
>
>
>>
>> 2. Now the administrator attaches new pv device to domid=9 immediately and
>> therefore reference count is initially set to 1. The per-domU xenwatch thread
>> for domid=9 (called new thread) is created again. It is inserted to both the
>> hash table and list.
>>
>> 3. As the old thread for domid=9 might still be on the list, we would have two
>> threads for domid=9 (one old one to be removed and one newly inserted one to be
>> used by new pv devices).
>>
>> Dongli Zhang
>>
>>>
>>> -boris
>>>
>>>
>>>> + * all mtwatch_domain on hash_hash
>>>> + * should have unique domid.
>>>> + */
>>>> + spinlock_t domain_lock;
>>>> + struct hlist_head domain_hash[MTWATCH_HASH_SIZE];
>>>> + struct list_head domain_list;
>>>> +
>>>> + /*
>>>> + * When a per-domU kthread is going to be destroyed, it is put
>>>> + * on the purge_list, and will be flushed by purge_work later.
>>>> + */
>>>> + struct work_struct purge_work;
>>>> + spinlock_t purge_lock;
>>>> + struct list_head purge_list;
>>>> +};
>>>
>