Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit

From: Kirill Tkhai
Date: Fri Nov 17 2017 - 15:51:43 EST


On 17.11.2017 21:52, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>
>> On 15.11.2017 19:31, Eric W. Biederman wrote:
>>> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>>>
>>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>>> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>>>>>>
>>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>>
>>>>>>>>> get_user_ns(user_ns);
>>>>>>>>>
>>>>>>>>> - rv = mutex_lock_killable(&net_mutex);
>>>>>>>>> + rv = down_read_killable(&net_sem);
>>>>>>>>> if (rv < 0) {
>>>>>>>>> net_free(net);
>>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>> list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>>> rtnl_unlock();
>>>>>>>>> }
>>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>>> + up_read(&net_sem);
>>>>>>>>> if (rv < 0) {
>>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>>> put_user_ns(user_ns);
>>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>> list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>>> spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>>
>>>>>>>>> - mutex_lock(&net_mutex);
>>>>>>>>> + down_read(&net_sem);
>>>>>>>>>
>>>>>>>>> /* Don't let anyone else find us. */
>>>>>>>>> rtnl_lock();
>>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>>> ops_free_list(ops, &net_exit_list);
>>>>>>>>>
>>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>>> + up_read(&net_sem);
>>>>>>>>
>>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>>
>>>>>>> Yes, there will be possible concurrent ops->init() for a net namespace,
>>>>>>> and ops->exit() for another one. I hadn't found pernet operations, which
>>>>>>> have a problem with that. If they exist, they are hidden and not clear seen.
>>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>>> If suddenly there is one, KASAN should show it after a while.
>>>>>>
>>>>>> Certainly the use of hash tables shared between multiple network
>>>>>> namespaces would count. I don't rembmer how many of these we have but
>>>>>> there used to be quite a few.
>>>>>
>>>>> Could you please provide an example of hash tables, you mean?
>>>>
>>>> Ah, I see, it's dccp_hashinfo etc.
>>
>> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>>
>>>
>>> The big one used to be the route cache. With resizable hash tables
>>> things may be getting better in that regard.
>>
>> I've checked some fib-related things, and wasn't able to find that.
>> Excuse me, could you please clarify, if it's an assumption, or
>> there is exactly a problem hash table, you know? Could you please
>> point it me more exactly, if it's so.
>
> Two things.
> 1) Hash tables are one case I know where we access data from multiple
> network namespaces. As such it can not be asserted that is no
> possibility for problems.
>
> 2) The responsible way to handle this is one patch for each set of
> methods explaining why those methods are safe to run in parallel.
>
> That ensures there is opportunity for review and people are going
> slowly enough that they actually look at these issues.
>
> The reason I want to see this broken up is that at 200ish sets of
> methods it is too much to review all at once.

Ok, it's possible to split the changes in 400 patches, but there is
a problem with three-state (no compile, module, built-in) drivers.
Git bisect won't work anyway. Please see the description of the problem
in cover message "[PATCH RFC 00/25] Replacing net_mutex with rw_semaphore"
I sent today.

> I completely agree that odds are that this can be made safe and that it
> is mostly likely already safe in practically every instance. My guess
> would be that if there are problems that need to be addressed they
> happen in one or two places and we need to find them. If possible I
> don't want to find them after the code has shipped in a stable release.

Kirill