Re: [Xen-devel] [PATCH] xen-netfront: set real_num_tx_queues to zero to avoid triggering BUG_ON

From: David Vrabel
Date: Mon Feb 22 2016 - 08:34:29 EST


On 20/02/16 06:00, Gonglei (Arei) wrote:
> Hi,
>
> Thanks for rapid feedback :)
>
>> From: David Miller [mailto:davem@xxxxxxxxxxxxx]
>> Sent: Saturday, February 20, 2016 12:37 PM
>>
>> From: Gonglei <arei.gonglei@xxxxxxxxxx>
>> Date: Sat, 20 Feb 2016 09:27:26 +0800
>>
>>> A race condition can exist between xennet_open() and
>>> talk_to_netback(). If another thread or process (such as
>>> NetworkManager) invokes xennet_open() immediately after
>>> netfront_probe(), it may trigger BUG_ON(). In addition, we should
>>> reset real_num_tx_queues in xennet_destroy_queues().
>>
>> One should really never invoke register_netdev() until the device is
>> 100% fully initialized.
>>
>> This means you cannot call register_netdev() until it is completely
>> legal to invoke your ->open() method.
>>
>> And I think that is what the real problem is here.
>>
>> If you follow the correct rules for ordering wrt. register_netdev()
>> there are no "races". Because ->open() must be legally invokable
>> from the exact moment you call register_netdev().
>>
>
> Yes, I agree, though that's a historical legacy problem. ;)
>
>> I'm not applying this, as it really sounds like the fundamental issue
>> is the order in which the xen-netfront private data is initialized
>> or setup before being registered.
>
> That means register_netdev() should be invoked after xennet_connect(), right?

No. This would mean that the network device is removed and re-added
when a guest is migrated, which at best would result in considerably more
downtime (e.g., the IP address has to be renegotiated with DHCP).
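
To illustrate the constraint, here is a minimal user-space sketch in C.
It is a toy model, not the xen-netfront code: every name in it
(model_dev, model_open() and so on) is hypothetical. It assumes the
device is registered once and stays registered, that the backend
connection is torn down and rebuilt on migration, and that open()
therefore has to be legal whenever the device is registered, even with
zero usable queues.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUES 4

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_dev {
    bool registered;              /* visible to userspace, open() may run */
    bool connected;               /* backend handshake has completed */
    unsigned int num_queues;      /* number of queues that are valid now */
    bool queue_ready[MAX_QUEUES]; /* per-queue setup state */
    unsigned int registrations;   /* how often the device was registered */
};

/* Register the device once; from here on open() can be called. */
static void model_register(struct model_dev *d)
{
    d->registered = true;
    d->registrations++;
}

/* Set up n queues, then publish the count and mark the device connected. */
static void model_connect(struct model_dev *d, unsigned int n)
{
    assert(n <= MAX_QUEUES);
    for (unsigned int i = 0; i < n; i++)
        d->queue_ready[i] = true;
    d->num_queues = n;
    d->connected = true;
}

/* Publish zero queues first, then tear the queues down. */
static void model_disconnect(struct model_dev *d)
{
    d->num_queues = 0;
    d->connected = false;
    for (unsigned int i = 0; i < MAX_QUEUES; i++)
        d->queue_ready[i] = false;
}

/*
 * Legal at any time after registration. Returns the number of queues
 * enabled (possibly 0), or -1 if the device is not registered.
 */
static int model_open(struct model_dev *d)
{
    int enabled = 0;

    if (!d->registered)
        return -1;
    for (unsigned int i = 0; i < d->num_queues; i++) {
        assert(d->queue_ready[i]); /* only published queues are touched */
        enabled++;
    }
    return enabled;
}

/* Migration: reconnect to a new backend without unregistering. */
static void model_migrate(struct model_dev *d, unsigned int n)
{
    model_disconnect(d);
    model_connect(d, n);
}
```

In this model, calling model_open() right after model_register() and
before model_connect() simply enables zero queues, and model_migrate()
leaves the registrations counter at 1, so anything tied to the
registered device survives the migration.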

David