Re: [PATCH] virtio_net: enable tx after resuming from suspend

From: ake
Date: Fri Oct 12 2018 - 05:18:08 EST

On 2018/10/12 17:23, Jason Wang wrote:
>
>
> On 2018/10/12 12:30, ake wrote:
>>
>> On 2018/10/11 22:06, Jason Wang wrote:
>>>
>>> On 2018/10/11 18:22, ake wrote:
>>>> On 2018/10/11 18:44, Jason Wang wrote:
>>>>> On 2018/10/11 15:51, Ake Koomsin wrote:
>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>> disabled virtio tx before suspend to avoid a use-after-free.
>>>>>> However, after resume, the virtio_net device loses its network
>>>>>> connectivity.
>>>>>>
>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>
>>>>>> Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>> Signed-off-by: Ake Koomsin <ake@xxxxxxxxxx>
>>>>>> ---
>>>>>>  drivers/net/virtio_net.c | 1 +
>>>>>>  1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>>>>>  	}
>>>>>>  	netif_device_attach(vi->dev);
>>>>>> +	netif_start_queue(vi->dev);
>>>>> I believe this duplicates the netif_tx_wake_all_queues() call in
>>>>> netif_device_attach() above?
>>>> Thank you for your review.
>>>>
>>>> If netif_tx_wake_all_queues() and netif_start_queue() both clear
>>>> __QUEUE_STATE_DRV_XOFF, is it possible that some condition in
>>>> netif_device_attach() is not satisfied?
>>> Yes, maybe. One case I can see now is when the device is down: in
>>> that case netif_device_attach() won't try to wake up the queue.
>>>
>>>> Without netif_start_queue(), the virtio_net device does not resume
>>>> properly after waking up.
>>> How do you trigger the issue? Just suspend and resume?
>> Yes, simply suspend and resume.
>>
>> Here is how I trigger the issue:
>>
>> 1) Start the Virtual Machine Manager GUI program.
>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>    >= 4.12 and that it uses virtio_net as its network device.
>>    In addition, make sure that the video adapter is VGA; otherwise,
>>    waking up with the virtual power button does not work.
>> 3) After installing the guest OS, log in and test the network
>>    connectivity by pinging the host machine.
>> 4) Suspend. The screen goes blank.
>> 5) Resume by hitting the virtual power button. The login screen
>>    appears again.
>> 6) Log in again. The guest has lost its network connection.
>>
>> In my test:
>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>
> I cannot reproduce this issue if the virtio-net interface is up in the
> guest before suspend. I'm using net-next.git and qemu master. But I can
> reproduce it when the virtio-net interface is down in the guest before
> suspend: after resume, even if I bring it up, the network is still lost.
>
> I think the interface is up in your case, but please confirm this.

If you mean the interface state before I hit the suspend button,
the answer is yes. The interface is up before I suspend the guest
machine.

Note that my QEMU version is 2.5.0
(Debian 1:2.5+dfsg-5ubuntu10.32).

I will try with net-next.git and qemu master later and see if I can
reproduce the issue.

>>
>>>> Is it better to report this as a bug first?
>>> Nope, you're very welcome to post the patch directly.
>>>
>>>> If I am to do more
>>>> investigation, what areas should I look into?
>>> As you've figured out, you can start by finding out why
>>> netif_tx_wake_all_queues() was not executed.
>>>
>>> (Btw, does the issue disappear if you move netif_tx_disable() under the
>>> check of netif_running() in virtnet_freeze_down()?)
>> The issue disappears if I move netif_tx_disable() under the check of
>> netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
>> is probably better, as its logic is then consistent with the
>> netif_device_attach() implementation. If you are OK with this idea,
>> I will submit another patch.
>
> I think it helps for the case where the interface is down before suspend.
> But it's still unclear why it helps even when the interface is up
> (netif_running() is true).
>
> Please submit a patch, but we should also figure out why it helps for an
> interface that is up.
>
> Thanks
>
>>
>>> Thanks
>>>
>>>> Best Regards
>>>> Ake Koomsin
>>>>
>> Best Regards
>