Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory

From: David Hildenbrand
Date: Fri Oct 16 2020 - 05:18:55 EST


On 16.10.20 06:03, Wei Yang wrote:
> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote:
>> Let's trigger from offlining code when we're not allowed to touch online
>> memory.
>
> This describes the change in virtio_mem_memory_notifier_cb()?

Ah, yes, can try to make that clearer.

>
>>
>> Handle the other case (memmap possibly freeing up another memory block)
>> when actually removing memory. When removing via virtio_mem_remove(),
>> virtio_mem_retry() is a NOP and safe to use.
>>
>> While at it, move retry handling when offlining out of
>> virtio_mem_notify_offline(), to share it with Device Block Mode (DBM)
>> soon.
>
> I may not understand the logic fully. Here is my understanding of current
> logic:
>
>
> virtio_mem_run_wq()
>     virtio_mem_unplug_request()
>         virtio_mem_mb_unplug_any_sb_offline()
>             virtio_mem_mb_remove()             --- 1
>         virtio_mem_mb_unplug_any_sb_online()
>             virtio_mem_mb_offline_and_remove() --- 2
> This patch tries to trigger the wq at 1 and 2. And these two functions are
> only valid during this code flow.

Exactly.

>
> These two functions actually remove some memory from the system. So I am not
> sure where extra unplug-able memory comes from. I guess that memory is from
> the memory block device, mem_section, and memmap? While that memory is still
> marked as online, right?

Imagine you end up with the following layout (only after repeatedly
plugging and unplugging memory; otherwise it's obviously impossible):

Memory block X: Contains only movable data

Memory block X + 1: Contains memmap of Memory block X


We unplug from high to low.

1. Try to unplug/offline/remove block X + 1: fails, because of the
memmap
2. Try to unplug/offline/remove block X: succeeds.
3. Not all requested memory got unplugged. Sleep for 30 seconds.
4. Retry to unplug/offline/remove block X + 1: succeeds

In step 2, we trigger a retry of ourselves. That means that in step 3
we don't actually sleep, but retry immediately.

This has proven helpful in some of my tests, where you want to
unplug *a lot* of memory again, not just some parts.
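
To make the scenario above concrete, here is a rough userspace toy model
in C. It is not virtio-mem code: struct toy_block, toy_can_remove(),
toy_unplug_pass() and toy_unplug() are names made up for illustration,
and "sleeping" is only counted, never performed. It merely shows why a
retry triggered by a successful removal saves the 30 second wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a block may be pinned because it holds the memmap
 * of another block that is still present. */
struct toy_block {
	bool present;	/* block has not been removed yet */
	int memmap_for;	/* index of the block whose memmap lives here, or -1 */
};

/* A block can be removed only if it holds no memmap of a present block. */
static bool toy_can_remove(const struct toy_block *b, size_t n, size_t i)
{
	int owner = b[i].memmap_for;

	if (owner < 0)
		return true;
	assert((size_t)owner < n);
	return !b[owner].present;
}

/* One workqueue pass: walk from high to low and remove what we can.
 * Returns the number of blocks removed in this pass. */
static size_t toy_unplug_pass(struct toy_block *b, size_t n)
{
	size_t removed = 0;

	for (size_t i = n; i-- > 0; ) {
		if (b[i].present && toy_can_remove(b, n, i)) {
			b[i].present = false;
			removed++;
		}
	}
	return removed;
}

/* Run passes until `want` blocks are gone. Returns how many timed sleeps
 * were needed. With retry_on_remove set, a pass that removed something
 * re-triggers the work immediately instead of sleeping. The toy gives up
 * (without counting a sleep) when a pass makes no progress at all. */
static unsigned int toy_unplug(struct toy_block *b, size_t n, size_t want,
			       bool retry_on_remove)
{
	unsigned int sleeps = 0;
	size_t done = 0;

	while (done < want) {
		size_t removed = toy_unplug_pass(b, n);

		done += removed;
		if (done >= want || removed == 0)
			break;
		if (!retry_on_remove)
			sleeps++;	/* would wait for the retry timer here */
	}
	return sleeps;
}
```

With block 0 standing in for X and block 1 for X + 1 (memmap_for = 0),
asking for both blocks takes two passes either way. The only difference
is whether a sleep sits between them.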


Triggering a retry is fairly cheap. Assume you don't actually have to
perform any more unplugging. The workqueue wakes up, detects that
there is nothing to do, and goes back to sleep.

--
Thanks,

David / dhildenb