Re: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being used

From: Vitaly Kuznetsov
Date: Wed Feb 25 2015 - 11:56:03 EST


KY Srinivasan <kys@xxxxxxxxxxxxx> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
>> Sent: Thursday, February 19, 2015 8:27 AM
>> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx
>> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui
>> Subject: [PATCH 1/2] Drivers: hv: hv_balloon: report offline pages as being
>> used
>>
>> When hot-added memory pages are not brought online, or when some memory
>> blocks are sent offline, the subsequent ballooning process kills the guest
>> with the OOM killer. This happens because we report these pages as neither
>> used nor free, and apparently the host algorithm considers them unused.
>> Keep track of all online/offline operations and report all currently
>> offline pages as being used so the host won't try to balloon them out.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> drivers/hv/hv_balloon.c | 33 ++++++++++++++++++++++++---------
>> 1 file changed, 24 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
>> index a095b70..e4b4454 100644
>> --- a/drivers/hv/hv_balloon.c
>> +++ b/drivers/hv/hv_balloon.c
>> @@ -503,6 +503,8 @@ struct hv_dynmem_device {
>> * Number of pages we have currently ballooned out.
>> */
>> unsigned int num_pages_ballooned;
>> + unsigned int num_pages_onlined;
>> + unsigned int num_pages_added;
>>
>> /*
>> * State to manage the ballooning (up) operation.
>> @@ -556,12 +558,15 @@ static void post_status(struct hv_dynmem_device *dm);
>> static int hv_memory_notifier(struct notifier_block *nb, unsigned long val, void *v)
>> {
>> + struct memory_notify *mem = (struct memory_notify *)v;
>> +
>> switch (val) {
>> case MEM_GOING_ONLINE:
>> mutex_lock(&dm_device.ha_region_mutex);
>> break;
>>
>> case MEM_ONLINE:
>> + dm_device.num_pages_onlined += mem->nr_pages;
>> case MEM_CANCEL_ONLINE:
>
> Why are we not adjusting num_pages_onlined when we cancel the online
> operation?

Because the counter hasn't been increased at that point.

To my understanding, events come in the following order:
1) MEM_GOING_ONLINE - we just take the lock
2) MEM_ONLINE - we increase num_pages_onlined by mem->nr_pages and drop
   the lock
   or
   MEM_CANCEL_ONLINE - we just drop the lock (the memory was never online,
   so num_pages_onlined wasn't increased)
3) MEM_GOING_OFFLINE - we do nothing
4) MEM_OFFLINE - we decrease num_pages_onlined
   or
   MEM_CANCEL_OFFLINE - we do nothing (the memory is still online, no need
   to adjust num_pages_onlined)
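
To make the ordering above concrete, here is a small userspace model of the
counter handling. It is only a sketch, not the driver code: the EV_* enum,
struct balloon_model and model_notify() are hypothetical names made up for
illustration, the bool stands in for ha_region_mutex, and I'm assuming the
offline path subtracts the same nr_pages that the online path added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MEM_* notifier events. */
enum mem_event {
	EV_GOING_ONLINE,
	EV_ONLINE,
	EV_CANCEL_ONLINE,
	EV_GOING_OFFLINE,
	EV_OFFLINE,
	EV_CANCEL_OFFLINE,
};

/* Illustrative model only; not the real struct hv_dynmem_device. */
struct balloon_model {
	unsigned long num_pages_onlined; /* mirrors the counter from the patch */
	bool locked;                     /* models ha_region_mutex being held */
};

static void model_notify(struct balloon_model *m, enum mem_event ev,
			 unsigned long nr_pages)
{
	switch (ev) {
	case EV_GOING_ONLINE:
		/* Step 1: only take the lock, the counter is untouched. */
		assert(!m->locked);
		m->locked = true;
		break;
	case EV_ONLINE:
		/* Step 2a: the memory really went online, so count it. */
		m->num_pages_onlined += nr_pages;
		/* fall through */
	case EV_CANCEL_ONLINE:
		/* Step 2b: nothing was counted yet, so just drop the lock. */
		assert(m->locked);
		m->locked = false;
		break;
	case EV_OFFLINE:
		/* Step 4a: assumed symmetric with the online path. */
		assert(m->num_pages_onlined >= nr_pages);
		m->num_pages_onlined -= nr_pages;
		break;
	case EV_GOING_OFFLINE:
	case EV_CANCEL_OFFLINE:
		/* Steps 3 and 4b: the memory is still online, nothing to do. */
		break;
	}
}
```

Feeding this model EV_GOING_ONLINE followed by EV_CANCEL_ONLINE leaves
num_pages_onlined at zero with the lock released, which is why no adjustment
is needed in the cancel case.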

>
> K. Y

--
Vitaly