Re: [PATCH 3/5] mm/vmalloc.c: correct lazy_max_pages() return value

From: zijun_hu
Date: Fri Sep 23 2016 - 01:00:50 EST


On 2016/9/23 11:30, Nicholas Piggin wrote:
> On Fri, 23 Sep 2016 00:30:20 +0800
> zijun_hu <zijun_hu@xxxxxxxx> wrote:
>
>> On 2016/9/22 20:37, Michal Hocko wrote:
>>> On Thu 22-09-16 09:13:50, zijun_hu wrote:
>>>> On 09/22/2016 08:35 AM, David Rientjes wrote:
>>> [...]
>>>>> The intent is as it is implemented; with your change, lazy_max_pages() is
>>>>> potentially increased depending on the number of online cpus. This is
>>>>> only a heuristic, changing it would need justification on why the new
>>>>> value is better. It is opposite to what the comment says: "to be
>>>>> conservative and not introduce a big latency on huge systems, so go with
>>>>> a less aggressive log scale." NACK to the patch.
>>>>>
>>>> My change can only make lazy_max_pages() smaller, never larger, so it
>>>> seems to conform with the comment.
>>>>
>>>> If the number of online CPUs is not a power of 2, the two versions give
>>>> the same result. Otherwise, my change keeps the power-of-2 value, while the
>>>> original code rounds up to the next power of 2. For instance:
>>>>
>>>> my change:         (32, 64] -> 64
>>>>                    32 -> 32, 64 -> 64
>>>> the original code: [32, 64) -> 64
>>>>                    32 -> 64, 64 -> 128
>>>
>>> You still completely failed to explain _why_ this is an improvement/fix
>>> or why it matters. This all should be in the changelog.
>>>
>>
>> Hi npiggin,
>> could you comment on this patch, since you introduced lazy_max_pages()?
>>
>> My patch is mainly based on the difference between fls() and
>> get_count_order(), which is shown below. More MM experts may be able to
>> help decide which one is more suitable.
>>
>> For a parameter > 1, the two return different values only when the
>> parameter is a power of two, for example:
>>
>> fls(32) = 6 VS get_count_order(32) = 5
>> fls(33) = 6 VS get_count_order(33) = 6
>> fls(63) = 6 VS get_count_order(63) = 6
>> fls(64) = 7 VS get_count_order(64) = 6
>>
>> @@ -594,7 +594,9 @@ static unsigned long lazy_max_pages(void)
>> {
>> unsigned int log;
>>
>> - log = fls(num_online_cpus());
>> + log = num_online_cpus();
>> + if (log > 1)
>> + log = (unsigned int)get_count_order(log);
>>
>> return log * (32UL * 1024 * 1024 / PAGE_SIZE);
>> }
>>
>
> To be honest, I don't think I chose it with a lot of analysis.
> It will depend on the kernel usage patterns, the arch code,
> and the CPU microarchitecture, all of which would have changed
> significantly.
>
> I wouldn't bother changing it unless you do some benchmarking
> on different system sizes to see where the best performance is.
> (If performance is equal, fewer lazy pages would be better.)
>
> Good to see you taking a look at this vmalloc stuff. Don't be
> discouraged if you run into some dead ends.
>
> Thanks,
> Nick
>
Thanks for your reply.
Please disregard this patch, since I don't have the setup to do extensive
testing and comparison.

I just felt my change might be more consistent with the operation of
rounding up to a power of 2.