Re: [PATCH] mm: migrate: Return false instead of -EAGAIN for dummy functions

From: Chen Gang
Date: Tue Sep 20 2016 - 18:04:43 EST


On 9/20/16 16:09, Michal Hocko wrote:
> On Tue 20-09-16 05:46:58, Chen Gang wrote:
>>
>> For me, it really need return false:
>>
>> - For real implementation, when do nothing, it will return false.
>>
>> - I assume that the input page already is in a node (although maybe my
>> assumption incorrect), and migrate to the same node. When the real
>> implementation fails (e.g. -EAGAIN 10 times), it still returns false.
>>
>> - Original dummy implementation always return -EAGAIN, And -EAGAIN in
>> real implementation will trigger returning false, after 10 times.
>>
>> - After grep TNF_MIGRATE_FAIL and TNF_MIGRATED, we only use them in
>> task_numa_fault in kernel/sched/fair.c for numa_pages_migrated and
>> numa_faults_locality, I guess they are only used for statistics.
>>
>> So for me the dummy implementation need return false instead of -EAGAIN.
>
> I see that the return value semantic might be really confusing. But I am
> not sure why bool would make it all of the sudden any less confusing.
> migrate_page returns -EAGAIN on failure and 0 on success, migrate_pages
> returns -EAGAIN or number of not migrated pages on failure and 0 on
> success. So migrate_misplaced_page doesn't fit into this mode with the
> bool return value. So I would argue that the code is not any better.
>

I guess, numamigrate_isolate_page can be bool, at least.

And yes, commonly, bool functions are for asking something, and int
functions are for doing something, but not must be. When the caller care
about success, but never care about every failure details, bool is OK.

In our case, for me, numa balancing is for performance. When return
failure, the system has no any negative effect -- only lose a chance for
improving performance.

- For user, the failure times statistics is enough, they need not care
about every failure details.

- For tracer, the failure details statistics are meaningfulness, but
focusing on each failure details is meaningless. Now, it finishes a
part of failure details statistics -- which can be improved next.

- For debugger (or printing log), focusing on each failure details is
useful. But debugger already can check every details, returning every
failure details is still a little helpful, but not necessary.

>>
>> If our original implementation already used bool, our this issue (return
>> -EAGAIN) would be avoided (compiler would help us to find this issue).
>
> OK, so you pushed me to look into it deeper and the fact is that
> migrate_misplaced_page return value doesn't matter at all for
> CONFIG_NUMA_BALANCING=n because task_numa_fault is noop for that
> configuration. Moreover the whole do_numa_page should never execute with
> that configuration because we will not have numa pte_protnone() ptes in
> that path. do_huge_pmd_numa_page seems be in a similar case. So this
> doesn't have any real impact on the runtime AFAICS.
>

OK, thanks.

> So what is the point of this whole exercise? Do not take me wrong, this
> area could see some improvements but I believe that doing int->bool
> change is not just the right thing to do and worth spending both your
> and reviewers time.
>

I am not quite sure about that.

Thanks.
--
Chen Gang (éå)

Managing Natural Environments is the Duty of Human Beings.