Re: [PATCH] mm: memory_hotplug: put migration failure information under DEBUG_VM

From: Charan Teja Kalla
Date: Wed Nov 25 2020 - 06:11:03 EST


Thanks Vlastimil!

On 11/24/2020 7:09 PM, Vlastimil Babka wrote:
> On 11/23/20 4:10 PM, Charan Teja Kalla wrote:
>>
>> Thanks Michal!
>> On 11/23/2020 7:43 PM, Michal Hocko wrote:
>>> On Mon 23-11-20 19:33:16, Charan Teja Reddy wrote:
>>>> When the pages are failed to get isolate or migrate, the page owner
>>>> information along with page info is dumped. If there are continuous
>>>> failures in migration(say page is pinned) or isolation, the log buffer
>>>> is simply getting flooded with the page owner information. As most of
>>>> the times page info is sufficient to know the causes for failures of
>>>> migration or isolation, place the page owner information under
>>>> DEBUG_VM.
>>>
>>> I do not see why this path is any different from others that call
>>> dump_page. Page owner can add a very valuable information to debug
>>> the underlying reasons for failures here. It is an opt-in debugging
>>> feature which needs to be enabled explicitly. So I would argue users
>>> are ready to accept a lot of data in the kernel log.
>>
>> Just thinking how frequently failures can happen in those paths. In the
>> memory hotplug path, we can flood the page owner logs just by making one
>> page pinned. Say If it is anonymous page, the page owner information
>
> So you say it's flooded when page_owner info is included, but not
> flooded when only the rest of __dump_page() is printed? (which is also
> not just one or two lines). That has to be very specific rate of failures.
> Anyway I don't like the solution with arbitrary config option. To
> prevent flooding we generally have ratelimiting, how about that?
>

The logs are still flooded with just the __dump_page() output, but the
volume is much lower than with dump_page_owner() included. The lines
look something like this:
page:ffffffff0b070b80 refcount:3 mapcount:1 mapping:ffffff80faf118e9 index:0xc0903
anon flags: 0x800000000008042c(uptodate|dirty|active|owner_priv_1|swapbacked)
raw: 800000000008042c ffffffc047483a58 ffffffc047483a58 ffffff80faf118e9
raw: 00000000000c0903 00000000000985eb 0000000300000000 ffffff800b5f3000
page dumped because: migration failure
page->mem_cgroup:ffffff800b5f3000
page_owner tracks the page as allocated

Rate limiting the logs from both __dump_page() and dump_page_owner()
looks good to me. If that is okay with both you and Michal, I can send
a V2.
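
To make the rate limiting idea concrete, here is a rough userspace
sketch of interval/burst limiting: at most `burst` dumps are allowed per
`interval` ticks, and the rest are dropped. It is only loosely modeled
on the kernel's ratelimit helpers and is not the actual V2 change. The
names rl_state, rl_init, rl_allow and simulate_failures are hypothetical,
and time is passed in as a plain tick counter instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a rate limit state. */
struct rl_state {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* dumps allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* dumps emitted in the current window */
};

static void rl_init(struct rl_state *rs, long interval, int burst)
{
	assert(interval > 0 && burst > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
}

/* Return true if the caller may emit a dump at tick `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* The previous window expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;	/* burst exhausted: suppress this dump */
}

/*
 * Simulate one failed migration per tick, for ticks 0..attempts-1,
 * and return how many page dumps would actually reach the log.
 */
static int simulate_failures(struct rl_state *rs, long attempts)
{
	int dumped = 0;

	for (long t = 0; t < attempts; t++) {
		if (rl_allow(rs, t))
			dumped++;	/* the page dump would go here */
	}
	return dumped;
}
```

With an interval of 100 ticks and a burst of 10, a pinned page that
fails migration on every tick for 1000 ticks produces 100 dumps in this
sketch instead of 1000, while an isolated handful of failures is still
logged in full.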

> Also agreed with Michal that page_owner is explicitly enabled debugging
> option and if you use it in production, that's rather surprising to me,
> and possibly more rare than DEBUG_VM, which IIRC Fedora kernels use.

We enable it only on internal debug systems, never on production
kernels.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project