Re: [Question] Should direct reclaim time be bounded?

From: Mike Kravetz
Date: Fri Jun 28 2019 - 14:23:24 EST


On 4/24/19 7:35 AM, Vlastimil Babka wrote:
> On 4/23/19 6:39 PM, Mike Kravetz wrote:
>>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It
>>> looks like there is something wrong in the reclaim going on.
>>
>> Ok, I will start digging into that. Just wanted to make sure before I got
>> into it too deep.
>>
>> BTW - This is very easy to reproduce. Just try to allocate more huge pages
>> than will fit into memory. I see this 'reclaim taking forever' behavior on
>> v5.1-rc5-mmotm-2019-04-19-14-53. Looks like it was there in v5.0 as well.
>
> I'd suspect this in should_continue_reclaim():
>
> /* Consider stopping depending on scan and reclaim activity */
> if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
> /*
> * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
> * full LRU list has been scanned and we are still failing
> * to reclaim pages. This full LRU scan is potentially
> * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed
> */
> if (!nr_reclaimed && !nr_scanned)
> return false;
>
> And that for some reason, nr_scanned never becomes zero. But it's hard
> to figure out through all the layers of functions :/

I got back to looking into the direct reclaim/compaction stalls when
trying to allocate huge pages. As previously mentioned, the code is
looping for a long time in shrink_node(). The routine
should_continue_reclaim() returns true perhaps more often than it should.

As Vlastmil guessed, my debug code output below shows nr_scanned is remaining
non-zero for quite a while. This was on v5.2-rc6.

To help determine what is happening, I added a counter to determine how many
times should_continue_reclaim was returning true within one calling context
from shrink_node. Every 1,000,000 calls, I set a 'debug flag' to get a little
more information about what is happening before the next call. Here is output
from debug code with some comments about what the debug code is showing.

[ 1477.583859] ***should_continue_reclaim: setting debug_nr_scanned call 1000000
[ 1477.584698] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...

shrink_node calls shrink_node_memcg which calls shrink_list. shrink_list
can increment sc->nr_scanned which is used to compute the value of nr_scanned
passed to should_continue_reclaim. Here, we see that only one page is
isolated for potential reclaim.

[ 1477.585465] shrink_page_list: goto activate_locked 4

shrink_page_list determines that the page can not be reclaimed. The following
code in shrink_page_list determines this.

if (page_has_private(page)) {
if (!try_to_release_page(page, sc->gfp_mask))
{
/* Obviously, my debug code. */
if (sc->debug_nr_scanned)
printk("shrink_page_list: goto activate_locked 4\n");
goto activate_locked;
}

[ 1477.586044] shrink_inactive_list: nr_reclaimed = 0

The bottom line is that page is not reclaimed. But, it was 'scanned' so ..

[ 1477.586627] ***should_continue_reclaim: sc->nr_reclaimed 9 pages_for_compaction 1024
[ 1477.587546] nr_reclaimed 0, nr_scanned 1

should_continue_reclaim is called with nr_reclaimed 0, nr_scanned 1

[ 1477.588124] inactive_lru_pages 1 sc->nr_scanned 1000001

Since sc->nr_scanned is 1000001 for 1000001 calls, it looks like we scanned
one page each time.

Additional similar output without comments.

[ 1511.200515] ***should_continue_reclaim: setting debug_nr_scanned call 2000000
[ 1511.201581] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1511.202615] shrink_page_list: goto activate_locked 4
[ 1511.203561] shrink_inactive_list: nr_reclaimed = 0
[ 1511.204422] ***should_continue_reclaim: sc->nr_reclaimed 10 pages_for_compaction 1024
[ 1511.205785] nr_reclaimed 0, nr_scanned 1
[ 1511.206569] inactive_lru_pages 1 sc->nr_scanned 2000001
[ 1543.982310] ***should_continue_reclaim: setting debug_nr_scanned call 3000000
[ 1543.983692] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1543.984645] shrink_page_list: goto activate_locked 4
[ 1543.985405] shrink_inactive_list: nr_reclaimed = 0
[ 1543.986386] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction 1024
[ 1543.987662] nr_reclaimed 0, nr_scanned 1
[ 1543.988423] inactive_lru_pages 1 sc->nr_scanned 3000001
[ 1577.104923] ***should_continue_reclaim: setting debug_nr_scanned call 4000000
[ 1577.105857] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1577.106752] shrink_page_list: goto activate_locked 4
[ 1577.107454] shrink_inactive_list: nr_reclaimed = 0
[ 1577.108163] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction 1024
[ 1577.109423] nr_reclaimed 0, nr_scanned 1
[ 1577.110205] inactive_lru_pages 1 sc->nr_scanned 4000001
[ 1614.075236] ***should_continue_reclaim: setting debug_nr_scanned call 5000000
[ 1614.076375] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1614.077410] shrink_page_list: goto activate_locked 4
[ 1614.078210] shrink_inactive_list: nr_reclaimed = 0
[ 1614.078913] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction 1024
[ 1614.079907] nr_reclaimed 0, nr_scanned 1
[ 1614.080496] inactive_lru_pages 1 sc->nr_scanned 5000001
[ 1650.360466] ***should_continue_reclaim: setting debug_nr_scanned call 6000000
[ 1650.361342] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1650.362147] shrink_page_list: goto activate_locked 4
[ 1650.362759] shrink_inactive_list: nr_reclaimed = 0
[ 1650.363685] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction 1024
[ 1650.364648] nr_reclaimed 0, nr_scanned 1
[ 1650.365222] inactive_lru_pages 0 sc->nr_scanned 6000001
[ 1685.061380] ***should_continue_reclaim: setting debug_nr_scanned call 7000000
[ 1685.062529] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1685.063615] shrink_page_list: goto activate_locked 4
[ 1685.064439] shrink_inactive_list: nr_reclaimed = 0
[ 1685.065244] ***should_continue_reclaim: sc->nr_reclaimed 14 pages_for_compaction 1024
[ 1685.066536] nr_reclaimed 0, nr_scanned 1
[ 1685.067356] inactive_lru_pages 1 sc->nr_scanned 7000001
[ 1719.103176] ***should_continue_reclaim: setting debug_nr_scanned call 8000000
[ 1719.104206] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ...
[ 1719.105117] shrink_page_list: goto activate_locked 4
[ 1719.105781] shrink_inactive_list: nr_reclaimed = 0
[ 1719.106475] ***should_continue_reclaim: sc->nr_reclaimed 15 pages_for_compaction 1024
[ 1719.107499] nr_reclaimed 0, nr_scanned 1
[ 1719.108129] inactive_lru_pages 1 sc->nr_scanned 8000001
[ 1743.911025] ***should_continue_reclaim: false after 8711717 calls !nr_reclaimed && !nr_scanned

Note that we do reclaim a 'few' pages along the way. After 8711717 calls
should_continue_reclaim finally hits the condition where (!nr_reclaimed &&
!nr_scanned) and returns false. We were stuck looping in shrink node for
about 5 minutes.

Any additional insight, suggestions for additional debug, etc. would be
appreciated.
--
Mike Kravetz