Re: [RFC][PATCH] mm/page_isolation: tracing: trace all test_pages_isolated failures

From: David Hildenbrand
Date: Fri Sep 03 2021 - 05:31:11 EST


On 03.09.21 00:21, George G. Davis wrote:
On Tue, Aug 31, 2021 at 04:53:31PM +0200, David Hildenbrand wrote:
On 23.08.21 22:28, George G. Davis wrote:
From: "George G. Davis" <davis.george@xxxxxxxxxxx>

Some test_pages_isolated failure conditions don't emit trace points.
For debugging issues caused by "pinned" pages, make sure to trace all
calls, whether they succeed or fail. Currently, one failure case returns
without emitting a trace point, so add the missing trace point for that
failure case in test_pages_isolated.

In which setups did you actually run into these cases?

Good question!

Although I'm not 100% certain that this specific failure condition has
occurred in my recent testing, I am able to reproduce cma_alloc -EBUSY
failure conditions when testing recent master on an arm64-based
Renesas R-Car Starter Kit [1] using defconfig with
CONFIG_CMA_SIZE_MBYTES=384 while running the following test case:

Okay, I think you are not hitting the path you touched in this patch, because I assume that path will essentially never trigger ...


trace-cmd record -N 192.168.1.87:12345 -b 4096 -e cma -e page_isolation -e compaction -e migrate &
sleep 10
while true; do a=$(( ( RANDOM % 10000 ) + 1 )); echo $a > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $a; echo $a > /sys/kernel/debug/cma/cma-reserved/free); done &
while true; do b=$(( ( RANDOM % 10000 ) + 1 )); echo $b > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $b; echo $b > /sys/kernel/debug/cma/cma-reserved/free); done &
while true; do c=$(( ( RANDOM % 10000 ) + 1 )); echo $c > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $c; echo $c > /sys/kernel/debug/cma/cma-reserved/free); done &
while true; do d=$(( ( RANDOM % 10000 ) + 1 )); echo $d > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $d; echo $d > /sys/kernel/debug/cma/cma-reserved/free); done &
while true; do e=$(( ( RANDOM % 10000 ) + 1 )); echo $e > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $e; echo $e > /sys/kernel/debug/cma/cma-reserved/free); done &
/selftests/vm/transhuge-stress &

The cma_alloc -EBUSY failures are caused by THP compound pages allocated
from the CMA region, which migration does not seem to handle. The
workaround is to disable CONFIG_TRANSPARENT_HUGEPAGE, since THP seems
incompatible with the intended use of the CMA region.


Oh, that sounds broken: THP should not block CMA allocation, or page migration for other purposes.

a) Are these temporary or permanent allocation errors? If they are permanent, they will also break memory unplug.

b) Did you reproduce on other architectures as well?

c) Did it use to work but is now broken? IOW, did you try bisecting?

--
Thanks,

David / dhildenb