Re: [linus:master] [mm/ksm] 5e924ff54d: ltp.ksm01.fail

From: Stefan Roesch
Date: Wed Nov 22 2023 - 12:20:02 EST



Stefan Roesch <shr@xxxxxxxxxxxx> writes:

> David Hildenbrand <david@xxxxxxxxxx> writes:
>
>> On 16.11.23 05:39, kernel test robot wrote:
>>> hi, Stefan Roesch,
>>> we reported
>>> "[linux-next:master] [mm/ksm] 5e924ff54d: ltp.ksm01_1.fail"
>>> in
>>> https://lore.kernel.org/all/202311031548.66780ff5-oliver.sang@xxxxxxxxx/
>>> when this commit is in linux-next/master.
>>> now we noticed this commit is merged in mainline, and we still observed
>>> same issue. just FYI.
>>> Hello,
>>> kernel test robot noticed "ltp.ksm01.fail" on:
>>> commit: 5e924ff54d088828794d9f1a4d5bf17808f7270e ("mm/ksm: add "smart" page
>>> scanning mode")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>> [test failed on linus/master 3ca112b71f35dd5d99fc4571a56b5fc6f0c15814]
>>> [test failed on linux-next/master 8728c14129df7a6e29188a2e737b4774fb200953]
>>> in testcase: ltp
>>> version: ltp-x86_64-14c1f76-1_20230715
>>> with following parameters:
>>> disk: 1HDD
>>> test: mm-00/ksm01
>>> compiler: gcc-12
>>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>> If you fix the issue in a separate patch/commit (i.e. not just a new version
>>> of
>>> the same patch/commit), kindly add following tags
>>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
>>> | Closes: https://lore.kernel.org/oe-lkp/202311161132.13d8ce5a-oliver.sang@xxxxxxxxx
>>> Running tests.......
>>> <<<test_start>>>
>>> tag=ksm01 stime=1699563923
>>> cmdline="ksm01"
>>> contacts=""
>>> analysis=exit
>>> <<<test_output>>>
>>> tst_kconfig.c:87: TINFO: Parsing kernel config '/proc/config.gz'
>>> tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
>>> mem.c:422: TINFO: wait for all children to stop.
>>> mem.c:388: TINFO: child 0 stops.
>>> mem.c:388: TINFO: child 1 stops.
>>> mem.c:388: TINFO: child 2 stops.
>>> mem.c:495: TINFO: KSM merging...
>>> mem.c:434: TINFO: resume all children.
>>> mem.c:422: TINFO: wait for all children to stop.
>>> mem.c:344: TINFO: child 1 continues...
>>> mem.c:347: TINFO: child 1 allocates 128 MB filled with 'a'
>>> mem.c:344: TINFO: child 2 continues...
>>> mem.c:347: TINFO: child 2 allocates 128 MB filled with 'a'
>>> mem.c:344: TINFO: child 0 continues...
>>> mem.c:347: TINFO: child 0 allocates 128 MB filled with 'c'
>>> mem.c:400: TINFO: child 1 stops.
>>> mem.c:400: TINFO: child 0 stops.
>>> mem.c:400: TINFO: child 2 stops.
>>> ksm_helper.c:36: TINFO: ksm daemon takes 2s to run two full scans
>>> mem.c:264: TINFO: check!
>>> mem.c:255: TPASS: run is 1.
>>> mem.c:255: TPASS: pages_shared is 2.
>>> ....
>>> mem.c:255: TPASS: pages_shared is 1.
>>> mem.c:255: TPASS: pages_sharing is 98302.
>>> mem.c:252: TFAIL: pages_volatile is not 0 but 1. <-----
>>> mem.c:252: TFAIL: pages_unshared is not 1 but 0. <-----
>>
>> @Stefan, is this simply related to the new scanning optimization (skip and
>> eventually not merge a pages within the "2 scans" windows, whereby previously,
>> they would have gotten merged)?
>>
>> If so, we might just want to disable that optimization for that test case?
>>
>> Alternatively, maybe we have to wait for "more" scan cycles instead of only 2?
>
> I'd expect this is caused by "smart scan", where we can skip pages.
> The best is probably to disable the smart scan feature for this test.
> The smart scan feature can be disabled by:
>
> echo 0 > /sys/kernel/mm/ksm/smart_scan
>
> I'll have a look at it today.
>

If I disable "smart scan", the testcase completes successully. This is
simply the case that for the testcase, it can "skip" a page (as the
"smart scan" feature is enabled).

The easiest fix is to disable smart scan for the ksm cases. I'll send an
ltp patch a bit later to address this issue.