Re: [PATCH] selftests/mm: run_vmtests.sh: add missing tests

From: Muhammad Usama Anjum
Date: Tue Jan 23 2024 - 02:51:25 EST


On 1/22/24 2:59 PM, Ryan Roberts wrote:
>>>> +CATEGORY="hugetlb" run_test ./hugetlb-read-hwpoison
>>>
>>> The addition of this test causes 2 later tests to fail with ENOMEM. I suspect
>>> it's a side effect of marking the hugetlbs as hwpoisoned (just a guess based on
>>> the test name!). Once a page is marked poisoned, is there a way to un-poison it?
>>> If not, I suspect that's why it wasn't part of the standard test script in the
>>> first place.
>> hugetlb-read-hwpoison failed, probably because the kernel fix for the test
>> hasn't been merged yet. The other tests (uffd-stress) aren't failing on my
>> end or on CI [1][2].
>
> To be clear, hugetlb-read-hwpoison isn't failing for me; it's just causing the
> subsequent uffd-stress tests to fail with ENOMEM. Both of those tests allocate
> hugetlbs, so my guess is that since this test marks some hugetlbs as poisoned,
> there are no longer enough left for the subsequent tests.
>
>>
>> [1] https://lava.collabora.dev/scheduler/job/12577207#L3677
>> [2] https://lava.collabora.dev/scheduler/job/12577229#L4027
>>
>> Maybe it's a configuration issue that is only exposed now; I'm not sure. Maybe
>> hugetlb-read-hwpoison changes some configuration and doesn't restore it.
>
> Well, yes: it's marking some hugetlb pages as HWPOISONED.
>
>> Maybe your system has fewer hugetlb pages.
>
> Yes, probably. What are hugetlb-read-hwpoison's requirements for the size and
> number of hugetlb pages? The run_vmtests.sh script allocates the required number
> of default-sized hugetlb pages before running any tests (I guess this value
> should be increased to meet hugetlb-read-hwpoison's requirements?).
>
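Topping up the default-size pool amounts to writing a larger count to /proc/sys/vm/nr_hugepages. The sketch below is a hypothetical illustration, not the logic run_vmtests.sh actually uses. `pages_for` rounds a memory requirement up to whole huge pages, and `grow_default_pool` raises the pool only if it is currently smaller than the target. The write is a request: the kernel may allocate fewer pages if memory is fragmented, so the function reads the value back. Running it against the real sysctl needs root.

```python
NR_HUGEPAGES = "/proc/sys/vm/nr_hugepages"


def pages_for(required_kb, hugepage_kb):
    """Smallest number of huge pages whose combined size covers required_kb."""
    return -(-required_kb // hugepage_kb)  # ceiling division on integers


def grow_default_pool(target_pages, path=NR_HUGEPAGES):
    """Raise the default-size pool to at least target_pages; never shrink it.

    Returns the pool size read back afterwards, which may be lower than
    target_pages if the kernel could not allocate them all.
    """
    with open(path) as f:
        current = int(f.read())
    if current < target_pages:
        with open(path, "w") as f:
            f.write(str(target_pages))
        with open(path) as f:
            current = int(f.read())
    return current
```

For example, a hypothetical 256 MiB requirement with 2 MiB pages gives `pages_for(262144, 2048) == 128`. Only growing the pool, never shrinking it, avoids releasing pages that another test or the CI's boot-time setup may still rely on.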
> Additionally, our CI preallocates non-default sizes from the kernel command line
> at boot. Happy to increase these if you can tell me what the new requirement is:
I'm not sure exactly how many hugetlb pages these tests require, but I
specify hugepages=1000 and the tests work for me.

I've sent v2 [1]. Would it be possible to run your CI on it and share the
results before we merge it?

[1] https://lore.kernel.org/all/20240123073615.920324-1-usama.anjum@xxxxxxxxxxxxx

>
> hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2
> default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2
>
> Thanks,
> Ryan
>
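To compare the two configurations in this thread, a small parser can total what each command line preallocates. This is a simplified, hypothetical sketch. It follows the documented rule that `hugepages=` applies to the preceding `hugepagesz=` or `default_hugepagesz=`, and it supports the per-node `<node>:<count>` form. For a bare `hugepages=` such as `hugepages=1000`, it assumes the architecture's default huge page size is passed in (2 MiB below). It ignores sizes without a K/M/G suffix and other corner cases of the kernel's own parsing.

```python
UNITS = {"K": 1, "M": 1024, "G": 1024 * 1024}  # multipliers to kB


def size_kb(text):
    """Convert a size such as '64K', '2M' or '1G' to kB."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def parse_hugetlb_cmdline(cmdline, arch_default_kb):
    """Map huge page size (kB) -> {node: pages} from hugetlb boot parameters.

    A node of None means the count was not split per NUMA node.
    Simplified: 'hugepages=' applies to the most recent hugepagesz= or
    default_hugepagesz=, or to arch_default_kb if neither has appeared yet.
    """
    pools = {}
    current_kb = arch_default_kb
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name in ("hugepagesz", "default_hugepagesz"):
            current_kb = size_kb(value)
        elif name == "hugepages":
            per_node = pools.setdefault(current_kb, {})
            for item in value.split(","):
                if ":" in item:
                    node, count = item.split(":")
                    per_node[int(node)] = int(count)
                else:
                    per_node[None] = int(item)
    return pools


def total_mib(pools):
    """Total memory preallocated across all sizes and nodes, in MiB."""
    return sum(page_kb * sum(nodes.values()) for page_kb, nodes in pools.items()) / 1024
```

For Ryan's line this totals 4480.25 MiB, with 64 default-size (2M) pages on each of nodes 0 and 1, i.e. 128 in all; `hugepages=1000` with a 2 MiB default comes to 1000 pages, or 2000 MiB.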

--
BR,
Muhammad Usama Anjum