Re: [PATCH 2/3] hugetlbfs: close race between MADV_DONTNEED and page fault

From: Mike Kravetz
Date: Tue Oct 03 2023 - 16:20:08 EST


On 10/03/23 15:35, Rik van Riel wrote:
> On Sun, 2023-10-01 at 21:39 -0700, Mike Kravetz wrote:
> >
> > Something is not right here.  I have not looked closely at the patch,
> > but running libhugetlbfs test suite hits this NULL deref in misalign
> > (2M: 32).
>
> Hi Mike,
>
> fixing the null dereference was easy, but I continued running
> into a test case failure with linkhuge_rw. After tweaking the
> code in my patches quite a few times, I finally ran out of
> ideas and tried it on a tree without my patches.
>
> I still see the test failure on upstream
> 2cf0f7156238 ("Merge tag 'nfs-for-6.6-2' of
> git://git.linux-nfs.org/projects/anna/linux-nfs")
>
> This is with a modern glibc, and the __morecore assignments
> in libhugetlbfs/morecore.c commented out.
>
>
> HUGETLB_ELFMAP=R HUGETLB_SHARE=1 linkhuge_rw (2M: 32): Pool state:
> (('hugepages-2048kB', (('free_hugepages', 1), ('resv_hugepages', 0),
> ('surplus_hugepages', 0), ('nr_hugepages_mempolicy', 1),
> ('nr_hugepages', 1), ('nr_overcommit_hugepages', 0))),)
> Hugepage pool state not preserved!
> BEFORE: (('hugepages-2048kB', (('free_hugepages', 1),
> ('resv_hugepages', 0), ('surplus_hugepages', 0),
> ('nr_hugepages_mempolicy', 1), ('nr_hugepages', 1),
> ('nr_overcommit_hugepages', 0))),)
> AFTER: (('hugepages-2048kB', (('free_hugepages', 0), ('resv_hugepages',
> 0), ('surplus_hugepages', 0), ('nr_hugepages_mempolicy', 1),
> ('nr_hugepages', 1), ('nr_overcommit_hugepages', 0))),)
>

Hi Rik,

When I started working on hugetlb several years ago, the following libhugetlbfs
tests failed. This was/is with a version of glibc that supports __morecore.

noresv-preserve-resv-page (2M: 32): FAIL mmap() 1: Invalid argument
HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW linkhuge_rw (2M: 64): FAIL small_data is not hugepage
HUGETLB_MINIMAL_COPY=no HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_MINIMAL_COPY=no HUGETLB_ELFMAP=RW linkhuge_rw (2M: 64): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_SHARE=0 linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_SHARE=0 linkhuge_rw (2M: 64): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_SHARE=1 linkhuge_rw (2M: 32): FAIL small_data is not hugepage
HUGETLB_ELFMAP=RW HUGETLB_SHARE=1 linkhuge_rw (2M: 64): FAIL small_data is not hugepage
alloc-instantiate-race shared (2M: 32): FAIL mmap() 1: Cannot allocate memory
alloc-instantiate-race private (2M: 32): FAIL mmap() 1: Cannot allocate memory
truncate_sigbus_versus_oom (2M: 32): FAIL mmap() reserving all pages: Invalid argument
mmap-gettest 10 2048 (2M: 32): FAIL Failed to mmap the hugetlb file: Invalid argument
shm-fork 10 2048 (2M: 32): FAIL shmget(): Invalid argument
shm-getraw 2048 /dev/full (2M: 32): FAIL shmget(): Invalid argument

I spent some time looking into the issues, but most were issues with the
tests themselves. I did not attempt to modify the tests, nor do I
remember all the issues.

Please consider the above failures normal and expected. They have been
this way for many years. Sorry for any waste of your time.

Of course, if you would like to look into these, you are welcome to.
--
Mike Kravetz