Re: [PATCH 2/3] hugetlbfs: close race between MADV_DONTNEED and page fault

From: Rik van Riel
Date: Tue Oct 03 2023 - 20:20:34 EST


On Tue, 2023-10-03 at 13:19 -0700, Mike Kravetz wrote:
> On 10/03/23 15:35, Rik van Riel wrote:
> > On Sun, 2023-10-01 at 21:39 -0700, Mike Kravetz wrote:
> > >
> > > Something is not right here.  I have not looked closely at the
> > > patch, but running the libhugetlbfs test suite hits this NULL
> > > deref in misalign (2M: 32).
> >
> > Hi Mike,
> >
> > Fixing the NULL dereference was easy, but I continued running
> > into a test case failure with linkhuge_rw. After tweaking the
> > code in my patches quite a few times, I finally ran out of
> > ideas and tried it on a tree without my patches.
> >
> > I still see the test failure on upstream
> > 2cf0f7156238 ("Merge tag 'nfs-for-6.6-2' of
> > git://git.linux-nfs.org/projects/anna/linux-nfs")
> >
> > This is with a modern glibc, and the __morecore assignments
> > in libhugetlbfs/morecore.c commented out.
> >
> >
> > HUGETLB_ELFMAP=R HUGETLB_SHARE=1 linkhuge_rw (2M: 32):  Pool state:
> > (('hugepages-2048kB', (('free_hugepages', 1), ('resv_hugepages', 0),
> > ('surplus_hugepages', 0), ('nr_hugepages_mempolicy', 1),
> > ('nr_hugepages', 1), ('nr_overcommit_hugepages', 0))),)
> > Hugepage pool state not preserved!
> > BEFORE: (('hugepages-2048kB', (('free_hugepages', 1),
> > ('resv_hugepages', 0), ('surplus_hugepages', 0),
> > ('nr_hugepages_mempolicy', 1), ('nr_hugepages', 1),
> > ('nr_overcommit_hugepages', 0))),)
> > AFTER: (('hugepages-2048kB', (('free_hugepages', 0),
> > ('resv_hugepages', 0), ('surplus_hugepages', 0),
> > ('nr_hugepages_mempolicy', 1), ('nr_hugepages', 1),
> > ('nr_overcommit_hugepages', 0))),)
> >
>
> Please consider the above failures normal and expected.  They have
> been this way for many years.  Sorry for any waste of your time.
>
> Of course, if you would like to look into these you are welcome.

I'm not too worried about the test cases returning failure,
but having free_hugepages not go back to 1 after linkhuge_rw
exits looks bad.
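For anyone who wants to check this by hand, here is a minimal Python
sketch that snapshots the hugepage pool counters before and after a
test and reports which ones changed, roughly the BEFORE/AFTER
comparison quoted above. It assumes the usual
/sys/kernel/mm/hugepages/hugepages-<size>kB/ layout; the function
names are made up for illustration and this is not the libhugetlbfs
harness code.

```python
from pathlib import Path

# Counter files expected under each hugepages-<size>kB directory.
# Missing ones (e.g. nr_hugepages_mempolicy without NUMA) are skipped.
COUNTERS = ("free_hugepages", "resv_hugepages", "surplus_hugepages",
            "nr_hugepages_mempolicy", "nr_hugepages",
            "nr_overcommit_hugepages")


def read_pool_state(root="/sys/kernel/mm/hugepages"):
    """Hypothetical helper: return {pool: {counter: value}} for each size."""
    state = {}
    for pool in sorted(Path(root).glob("hugepages-*")):
        state[pool.name] = {
            name: int((pool / name).read_text())  # int() ignores the "\n"
            for name in COUNTERS
            if (pool / name).exists()
        }
    return state


def pool_diff(before, after):
    """Hypothetical helper: list (pool, counter, before, after) changes."""
    changes = []
    for pool in sorted(set(before) | set(after)):
        b, a = before.get(pool, {}), after.get(pool, {})
        for name in sorted(set(b) | set(a)):
            if b.get(name) != a.get(name):
                changes.append((pool, name, b.get(name), a.get(name)))
    return changes
```

Take read_pool_state() before and after a test; a non-empty
pool_diff(), such as free_hugepages going from 1 to 0, is the kind of
change reported above as "pool state not preserved".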

In this case it appears that linkhuge_rw simply left behind
a file in /dev/hugepages when it died, and removing that file
returns free_hugepages back to what it should be.
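A similar sketch (again Python, with hypothetical helper names, and
assuming hugetlbfs is mounted at /dev/hugepages) lists whatever files
a crashed test left behind and, only when asked, unlinks them. Pages
backing such a file stay out of the free pool until the file is
removed and no process still maps it.

```python
from pathlib import Path


def leftover_files(mount="/dev/hugepages"):
    """Hypothetical helper: regular files still present in the mount."""
    return sorted(p for p in Path(mount).rglob("*") if p.is_file())


def remove_leftovers(mount="/dev/hugepages", dry_run=True):
    """Hypothetical helper: unlink leftover files unless dry_run is set.

    Returns the paths that were (or, in dry-run mode, would be) removed.
    """
    removed = []
    for path in leftover_files(mount):
        if not dry_run:
            path.unlink()
        removed.append(path)
    return removed
```

Calling remove_leftovers(dry_run=False) after a failed run should
bring free_hugepages back, as removing the file by hand did here; the
dry-run default just keeps it from deleting anything by accident.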

I guess I'll go run the test cases without -c 1 :)

--
All Rights Reversed.