Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390

From: Gerald Schaefer
Date: Thu Jan 21 2016 - 09:57:45 EST


On Wed, 20 Jan 2016 12:26:31 -0800
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 21 Jan 2016 03:32:41 +0800 kbuild test robot <lkp@xxxxxxxxx> wrote:
>
> > Hi Michael,
> >
> > [auto build test ERROR on next-20160120]
> > [cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
> > [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> >
> > url: https://github.com/0day-ci/linux/commits/Michael-Holzheu/numa-fix-proc-pid-numa_maps-on-s390/20160121-022313
> > config: ia64-alldefconfig (attached as .config)
> > reproduce:
> > wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # save the attached .config to linux build tree
> > make.cross ARCH=ia64
> >
> > All errors (new ones prefixed by >>):
> >
> > fs/proc/task_mmu.c: In function 'gather_pte_stats':
> > >> fs/proc/task_mmu.c:1523:3: error: implicit declaration of function 'huge_ptep_get' [-Werror=implicit-function-declaration]
> > pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> > ^
>
> hm.
>
> That whole code block in gather_pte_stats() just shouldn't exist when
> CONFIG_TRANSPARENT_HUGEPAGE=n. And because pmd_trans_huge_lock()
> returns NULL, the compiler will nuke it all anyway.
>
> So instead of doing the obvious:
>
> --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> +++ a/fs/proc/task_mmu.c
> @@ -1524,7 +1524,11 @@ static int gather_pte_stats(pmd_t *pmd,
>
> ptl = pmd_trans_huge_lock(pmd, vma);
> if (ptl) {
> +#ifdef CONFIG_HUGETLB_PAGE
> pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> +#else
> + pte_t huge_pte = *(pte_t *)pmd;
> +#endif
> struct page *page;
>
> page = can_gather_numa_stats(huge_pte, vma, addr);
>
> I'm thinking we should do
>
> --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> +++ a/fs/proc/task_mmu.c
> @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
> pte_t *pte;
>
> ptl = pmd_trans_huge_lock(pmd, vma);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> if (ptl) {
> pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> struct page *page;
> @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
> spin_unlock(ptl);
> return 0;
> }
> +#endif
>
> if (pmd_trans_unstable(pmd))
> return 0;

Hi Andrew,

Unfortunately this seems to be a lot more complicated than we thought.
huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.

So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
CONFIG_TRANSPARENT_HUGEPAGE=y.

Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
it is actually working on PMD entries and not PTE entries, there was
no need for huge_ptep_get() for THP so far.

Now it seems that the THP code in gather_pte_stats() is an exception to
this, as it is not working on a PMD like the rest of the THP code, but
also on a fake "PTE" like the hugetlbfs code.

I guess this needs more thinking, two options are crossing my mind:
- Fix the THP code in gather_pte_stats() to properly use a PMD instead of
PTE. This would probably require something like a "_pmd" version of
"can_gather_numa_stats()" and a pmd_dirty() check for the
gather_stats() parameter.
- Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
the default NOP version in linux/hugetlbfs.h instead of the individual
asm/hugetlbfs.h files for all archs.

The first option seems more correct, but it might entail other problems.
The second option would also introduce new problems on s390, where the
implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.

Any thoughts / more ideas?

Regards,
Gerald