Re: [PATCH 23/31] huge tmpfs recovery: framework for reconstituting huge pages

From: Hugh Dickins
Date: Wed Apr 06 2016 - 22:05:51 EST


On Wed, 6 Apr 2016, Mika Penttila wrote:
> On 04/06/2016 12:53 AM, Hugh Dickins wrote:
> > +static void shmem_recovery_work(struct work_struct *work)
...
> > +
> > + if (head) {
> > + /* We are resuming work from a previous partial recovery */
> > + if (PageTeam(page))
> > + shr_stats(resume_teamed);
> > + else
> > + shr_stats(resume_tagged);
> > + } else {
> > + gfp_t gfp = mapping_gfp_mask(mapping);
> > + /*
> > + * XXX: Note that with swapin readahead, page_to_nid(page) will
> > + * often choose an unsuitable NUMA node: something to fix soon,
> > + * but not an immediate blocker.
> > + */
> > + head = __alloc_pages_node(page_to_nid(page),
> > + gfp | __GFP_NOWARN | __GFP_THISNODE, HPAGE_PMD_ORDER);
> > + if (!head) {
> > + shr_stats(huge_failed);
> > + error = -ENOMEM;
> > + goto out;
> > + }
>
> Should this head marked PageTeam? Because in patch 27/31 when given as a hint to shmem_getpage_gfp() :
>
> hugehint = NULL;
> + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
> + sgp == SGP_TEAM && *pagep) {
> + struct page *head;
> +
> + if (!get_page_unless_zero(*pagep)) {
> + error = -ENOENT;
> + goto decused;
> + }
> + page = *pagep;
> + lock_page(page);
> + head = page - (index & (HPAGE_PMD_NR-1));
>
> we fail always because :
> + if (!PageTeam(head)) {
> + error = -ENOENT;
> + goto decused;
> + }

Great observation, thank you Mika.

We don't fail always, because in most cases the page wanted for the head
will either be already in memory, or read in from swap, and that SGP_TEAM
block in shmem_getpage_gfp() (with the -ENOENT you show) not come into play
on it: then shmem_recovery_populate() does its !recovery->exposed_team
SetPageTeam(head) and all is well from then on.

But I think what you point out means that the current recovery code is
incapable of assembling a hugepage if its first page was not already
instantiated earlier: not something I'd realized until you showed me.
Not a common failing, and would never be the case for an extent which had
been mapped huge in the past, but it's certainly not what I'd intended.

As to whether the head should be marked PageTeam immediately after the
hugepage allocation: I think not, especially because of the swapin case
(26/31). Swapin may need to read data from disk into that head page,
and I've never had to think about the consequences of having a swap
page marked PageTeam. Perhaps it would work out okay, but I'd prefer
not to go there.

At this moment I'm too tired to think what the right answer will be,
and certainly won't be able to commit to any without some testing.

So, not as incapacitating as perhaps you thought, and not any danger
to people trying out huge tmpfs, but definitely something to be fixed:
I'll mull it over in the background and let you know when I'm sure.

Thank you again,
Hugh