Re: [PATCH] filesystem-dax: Disable PMD support

From: Matthew Wilcox
Date: Mon Jul 01 2019 - 23:34:26 EST


On Sun, Jun 30, 2019 at 02:37:32PM -0700, Dan Williams wrote:
> On Sun, Jun 30, 2019 at 8:23 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > I think my theory was slightly mistaken, but your fix has the effect of
> > fixing the actual problem too.
> >
> > The xas->xa_index for a PMD is going to be PMD-aligned (ie a multiple of
> > 512), but xas_find_conflict() does _not_ adjust xa_index (... which I
> > really should have mentioned in the documentation). So we go to sleep
> > on the PMD-aligned index instead of the index of the PTE. Your patch
> > fixes this by using the PMD-aligned index for PTEs too.
> >
> > I'm trying to come up with a clean fix for this. Clearly we
> > shouldn't wait for a PTE entry if we're looking for a PMD entry.
> > But what should get_unlocked_entry() return if it detects that case?
> > We could have it return an error code encoded as an internal entry,
> > like grab_mapping_entry() does. Or we could have it return the _locked_
> > PTE entry, and have callers interpret that.
> >
> > At least get_unlocked_entry() is static, but it's got quite a few callers.
> > Trying to discern which ones might ask for a PMD entry is a bit tricky.
> > So this seems like a large patch which might have bugs.
> >
> > Thoughts?
>
> ...but if it was a problem of just mismatched waitqueue's I would have
> expected it to trigger prior to commit b15cd800682f "dax: Convert page
> fault handlers to XArray".

That commit converts grab_mapping_entry() (called by dax_iomap_pmd_fault())
from calling get_unlocked_mapping_entry() to calling get_unlocked_entry().
get_unlocked_mapping_entry() (eventually) called __radix_tree_lookup()
instead of dax_find_conflict().

> This hunk, if I'm reading it correctly,
> looks suspicious: @index in this case is coming directly from
> vm->pgoff without pmd alignment adjustment whereas after the
> conversion it's always pmd aligned from the xas->xa_index. So perhaps
> the issue is that the lock happens at pte granularity. I expect it
> would cause the old put_locked_mapping_entry() to WARN, but maybe that
> avoids the lockup and was missed in the bisect.

I don't think that hunk is the problem. The __radix_tree_lookup()
is going to return a 'slot' which points to the canonical slot, no
matter which of the 512 indices corresponding to that slot is chosen.
So I think it's going to do essentially the same thing.

> @@ -884,21 +711,18 @@ static void *dax_insert_entry(struct
> address_space *mapping,
> * existing entry is a PMD, we will just leave the PMD in the
> * tree and dirty it if necessary.
> */
> - struct radix_tree_node *node;
> - void **slot;
> - void *ret;
> -
> - ret = __radix_tree_lookup(pages, index, &node, &slot);
> - WARN_ON_ONCE(ret != entry);
> - __radix_tree_replace(pages, node, slot,
> - new_entry, NULL);
> + void *old = dax_lock_entry(xas, new_entry);
> + WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) |
> + DAX_LOCKED));
> entry = new_entry;
> + } else {
> + xas_load(xas); /* Walk the xa_state */
> }
>
> if (dirty)
> - radix_tree_tag_set(pages, index, PAGECACHE_TAG_DIRTY);
> + xas_set_mark(xas, PAGECACHE_TAG_DIRTY);
>
> - xa_unlock_irq(pages);
> + xas_unlock_irq(xas);
> return entry;
> }