Re: [PATCH v2 3/5] dax: improve documentation for fsync/msync

From: Ross Zwisler
Date: Fri Jan 22 2016 - 10:58:38 EST


On Fri, Jan 22, 2016 at 04:01:29PM +0100, Jan Kara wrote:
> On Thu 21-01-16 10:46:02, Ross Zwisler wrote:
> > Several of the subtleties and assumptions of the DAX fsync/msync
> > implementation are not immediately obvious, so document them with comments.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> > Reported-by: Jan Kara <jack@xxxxxxx>
>
> Thanks, the comments really help! Just two nits below, otherwise feel free
> to add:
>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
>
> > ---
> > fs/dax.c | 30 ++++++++++++++++++++++++++++++
> > 1 file changed, 30 insertions(+)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index d589113..55ae394 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -350,6 +350,13 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
> >
> > if (!pmd_entry || type == RADIX_DAX_PMD)
> > goto dirty;
> > +
> > + /*
> > + * We only insert dirty PMD entries into the radix tree. This
> > + * means we don't need to worry about removing a dirty PTE
> > + * entry and inserting a clean PMD entry, thus reducing the
> > + * range we would flush with a follow-up fsync/msync call.
> > + */
>
> Maybe accompany this with:
>
> WARN_ON(pmd_entry && !dirty);
>
> somewhere in dax_radix_entry()?

Sure, I'll add one.
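
For reference, a minimal userspace sketch of the invariant that check would
guard. This is not the fs/dax.c code: `struct slot`, `model_radix_entry()`,
the `ENTRY_*` values, `warn_count` and the `WARN_ON()` stand-in are all
hypothetical names made up for illustration, and locking, radix tree tags
and indexing are left out entirely.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): report and count, never abort. */
static int warn_count;
#define WARN_ON(cond) \
	((cond) ? (warn_count++, fprintf(stderr, "WARN_ON(%s)\n", #cond), 1) : 0)

/* Hypothetical entry types for one slot of the model. */
enum entry_type { ENTRY_NONE, ENTRY_PTE, ENTRY_PMD };

struct slot {
	enum entry_type type;
	bool dirty;
};

/* Hypothetical model of the insert path for a single slot. */
static void model_radix_entry(struct slot *slot, bool pmd_entry, bool dirty)
{
	/* The invariant under review: PMD entries are only ever inserted dirty. */
	WARN_ON(pmd_entry && !dirty);

	if (slot->type != ENTRY_NONE) {
		if (!pmd_entry || slot->type == ENTRY_PMD) {
			/* Keep the existing entry; it can only go clean -> dirty. */
			slot->dirty = slot->dirty || dirty;
			return;
		}
		/* A PMD entry replaces an existing PTE entry: drop the old one. */
		slot->type = ENTRY_NONE;
		slot->dirty = false;
	}

	slot->type = pmd_entry ? ENTRY_PMD : ENTRY_PTE;
	slot->dirty = dirty;
}
```

In this model, a dirty PTE entry that gets replaced by a PMD entry can only
lose its dirty state if the caller passes pmd_entry with dirty == false,
which is exactly the case the warning would catch.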

> > radix_tree_delete(&mapping->page_tree, index);
> > mapping->nrexceptional--;
> > }
> > @@ -912,6 +919,21 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> > }
> > dax_unmap_atomic(bdev, &dax);
> >
> > + /*
> > + * For PTE faults we insert a radix tree entry for reads, and
> > + * leave it clean. Then on the first write we dirty the radix
> > + * tree entry via the dax_pnf_mkwrite() path. This sequence
> ^^^ pfn

Thanks, will fix.