Re: [PATCH v2 3/5] dax: improve documentation for fsync/msync

From: Jan Kara
Date: Fri Jan 22 2016 - 10:01:26 EST


On Thu 21-01-16 10:46:02, Ross Zwisler wrote:
> Several of the subtleties and assumptions of the DAX fsync/msync
> implementation are not immediately obvious, so document them with comments.
>
> Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Reported-by: Jan Kara <jack@xxxxxxx>

Thanks, the comments really help! Just two nits below, otherwise feel free
to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

> ---
> fs/dax.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index d589113..55ae394 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -350,6 +350,13 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
>
> if (!pmd_entry || type == RADIX_DAX_PMD)
> goto dirty;
> +
> + /*
> + * We only insert dirty PMD entries into the radix tree. This
> + * means we don't need to worry about removing a dirty PTE
> + * entry and inserting a clean PMD entry, thus reducing the
> + * range we would flush with a follow-up fsync/msync call.
> + */

Maybe accompany this with:

WARN_ON(pmd_entry && !dirty);

somewhere in dax_radix_entry()?
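
To illustrate the invariant the check would enforce, here is a minimal
user-space sketch. It is not kernel code: warn_on(), warn_count and
check_pmd_insert() are hypothetical stand-ins made up for this example, and
the WARN_ON() macro below only mimics the kernel's by reporting the failed
condition. The real check would sit inside dax_radix_entry() and use the
kernel's own WARN_ON().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so callers can observe how many warnings fired. */
static int warn_count;

/*
 * Hypothetical user-space stand-in for the kernel's WARN_ON(): report the
 * failed condition and return whether it held. Not the kernel implementation.
 */
static bool warn_on(bool cond, const char *expr)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define WARN_ON(cond) warn_on((cond), #cond)

/*
 * Models only the invariant from the comment above: a PMD entry must
 * already be dirty when it is inserted. PTE entries may be clean or dirty.
 */
static void check_pmd_insert(bool pmd_entry, bool dirty)
{
	WARN_ON(pmd_entry && !dirty);
}
```

In this model only check_pmd_insert(true, false) warns, i.e. an attempt to
insert a clean PMD entry; every other combination passes silently.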

> radix_tree_delete(&mapping->page_tree, index);
> mapping->nrexceptional--;
> }
> @@ -912,6 +919,21 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
> }
> dax_unmap_atomic(bdev, &dax);
>
> + /*
> + * For PTE faults we insert a radix tree entry for reads, and
> + * leave it clean. Then on the first write we dirty the radix
> + * tree entry via the dax_pnf_mkwrite() path. This sequence
                             ^^^ pfn

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR