Re: [PATCH v7 07/22] Replace the XIP page fault handler with the DAX page fault handler

From: Jan Kara
Date: Mon Apr 14 2014 - 12:05:21 EST


On Sun 13-04-14 07:21:32, Matthew Wilcox wrote:
> On Wed, Apr 09, 2014 at 11:12:03PM +0200, Jan Kara wrote:
> > This would be fine except that unmap_mapping_range() grabs i_mmap_mutex
> > again :-|. But it might be easier to provide a version of that function
> > which assumes i_mmap_mutex is already locked than what I was suggesting.
>
> *sigh*. I knew that once ... which was why the call was after dropping
> the lock. OK, another try at fixing the problem; handle it down in the
> insert_pfn code:
OK, that change looks OK to me (although you might want to introduce
vm_replace_mixed() in a separate patch).

> > > > > +int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> > > > > + get_block_t get_block)
> > > > > +{
> > > > > + int result;
> > > > > + struct super_block *sb = file_inode(vma->vm_file)->i_sb;
> > > > > +
> > > > > + sb_start_pagefault(sb);
> > > > You don't need any filesystem freeze protection for the fault handler
> > > > since that's not going to modify the filesystem.
> > >
> > > Err ... we might allocate a block as a result of doing a write to a hole.
> > > Or does that not count as 'modifying the filesystem' in this context?
> > Ah, it does. But it would be nice to avoid doing sb_start_pagefault() if
> > it's not a write fault - because you don't want to block reading from a
> > frozen filesystem (imagine what would happen when you freeze your root
> > filesystem to do a snapshot...).
> >
> > I have somewhat a mindset of standard pagecache mmap where filemap_fault()
> > only reads in data regardless of FAULT_FLAG_WRITE setting so I was confused
> > by your difference :).
>
> Understood! So this should work:
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 2453025..e4d00fc 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -431,10 +431,13 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> int result;
> struct super_block *sb = file_inode(vma->vm_file)->i_sb;
>
> - sb_start_pagefault(sb);
> - file_update_time(vma->vm_file);
> + if (vmf->flags & FAULT_FLAG_WRITE) {
> + sb_start_pagefault(sb);
> + file_update_time(vma->vm_file);
> + }
Yup, this looks good to me. Later if we find file_update_time() is
slowing down faults too much, we can defer the actual update to msync()
/ close() time (POSIX actually allows that). But that's definitely for
future.

> result = do_dax_fault(vma, vmf, get_block);
> - sb_end_pagefault(sb);
> + if (vmf->flags & FAULT_FLAG_WRITE)
> + sb_end_pagefault(sb);
>
> return result;
> }

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/