Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again

From: Miklos Szeredi
Date: Thu Dec 21 2006 - 12:51:51 EST


> On Thu, Dec 21, 2006 at 04:53:56PM +0100, Miklos Szeredi wrote:
> > I'll first answer the last paragraph.
> >
> > > I suggest that in order for fuse to work reliably on ARM, it is modified
> > > to behave more like a reasonable device driver, and use the functions
> > > defined in asm/uaccess.h when it wants to access the current processes
> > > VM space.
> >
> > Fuse needs to use get_user_pages() to work around certain deadlock
> > scenarios (see Documentation/filesystems/fuse.txt), that are
> > problematic with copy_*_user().
>
> Hmm, okay (though the documentation doesn't really provide enough
> explaination for me to get a grasp of exactly what's going on.)

The root of the problem is that copy_to_user() may cause page faults
on the userspace buffer, and the page fault might (in case of a
maliciously crafted filesystem) recurse into the filesystem itself.

This applies to get_user_pages() as well, but in that case the
function taking the fault is different from the one doing the copy,
hence the original request can be aborted while in get_user_pages().

It cannot be aborted during the copy (since data belonging to the
original request is still being used), but unlike copy_to_user(),
memcpy() will never block.

> > > However, and this is the problem, we need cache maintainence _after_
> > > that memcpy() has completed - with a write allocate VIVT cache, the
> > > memcpy() itself will allocate cache lines in the kernel mapping of
> > > the page which will need to be written back for the user process to
> > > see that data.
> >
> > Yes, note the flush_dcache_page() call in fuse_copy_finish(). That
> > could be replaced by the flush_kernel_dcache_page() (added by James
> > Bottomley together with flush_anon_page()) when all relevant
> > architectures have defined it.
>
> I'm not entirely convinced that it can be replaced. What if the page
> is in the page cache and is shared with other processes? That quite
> clearly falls under flush_dcache_page()'s remit.

But flush_dcache_page() has already been called in get_user_pages(),
so the user mappings are flushed before copying the data to the kernel
mapping, and only the kernel mapping need to be flushed _after_ the
copying.

> > > Now, throw in SMP or preempt with a multi-threaded userspace program
> > > touching the page in question, and the problem just gets much much
> > > worse. In such a scenario, we can not guarantee, no matter how much
> > > cache maintainence is applied to the kernel, that this API comes
> > > anywhere near to being safe.
> >
> > This is only problematic if multiple threads are touching the same
> > page, no? If the page(s) used for reading/writing requests are
> > exclusive to each thread, then there should be no problem. This is a
> > reasonable requirement towards the userspace filesystem I think.
>
> Such a restriction needs to be clearly documented against get_user_pages()
> so that users don't expect something from it which it can't deliver.

Agreed.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/