Re: tmpfs returns incorrect data on concurrent pread() and truncate()

From: Theodore Ts'o
Date: Wed Nov 02 2016 - 09:21:21 EST


On Tue, Nov 01, 2016 at 06:38:26PM -0700, Hugh Dickins wrote:
> > Put simple: page locks are insufficient as a generic mechanism for
> > serialising filesystem operations. The locking required for this is
> > generally deeply filesystem implementation specific, so it's fine
> > that the VFS doesn't attempt to provide anything stricter than it
> > currently does....
>
> I think you are saying that: xfs already provides the extra locking
> that avoids this issue; most other filesystems do not; but more can
> be expected to add that extra locking in the coming months?

The test program was using buffered reads, and ext4 is using
generic_file_read_iter(), so presumably adding file system specific
locking would require that file systems which use
generic_file_read_iter() would need to front-end it with the
additional required file system specific locks. This is what XFS is
currently doing, but a quick git grep seems to show that besides XFS,
nfs, cifs, ceph and ocfs2 (i.e., XFS and remote file systems),
everyone else is using generic_file_read_iter() is using it directly
as read_iter method.

So if we think we really care about getting this case right, and if we
think the locking overhead in the uncontended case is sufficiently low
that it's worth fixing (it'll probably mean the bonnie benchmark will
take a hit, but that's a pretty crappy benchmark anyway :-), perhaps
we should be thinking about some way of adding a generic locking
mechanism so that most file systems will have a fighting chance of
getting this right.

Otherwise, perhaps the major file systems will add the appropriate
locking to the buffered I/O case (much more attention has been paid to
the direct I/O case, which is where more workloads have historically
cared) but I suspect as a percentage of the file systems in the Linux
kernel, they won't be getting this case right.

Cheers,

- Ted