Re: Problems with determining data presence by examining extents?

From: Christoph Hellwig
Date: Wed Jan 15 2020 - 03:38:59 EST


On Tue, Jan 14, 2020 at 04:48:29PM +0000, David Howells wrote:
> Again with regard to my rewrite of fscache and cachefiles:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> I've got rid of my use of bmap()! Hooray!
>
> However, I'm informed that I can't trust the extent map of a backing file to
> tell me accurately whether content exists in a file because:
>
> (a) Not-quite-contiguous extents may be joined by insertion of blocks of
> zeros by the filesystem optimising itself. This would give me a false
> positive when trying to detect the presence of data.
>
> (b) Blocks of zeros that I write into the file may get punched out by
> filesystem optimisation since a read back would be expected to read zeros
> there anyway, provided it's below the EOF. This would give me a false
> negative.

The whole idea of an out-of-band interface is going to be racy and suffer
from implementation loss. I think what you want is something similar to
the NFSv4.2 READ_PLUS operation: give me the data if there is any, and
otherwise tell me that there is a hole. I think this could be a new
RWF_NOHOLE or similar flag, although how to return the hole size would be
a little awkward. Maybe return a specific negative error code (ENODATA?)
and advance the iov anyway.
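
To make the return convention concrete, here is a rough userspace sketch of
"data if there is any, otherwise -ENODATA plus the hole size". The helper name
read_or_hole() and its hole_len out-parameter are made up for illustration;
RWF_NOHOLE does not exist. The sketch emulates the semantics with
lseek(SEEK_DATA/SEEK_HOLE) followed by pread(), so it is itself an out-of-band
query: it is racy against concurrent writers, it still trusts the filesystem's
extent map, and it moves the file offset as a side effect. It only shows what
a caller would see, not how an in-kernel flag would be implemented.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing API.
 * Returns > 0: bytes read into buf (clamped to the current data run).
 * Returns 0: off is at or beyond EOF.
 * Returns -ENODATA: off is inside a hole; *hole_len is its length.
 * Returns another negative errno on failure.
 */
static ssize_t read_or_hole(int fd, void *buf, size_t len, off_t off,
                            off_t *hole_len)
{
    off_t data, end;
    ssize_t n;

    *hole_len = 0;

    data = lseek(fd, off, SEEK_DATA);
    if (data < 0) {
        if (errno != ENXIO)
            return -errno;
        /* No data at or after off: trailing hole, or off is past EOF. */
        end = lseek(fd, 0, SEEK_END);
        if (end < 0)
            return -errno;
        if (off >= end)
            return 0;
        *hole_len = end - off;
        return -ENODATA;
    }

    if (data > off) {
        /* off sits in a hole that ends where the next data starts. */
        *hole_len = data - off;
        return -ENODATA;
    }

    /* off is inside data: do not read past the end of this data run. */
    end = lseek(fd, off, SEEK_HOLE);
    if (end < 0)
        return -errno;
    if ((off_t)len > end - off)
        len = (size_t)(end - off);

    n = pread(fd, buf, len, off);
    return n < 0 ? -errno : n;
}
```

A caller would loop on this, skipping ahead by hole_len whenever it gets
-ENODATA. A real RWF_NOHOLE-style flag would have to make the data-or-hole
decision atomically with the read itself, which this emulation cannot do.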