Re: [PATCH 1/2] fs: add SEEK_HOLE and SEEK_DATA flags

From: Josef Bacik
Date: Thu Apr 21 2011 - 21:23:08 EST


On Thu, Apr 21, 2011 at 6:28 PM, Eric Blake <eblake@xxxxxxxxxx> wrote:
> Josef Bacik <josef <at> redhat.com> writes:
>
>> This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags. ÂTurns
>> out using fiemap in things like cp cause more problems than it solves, so
>> lets try and give userspace an interface that doesn't suck. ÂSo we have
>>
>> -SEEK_HOLE: this moves the file pos to the nearest hole in the file from the
>> given position. ÂIf the given position is a hole then pos won't move. ÂA "hole"
>> is defined by whatever the fs feels like defining it to be.
>
> You absolutely need to match Solaris semantics, which are documented as follows:
>
> Â Â Â Â ï ÂIf whence is SEEK_HOLE, the offset of the start of the Ânext Âhole
> Â Â Â Â Â Âgreater than or equal to the supplied offset is returned. The def-
> Â Â Â Â Â Âinition of a hole is provided near the end of the DESCRIPTION.
>
> Â Â Â Â ï ÂIf whence is SEEK_DATA, the file pointer is set to Âthe Âstart Âof
> Â Â Â Â Â Âthe Ânext Ânon-hole Âfile region greater than or equal to the sup-
> Â Â Â Â Â Âplied offset.
>
> Â Â Â A Â"hole" is defined as a contiguous range of bytes in a file, all hav-
> Â Â Â ing the value of zero, but not all zeros in a file are guaranteed to be
> Â Â Â represented Âas Âholes returned with SEEK_HOLE. Filesystems are allowed
> Â Â Â to expose ranges of zeros with SEEK_HOLE, but not required to. Applica-
> Â Â Â tions can use SEEK_HOLE to optimise their behavior for ranges of zeros,
> Â Â Â but must not depend on it to find all such ranges in a file. The Âexis-
> Â Â Â tence Âof Âa Âhole at the end of every data region allows for easy pro-
> Â Â Â gramming and implies that a virtual hole exists at the end of the file.
>    Applications  should  use  fpathconf(_PC_MIN_HOLE_SIZE)  or  path-
>    conf(_PC_MIN_HOLE_SIZE) Âto Âdetermine  if  a  filesystem  supports
> Â Â Â SEEK_HOLE. See fpathconf(2).
>
> Â Â Â For Âfilesystems Âthat Âdo not supply information about holes, the file
> Â Â Â will be represented as one entire data region.
>
> Â Â Â ENXIO Â Â Â Â Â For SEEK_DATA, there are no more data regions past Âthe
> Â Â Â Â Â Â Â Â Â Â Â supplied offset. For SEEK_HOLE, there are no more holes
> Â Â Â Â Â Â Â Â Â Â Â past the supplied offset.
>
> Note that in that definition, SEEK_HOLE does _not_ reposition the file offset
> (it returns the offset of the next hole, which might be at the end of the file
> since all files have a virtual hole at the end, but leaves the position
> unchanged). ÂI'd have to write a test program on Solaris to see whether that
> definition is actually true, or if it is a bug in the Solaris man page.
>

lseek's purpose is to reposition the file position, so I'd imagine
this is just a bug in the man page.

> Making SEEK_HOLE blindly fail is therefore not useful; you could just as
> easily make it succeed and return the offset of the last byte of the file for
> all file systems (if the offset requested is within bounds, or fail with
> ENXIO if it is out of bounds), and have a valid, working, and minimal
> implementation already in place.
>

Except you can have blocks allocated past i_size, so returning i_size
isn't necessarily correct. Obviously the idea is to have most of the
filesystems doing the correct thing, but for my first go around I just
was going to do one since I expect quite a bit of repainting for this
bikeshed is yet to be done. But I guess if you have data past i_size
doing this would encourage the various fs people to fix it.

>>
>> -SEEK_DATA: this is obviously a little more self-explanatory. ÂAgain the only
>> ambiguity comes in with preallocated extents. ÂIf you have an fs that can't
>> reliably tell that the preallocated extent is in the process of turning into a
>> real extent, it is correct for SEEK_DATA to park you at a preallocated extent.
>
> In contrast to SEEK_HOLE, SEEK_DATA _does_ reposition the file, or fail with
> ENXIO. ÂBack to your minimal implementation, it would be valid to always fail
> with ENXIO (treating every file as a single block of data followed by a single
> virtual hole at file end, so there is no next data after any current position).
>
>> +++ b/fs/read_write.c
>> @@ -64,6 +64,9 @@ generic_file_llseek_unlocked(struct file *file, loff_t
> offset, int origin)
>> Â Â Â Â Â Â Â Â Â Â Â return file->f_pos;
>> Â Â Â Â Â Â Â offset += file->f_pos;
>> Â Â Â Â Â Â Â break;
>> + Â Â case SEEK_DATA:
>> + Â Â case SEEK_HOLE:
>> + Â Â Â Â Â Â return -EINVAL;
>
> Whatever you do, returning EINVAL is not nice, especially when it seems pretty
> easy to provide a working alternative that assumes all files are a single data
> block until fine-tuned otherwise on a per-filesystem basis.
>

Agreed, I'll fix it. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/