Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

From: Gregory Farnum
Date: Wed Mar 09 2016 - 17:20:44 EST


On Thu, Mar 3, 2016 at 3:10 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Mar 03, 2016 at 05:39:52PM -0500, Theodore Ts'o wrote:
>> On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
>> > >>>>> "Christoph" == Christoph Hellwig <hch@xxxxxxxxxxxxx> writes:
>> >
>> > Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
>> > Christoph> space is deallocated as much as possible -
>> > Christoph> FALLOC_FL_ZERO_RANGE assures zeroes are returned, AND blocks
>> > Christoph> are actually allocated
>> >
>> > That works for me. I think it would be great if we could have consistent
>> > interfaces for fs and block. The more commonality the merrier.
>>
>> So a question I have is do we want to add a "discard-as-a-hint" analog
>> for fallocate?
>
> Well defined, reliable behaviour only, please. If the device can't
> provide the required hardware offload, then it needs to use the
> generic, slow implementation of the functionality or report
> EOPNOTSUPP.
>
>> P.S. Speaking of things that are powerful and too dangerous for
>> application programmers, after the Linux FAST workshop, I was having
>> dinner with the Ceph developers and Ric Wheeler, and we were talking
>> about things they really needed. Turns out they also could use an
>> FALLOC_FL_NO_HIDE_STALE functionality.
>
> For better or for worse, Ceph is moving away from using filesystems
> for its back end object store, so the use of such a hack in Ceph
> has a very limited life.

Well, let's be clear: the reason Ceph is moving away from using local
filesystems is because we couldn't get the overheads of using them
down to what we considered an acceptable level. There are always going
to be some inefficiencies from it of course (since you have two
metadata streams) but the more issues get addressed, the fewer
userspace filesystems will feel or run up against the need to do their
own block device management. :) If none of them get fixed the same
scenario will just repeat itself: a userspace filesystem rises, it
tries to get features it needs into the kernel, it eventually gives up
and drops the kernel out of the loop, and then the fact that nobody's
using the kernel in this scenario will be considered a reason not to
make it work better.

I really am sensitive to the security concerns, just know that if it's
a permanent blocker you're essentially blocking out a growing category
of disk users (who run on an awfully large number of disks!).
-Greg

>
>> I told them I had an
>> out-of-tree patch that had that functionality, and even Ric Wheeler
>> started getting tempted.... :-)
>
> You can tempt all you want, but it does not change the basic fact
> that it is dangerous and compromises system security. As such, it
> does not belong in upstream kernels. Especially in this day and age
> where ensuring the fundamental integrity of our systems is more
> important than ever.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
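
For readers following the thread, here is a rough userspace sketch of the
semantics discussed above: FALLOC_FL_PUNCH_HOLE (zeroes on read, space
deallocated where possible), FALLOC_FL_ZERO_RANGE (zeroes on read, blocks
stay allocated), and the "generic slow path or EOPNOTSUPP" rule. It is an
illustration only, not code from the patch under review. The helper names
punch_hole(), zero_range() and zero_range_slow() are hypothetical, and
treating ENOSYS like EOPNOTSUPP is an assumption made here for portability.

```c
#define _GNU_SOURCE   /* exposes fallocate() and FALLOC_FL_* via <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deallocate [offset, offset+len) where possible.
 * Later reads of the range return zeroes. */
int punch_hole(int fd, off_t offset, off_t len)
{
    /* PUNCH_HOLE must be combined with KEEP_SIZE. */
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/* Hypothetical helper: the generic slow path, writing zeroes by hand. */
int zero_range_slow(int fd, off_t offset, off_t len)
{
    char buf[4096];

    memset(buf, 0, sizeof(buf));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        offset += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: zero a range and keep its blocks allocated.
 * Tries the offloaded path first; falls back to the slow path only when
 * the filesystem or kernel reports that the operation is unsupported. */
int zero_range(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);

    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;              /* real error: report it, do not mask it */
    return zero_range_slow(fd, offset, len);
}
```

In both paths the caller sees the same well-defined result: the range reads
back as zeroes, or the call fails with errno set. Nothing here is a hint
that may be silently ignored.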