Re: [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate

From: Dave Chinner
Date: Tue Apr 17 2012 - 23:02:11 EST

On Tue, Apr 17, 2012 at 01:53:20PM -0500, Eric Sandeen wrote:
> On 4/17/12 1:43 PM, Ted Ts'o wrote:
> > On Tue, Apr 17, 2012 at 01:59:37PM -0400, Ric Wheeler wrote:
> >>
> >> You could get both security and avoid the run time hit by fully
> >> writing the file or by having a variation that relied on "discard"
> >> (i.e., no need to zero data if we can discard or track it as
> >> unwritten).
> >
> > It's certainly the case that if the device supports persistent
> > discard, something which we definitely *should* do is to send the
> > discard at fallocate time and then mark the space as initialized.
> >
> > Unfortunately, not all devices, and in particular no HDD's of which I
> > am aware, support persistent discard. And, writing all zero's to the file
> > is in fact what a number of programs for which I am aware (including
> > an enterprise database) are doing, precisely because they tend to
> > write into the fallocated space in a somewhat random order, and the
> > extent conversion cost is in fact quite significant. But writing all
> > zero's to the file before you can use it is quite costly; at the very
> > least it burns disk bandwidth --- one of the main motivations of
> > fallocate was to avoid needing to do a "write all zero pass", and
> > while it does solve the problem for some use cases (such as DVR's),
> > it's not a complete solution.
>
> Can we please start with profiling the workload causing trouble, see why
> ext4 takes such a hit, and see if anything can be done there to fix
> it surgically, rather than just throwing this big hammer at it?
> In my (admittedly quick, hacky) test, xfs suffered about a 1% perf degradation,
> ext4 about 8%. Until we at least know why ext4 is so much worse, I'll
> signal a strong NAK for this change, for whatever that may or may not be worth. :)

In actual fact, on my 12-disk RAID0 array, XFS is faster with
unwritten extents *enabled* than when hacked to turn them off. Yes,
you can turn off unwritten extent tracking in XFS if you know what
you are doing; we just don't provide any interfaces to users to do
so because of all the security problems it entails.

The results (using a 256MB preallocated file and 2000 sparse 4k block
writes, one run with O_SYNC, the other done async with a post-write
sync), averaged over 5 runs, are:

             O_SYNC     post-sync
unwritten    7.297s     5.734s
stale        7.641s     6.108s

These results are consistently repeatable, and only reinforce the
point that if ext4 is slow using unwritten extent tracking, then
it's an implementation problem and not an excuse to add an interface
to expose stale data....
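
For anyone wanting to try a comparable workload from user space, the
test described above (preallocate, then issue sparse 4k writes either
with O_SYNC or with a single sync at the end) can be approximated
roughly as follows. This is only a sketch, not the tool used for the
numbers above: the function name, the random choice of block offsets,
the fill byte and the seed are all assumptions made for illustration.
It can only exercise the normal "unwritten" case; the "stale" numbers
needed a hacked kernel and cannot be reproduced this way.

```python
import os
import random
import time

BLOCK = 4096  # 4k block writes, as in the test described above


def sparse_write_run(path, file_size=256 * 1024 * 1024, nwrites=2000,
                     use_osync=False, seed=0):
    """Preallocate file_size bytes, then time nwrites scattered 4k writes.

    Hypothetical helper: which blocks get written is an assumption
    (distinct blocks picked at random), not taken from the original test.
    """
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC
    if use_osync:
        flags |= os.O_SYNC  # each write returns only after reaching stable storage
    fd = os.open(path, flags, 0o600)
    try:
        # Reserve the space up front; on filesystems with unwritten extent
        # tracking the range reads back as zeros until it is written.
        os.posix_fallocate(fd, 0, file_size)

        rng = random.Random(seed)
        nblocks = file_size // BLOCK
        offsets = [b * BLOCK for b in rng.sample(range(nblocks), nwrites)]
        buf = b"\xab" * BLOCK

        start = time.perf_counter()
        for off in offsets:
            os.pwrite(fd, buf, off)
        if not use_osync:
            os.fsync(fd)  # the "post-sync" variant: one sync after all writes
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Calling sparse_write_run("/mnt/test/prealloc.dat", use_osync=True) and
then again with use_osync=False (the path is a placeholder) gives the
two columns for one run; averaging several runs on an otherwise idle
filesystem is needed before the numbers mean anything.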


Dave Chinner