Re: [PATCH V5 1/2] FS: Add generic data flush to fsync

From: Debabrata Banerjee
Date: Wed May 14 2014 - 13:16:04 EST


On Mon, May 12, 2014 at 5:50 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 12 May 2014 07:20:27 +0200 Fabian Frederick <fabf@xxxxxxxxx> wrote:
>
>> This patch issues a flush in generic_file_fsync.
>> (Modern filesystems already do it)
>>
>> Behaviour can be reversed using /sys/devices/.../cache_type
>> or by calling __generic_file_fsync
>
> Well OK, but why? What effect does the patch have? Does it make the
> kernel better and if so, how?

Flushing the device cache is the right thing to do. Without it, an
fsync or sync call gives no guarantee that your data is on
non-volatile storage, and both the calls and the commands may as well
be NOPs, especially given the size of write-back caches on modern
drives.
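
To make the user-space side of this concrete, here is a minimal sketch
of what an application relies on when it calls fsync(). The helper name
write_durably() is made up for illustration and is not part of the
patch; it only uses the standard POSIX open/write/fsync/close calls.
Whether the data really reaches non-volatile media when fsync() returns
0 depends on the filesystem issuing the cache flush discussed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): write len bytes to path and
 * return 0 only after fsync() has reported success, -1 on any failure.
 * EINTR handling is left out to keep the sketch short.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be short, so loop until everything is handed over. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /*
     * This is the call whose guarantee is at stake: without a device
     * cache flush in the filesystem's fsync path, success here only
     * means the data left the page cache.
     */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

A caller would treat a 0 return as "safe to acknowledge", which is
exactly the assumption that breaks when the flush is missing.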

This actually fixes a long-standing problem with ext4: sync()/fsync()
don't really work if you don't have a journal enabled. However, a
follow-up patch to ext4 could be considered so that the nobarrier
mount flag is honored without a journal, by calling
__generic_file_fsync instead.

i.e.

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index a8bc47f..779018a 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -107,7 +107,11 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	}

 	if (!journal) {
-		ret = generic_file_fsync(file, start, end, datasync);
+		if (test_opt(inode->i_sb, BARRIER))
+			ret = generic_file_fsync(file, start, end, datasync);
+		else
+			ret = __generic_file_fsync(file, start, end, datasync);
+
 		if (!ret && !hlist_empty(&inode->i_dentry))
 			ret = ext4_sync_parent(inode);
 		goto out;
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 919a329..3781568 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -182,8 +182,8 @@ commit=nrsec	(*)	Ext4 can be told to sync all its data and metadata
 			Setting it to very large values will improve
 			performance.

-barrier=<0|1(*)>	This enables/disables the use of write barriers in
-barrier(*)		the jbd code.  barrier=0 disables, barrier=1 enables.
+barrier=<0|1(*)>	This enables/disables the use of write barriers.
+barrier(*)		barrier=0 disables, barrier=1 enables.
 nobarrier		This also requires an IO stack which can support
 			barriers, and if jbd gets an error on a barrier
 			write, it will disable again with a warning.