Re: [PATCH] f2fs: disble physical prealloc in LSF mount

From: Javier GonzÃlez
Date: Tue Nov 26 2019 - 01:43:42 EST


On 26.11.2019 06:20, Damien Le Moal wrote:
+ Shin'Ichiro

On 2019/11/26 15:19, Damien Le Moal wrote:
On 2019/11/26 12:58, Javier GonzÃlez wrote:
On 26.11.2019 02:06, Damien Le Moal wrote:
On 2019/11/26 4:03, Javier GonzÃlez wrote:
On 25.11.2019 00:48, Damien Le Moal wrote:
On 2019/11/22 18:00, Javier GonzÃlez wrote:
From: Javier GonzÃlez <javier.gonz@xxxxxxxxxxx>

Fix file system corruption when using LFS mount (e.g., in zoned
devices). Seems like the fallback into buffered I/O creates an
inconsistency if the application is assuming both read and write DIO. I
can easily reproduce a corruption with a simple RocksDB test.

Might be that the f2fs_forced_buffered_io path brings some problems too,
but I have not seen other failures besides this one.

Problem reproducible without a zoned block device, simply by forcing
LFS mount:

$ sudo mkfs.f2fs -f -m /dev/nvme0n1
$ sudo mount /dev/nvme0n1 /mnt/f2fs
$ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0
--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
--db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
--block_size=65536

Note that the options that cause the problem are:
--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true

Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")

Signed-off-by: Javier GonzÃlez <javier.gonz@xxxxxxxxxxx>
---
fs/f2fs/data.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5755e897a5f0..b045dd6ab632 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
return err;
}

- if (direct_io && allow_outplace_dio(inode, iocb, from))
- return 0;

Since for LFS mode, all DIOs can end up out of place, I think that it
may be better to change allow_outplace_dio() to always return true in
the case of LFS mode. So may be something like:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return test_opt(sbi, LFS) ||
(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
}

instead of the original:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return (test_opt(sbi, LFS) && (rw == WRITE) &&
!block_unaligned_IO(inode, iocb, iter));
}

Thoughts ?


I see what you mean and it makes sense. However, the problem I am seeing
occurs when allow_outplace_dio() returns true, as this is what creates
the inconsistency between the write being buffered and the read being
DIO.

But if the write is switched to buffered, the DIO read should use the
buffered path too, no ? Since this is all happening under VFS, the
generic DIO read path will not ensure that the buffered writes are
flushed to disk before issuing the direct read, I think. So that would
explain your data corruption, i.e. you are reading stale data on the
device before the buffered writes make it to the media.


As far as I can see, the read is always sent DIO, so yes, I also believe
that we are reading stale data. This is why the corruption is not seen
if preventing allow_outplace_dio() from sending the write to the
buffered path.

What surprises me is that this is very easy to trigger (see commit), so
I assume you must have seen this with SMR in the past.

We just did. Shin'Ichiro in my team finally succeeded in recreating the
problem. The cause seems to be:

bool direct_io = iocb->ki_flags & IOCB_DIRECT;

being true on entry of f2fs_preallocate_blocks() whereas
f2fs_direct_IO() forces buffered IO path for DIO on zoned devices with:

if (f2fs_force_buffered_io(inode, iocb, iter))
return 0;

which has:

if (f2fs_sb_has_blkzoned(sbi))
return true;

So the top DIO code says "do buffered IOs", but lower in the write path,
the IO is still assumed to be a DIO because of the iocb flag... That's
inconsistent.

Note that for the non-zoned device LFS case, f2fs_force_buffered_io()
returns true only for unaligned write DIOs... But that will still trip
on the iocb flag test. So the proper fix is likely something like:

int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
struct f2fs_map_blocks map;
int flag;
int err = 0;
- bool direct_io = iocb->ki_flags & IOCB_DIRECT;
+ bool direct_io = (iocb->ki_flags & IOCB_DIRECT) &&
+ !2fs_force_buffered_io(inode, iocb, iter);

/* convert inline data for Direct I/O*/
if (direct_io) {
err = f2fs_convert_inline_inode(inode);
if (err)
return err;
}

Shin'Ichiro tried this on SMR disks and the failure is gone...

Cheers.


Yes! This is it. I originally though that the problem was on
f2fs_force_buffered_io(), but could not hit the problem there. Thanks
for the analysis; it makes sense now.

Just tested your patch on our drives and the problem is gone too. Guess
you can send a new patch an ignore this one. You can set my reviewed-by
on it.

Thanks Damien!
Javier