Re: [PATCH 3/6] f2fs: do not expose unwritten blocks to user by DIO

From: Eric Biggers
Date: Fri Jan 07 2022 - 18:13:14 EST


Hi Jaegeuk,

On Tue, Jan 04, 2022 at 01:24:16PM -0800, Jaegeuk Kim wrote:
> DIO preallocates physical blocks before writing data, but if an error occurrs
> or power-cut happens, we can see block contents from the disk. This patch tries
> to fix it by 1) turning to buffered writes for DIO into holes, 2) truncating
> unwritten blocks from error or power-cut.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> ---
> fs/f2fs/data.c | 5 ++++-
> fs/f2fs/f2fs.h | 5 +++++
> fs/f2fs/file.c | 27 ++++++++++++++++++---------
> fs/f2fs/inode.c | 8 ++++++++
> 4 files changed, 35 insertions(+), 10 deletions(-)

Unfortunately, this patch doesn't completely fix the uninitialized data
exposure. The problem is that it only makes DIO writes fall back to buffered
writes for holes, and not for reserved blocks (NEW_ADDR). f2fs's reserved
blocks are *not* the same as the unwritten extents that other filesystems have;
f2fs's reserved blocks have to be turned into regular blocks before DIO can
write to them. That immediately exposes them to concurrent reads (at least
buffered reads, but I think DIO reads too).

This can be reproduced using the aiodio_sparse program from LTP, as follows:

aiodio_sparse -i 4 -a 8k -w 1024k -s 4096k -n 6

This was reported at
https://lore.kernel.org/r/20211226132851.GC34518@xsang-OptiPlex-9020 as a
regression from the commit "f2fs: use iomap for direct I/O", but it actually
failed before too. Note that it's nondeterministic; writing random data to the
block device before creating the filesystem helps with reproduction.

I see only three possible solutions:

* Make DIO writes to reserved blocks fall back to buffered writes, just like
writes to holes. This would mean that a file would have to be written to
before direct writes would work; fallocate() wouldn't be enough. Note that my
patch https://lore.kernel.org/r/20210728015154.171507-1-ebiggers@xxxxxxxxxx
would have done this.

* Or, change the f2fs on-disk format to support unwritten extents.

* Or, split up block allocation into two parts, so that blocks could be
preliminarily allocated and not exposed to reads yet. This would be like
Ted's suggestion here: https://lore.kernel.org/r/YQS5eBljtztWwOFE@xxxxxxx

- Eric