Re: [RFC PATCH v3 00/26] ext4: use iomap for regular file's buffered IO path and enable large folio

From: Zhang Yi
Date: Sat Feb 17 2024 - 04:31:53 EST


On 2024/2/12 14:18, Darrick J. Wong wrote:
> On Sat, Jan 27, 2024 at 09:57:59AM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> Hello,
>>
>> This is the third version of the RFC patch series that converts ext4 regular
>> files' buffered IO path to iomap and enables large folios. It's rebased on
>> 6.7 and Christoph's "map multiple blocks per ->map_blocks in iomap
>> writeback" series [1]. I've fixed all the issues in v2 found during about 3
>> weeks of stress tests and fault injection tests. I hope I've
>> covered most of the corner cases, and any comments are welcome. :)
>>
>> Changes since v2:
>> - Update patch 1-6 to v3 [2].
>> - iomap_zero and iomap_unshare don't need to update i_size or call
>> iomap_write_failed(), so introduce a new helper iomap_write_end_simple()
>> that skips both.
>> - Factor out the ext4_[ext|ind]_map_blocks() parts from ext4_map_blocks(),
>> and introduce a new helper ext4_iomap_map_one_extent() to allocate
>> delalloc blocks in writeback, always under i_data_sem in
>> write mode. This prevents the delalloc extents being written back
>> from becoming stale if they race with truncate.
>> - Add lock detection to mapping_clear_large_folios().
>> Changes since v1:
>> - Introduce a seq count for iomap buffered write and writeback to protect
>> against races with extent changes, e.g. truncate, mwrite.
>> - Always allocate unwritten extents for new blocks, drop dioread_lock
>> mode, and make no distinctions between dioread_lock and
>> dioread_nolock.
>> - Don't add dirty data ranges to jinode, drop data=ordered mode, and
>> make no distinctions between data=ordered and data=writeback mode.
>> - Postpone updating i_disksize to endio.
>> - Allow splitting extents and use reserved space in endio.
>> - Instead of reimplementing a new delayed mapping helper
>> ext4_iomap_da_map_blocks() for buffered write, try to reuse
>> ext4_da_map_blocks().
>> - Add support for disabling large folio on active inodes.
>> - Support online defragmentation, make file fall back to buffer_head
>> and disable large folio in ext4_move_extents().
>> - Call ext4_nonda_switch() earlier to prevent a deadlock in mwrite.
>> - Add dirty_len and pos trace info to trace_iomap_writepage_map().
>> - Update patch 1-6 to v2.
>>
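
To illustrate the seq count idea from the v1 changes above, here is a minimal userspace sketch, not the code in this series. Everything in it is hypothetical: struct toy_inode, struct toy_mapping and the toy_* helpers are made up for illustration, and locking is ignored entirely. A mapping obtained for buffered write or writeback is stamped with the inode's extent seq count, any extent change such as truncate bumps the count, and a stale mapping is detected and looked up again before use. This is roughly the same idea as the validity_cookie / ->iomap_valid() revalidation in the iomap core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified inode: just what the sketch needs. */
struct toy_inode {
	uint64_t extent_seq;	/* bumped on every extent tree change */
	uint64_t size_blks;	/* file size in blocks */
};

/* Hypothetical cached mapping, stamped with the seq it was built under. */
struct toy_mapping {
	uint64_t lblk;		/* first logical block covered */
	uint64_t len;		/* number of blocks covered */
	uint64_t seq;		/* inode->extent_seq at lookup time */
};

/* Stand-in for a real block mapping lookup. */
static struct toy_mapping toy_map_blocks(const struct toy_inode *inode,
					 uint64_t lblk, uint64_t len)
{
	assert(len > 0);
	struct toy_mapping m = {
		.lblk = lblk, .len = len, .seq = inode->extent_seq,
	};
	return m;
}

/* Truncate changes the extent tree, so it must bump the seq count. */
static void toy_truncate(struct toy_inode *inode, uint64_t new_size_blks)
{
	inode->size_blks = new_size_blks;
	inode->extent_seq++;
}

/* A mapping is only trusted if no extent change happened since lookup. */
static bool toy_mapping_valid(const struct toy_inode *inode,
			      const struct toy_mapping *m)
{
	return m->seq == inode->extent_seq;
}

/*
 * Reuse *cached for lblk if it is still valid and covers lblk, otherwise
 * look the mapping up again. Returns true if a new lookup was done.
 */
static bool toy_revalidate(const struct toy_inode *inode,
			   struct toy_mapping *cached, uint64_t lblk)
{
	if (toy_mapping_valid(inode, cached) &&
	    lblk >= cached->lblk && lblk < cached->lblk + cached->len)
		return false;
	*cached = toy_map_blocks(inode, lblk, 1);
	return true;
}
```

A real implementation also needs proper locking around the check (for example, doing it with the folio locked); the sketch only shows the compare-and-remap logic.
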
>> This series only supports ext4 with the default features and mount
>> options; it doesn't support inline_data, bigalloc, dax, fs_verity, fs_crypt
>> and data=journal mode; ext4 would fall back to the buffer_head path
>
> Do you plan to add bigalloc or !extents support as a part 2 patchset?

Hello,

Sorry for the late reply; I was on vacation for Chinese New Year.
I've been working on bigalloc support recently and it's going relatively
well, but I have no plans to support !extents yet. I will start looking
into it after I finish rebasing my other patch set, "ext4: more
accurate metadata reservation for delalloc mount option", mentioned in my
TODO list.

>
> An ext2 port to iomap has been (vaguely) in the works for a while,
> though iirc willy never got the performance to match because iomap
> didn't have a mechanism for the caller to tell it "run the IO now even
> though you don't have a complete page, because the indirect block is the
> next block after the 11th block".
>

Thanks for pointing this out, and thanks to Matthew for the explanation. IIUC,
this problem also affects ext4 in !extents mode, but doesn't affect bigalloc,
right?
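
For context, here is how I understand the indirect block map addressing behind this. Below is a small userspace sketch modeled on the path computation in the ext2/ext4 indirect-block code, but not copied from it: block_to_path() is my own simplified version, and only the EXT2_*_BLOCK constant names mirror the real headers. It shows that the first 12 logical blocks are mapped directly from the inode, while the next one is the first that has to go through the indirect block. Per Darrick's description, that indirect block sits on disk right after the last directly mapped block, so the physical layout is discontiguous at exactly that boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the i_block[] array in the inode, as in ext2/ext4 (!extents). */
#define EXT2_NDIR_BLOCKS	12			/* direct pointers */
#define EXT2_IND_BLOCK		EXT2_NDIR_BLOCKS	/* single indirect */
#define EXT2_DIND_BLOCK		(EXT2_IND_BLOCK + 1)	/* double indirect */
#define EXT2_TIND_BLOCK		(EXT2_DIND_BLOCK + 1)	/* triple indirect */

/*
 * Translate logical block 'lblk' into the chain of array offsets used to
 * reach it: offsets[0] indexes i_block[], later entries index pointer
 * blocks. ptrs_bits is log2(pointers per block), e.g. 10 for 4KiB blocks
 * with 4-byte pointers. Returns the depth (1..4), or 0 if out of range.
 */
static int block_to_path(int64_t lblk, int ptrs_bits, int offsets[4])
{
	assert(ptrs_bits >= 1 && ptrs_bits <= 16);
	const int64_t ptrs = (int64_t)1 << ptrs_bits;
	const int64_t double_blocks = (int64_t)1 << (ptrs_bits * 2);
	int n = 0;

	if (lblk < 0)
		return 0;
	if (lblk < EXT2_NDIR_BLOCKS) {
		/* Blocks 0..11: the pointer lives directly in the inode. */
		offsets[n++] = (int)lblk;
	} else if ((lblk -= EXT2_NDIR_BLOCKS) < ptrs) {
		/* Next ptrs blocks: one hop through the indirect block. */
		offsets[n++] = EXT2_IND_BLOCK;
		offsets[n++] = (int)lblk;
	} else if ((lblk -= ptrs) < double_blocks) {
		/* Two hops through the double indirect block. */
		offsets[n++] = EXT2_DIND_BLOCK;
		offsets[n++] = (int)(lblk >> ptrs_bits);
		offsets[n++] = (int)(lblk & (ptrs - 1));
	} else if (((lblk -= double_blocks) >> (ptrs_bits * 2)) < ptrs) {
		/* Three hops through the triple indirect block. */
		offsets[n++] = EXT2_TIND_BLOCK;
		offsets[n++] = (int)(lblk >> (ptrs_bits * 2));
		offsets[n++] = (int)((lblk >> ptrs_bits) & (ptrs - 1));
		offsets[n++] = (int)(lblk & (ptrs - 1));
	}
	return n;
}
```

With 4KiB blocks (ptrs_bits = 10), block_to_path(11, ...) returns depth 1 while block_to_path(12, ...) returns depth 2 with offsets {EXT2_IND_BLOCK, 0}, which is the boundary Darrick describes.
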

Thanks,
Yi.