Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"

From: ChenXiaoSong
Date: Thu Nov 16 2023 - 22:28:50 EST


On 2023/10/30 22:56, Trond Myklebust wrote:
> A refactoring is by definition a change that does not affect code
> behaviour. It is obvious that this was never intended to be such a
> patch.
>
> The reason that the bug is occurring in 4.19.x, and not in the latest
> kernels, is because the former is missing another bugfix (one which
> actually is missing a "Fixes:" tag).
>
> Can you therefore please check if applying commit 22876f540bdf ("NFS:
> Don't call generic_error_remove_page() while holding locks") fixes the
> issue.
>
> Note that the latter patch is needed in any case in order to fix a read
> deadlock (as indicated on the label).
>
> Thanks,
> Trond

After applying commit 22876f540bdf ("NFS: Don't call generic_error_remove_page() while holding locks"), I ran into an infinite loop:

write
 ...
  nfs_updatepage
   nfs_writepage_setup
    nfs_setup_write_request
     nfs_try_to_update_request
      nfs_wb_page
       if (clear_page_dirty_for_io(page)) // true
       nfs_writepage_locked // return 0
        nfs_do_writepage // return 0
         nfs_page_async_flush // return 0
          nfs_error_is_fatal_on_server
          nfs_write_error_remove_page
           SetPageError // instead of generic_error_remove_page
       // loop begin
       if (clear_page_dirty_for_io(page)) // false
       if (!PagePrivate(page)) // false
       ret = nfs_commit_inode = 0
       // loop again, never quit

Before applying commit 22876f540bdf ("NFS: Don't call generic_error_remove_page() while holding locks"), generic_error_remove_page() cleared PG_private, so the infinite loop could not happen:

generic_error_remove_page
 truncate_inode_page
  truncate_cleanup_page
   do_invalidatepage
    nfs_invalidate_page
     nfs_wb_page_cancel
      nfs_inode_remove_request
       ClearPagePrivate(head->wb_page)
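
To make the difference between the two behaviours concrete, here is a minimal userspace model of the loop described above. It is only a sketch and not kernel code: struct model_page, model_writepage_locked() and model_wb_page() are hypothetical names, the flags merely stand in for PG_dirty, PG_private and PG_error, and the max_iters bound exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page flags involved; not a kernel type. */
struct model_page {
	bool dirty;        /* models PG_dirty */
	bool has_private;  /* models PG_private */
	bool error;        /* models PG_error */
};

/*
 * Models the fatal-error path reached via nfs_writepage_locked().
 * clear_private == true:  behaviour without 22876f540bdf, where
 *                         generic_error_remove_page() ends up clearing
 *                         PG_private.
 * clear_private == false: behaviour with 22876f540bdf, where only
 *                         SetPageError is done.
 * Returns 0 in both cases, as in the trace.
 */
static int model_writepage_locked(struct model_page *page, bool clear_private)
{
	if (clear_private)
		page->has_private = false;
	else
		page->error = true;
	return 0;
}

/*
 * Models the retry loop in nfs_wb_page(). Returns the number of
 * iterations needed to leave the loop, or -1 if it is still looping
 * after max_iters iterations.
 */
static int model_wb_page(struct model_page *page, bool clear_private,
			 int max_iters)
{
	assert(max_iters > 0);

	for (int i = 1; i <= max_iters; i++) {
		if (page->dirty) {
			/* clear_page_dirty_for_io() returned true */
			page->dirty = false;
			model_writepage_locked(page, clear_private);
			continue;
		}
		if (!page->has_private)
			return i;	/* the only exit from the loop */
		/* nfs_commit_inode() returns 0 here, so loop again */
	}
	return -1;
}
```

With a page that starts out dirty and with PG_private set, model_wb_page(&page, true, 1000) leaves the loop on the second iteration, while model_wb_page(&page, false, 1000) never reaches the exit and returns -1, matching the trace above.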

If this patch is applied, are other patches required as well? Also, I cannot reproduce the read deadlock that the patch is meant to fix. Are there specific conditions needed to reproduce it?