Re: [PATCH] ext4: get discard out of jbd2 commit kthread

From: Theodore Y. Ts'o
Date: Tue May 18 2021 - 10:58:02 EST


On Tue, May 18, 2021 at 09:19:13AM +0800, Wang Jianchao wrote:
> > That way we don't need to move all of this to a kworker context.
>
> The submit_bio also needs to be out of jbd2 commit kthread as it may be
> blocked due to blk-wbt or no enough request tag. ;)

Actually, there's a bigger issue that I hadn't realized, which is why
we are currently using submit_bio_wait().  We *must* wait until the
discard has completed before we call ext4_free_data_in_buddy(), which
is what allows those blocks to be reused by the block allocator.

If the discard happens after we reallocate the block, there is a good
chance that we will end up corrupting a data or metadata block,
leading to user data loss.
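
To make the ordering problem concrete, here is a small user-space toy
model.  It is only a sketch, not ext4 code: every toy_* name is
hypothetical, and a "discard" is modelled as zeroing a one-byte block.
The "wait" flag stands in for the difference between waiting for the
discard (as submit_bio_wait() does) and letting it complete later.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 8

/* Toy device plus allocator state; all names here are hypothetical. */
struct toy_fs {
	unsigned char data[NBLOCKS];     /* one byte of "data" per block */
	bool free_in_buddy[NBLOCKS];     /* allocator may hand this block out */
	bool discard_pending[NBLOCKS];   /* discard issued, not yet completed */
};

/* The device completes all outstanding discards, wiping those blocks. */
static void toy_complete_discards(struct toy_fs *fs)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (fs->discard_pending[i]) {
			fs->data[i] = 0;
			fs->discard_pending[i] = false;
		}
	}
}

/*
 * Release a block at commit time.  With wait == true the discard
 * finishes before the block becomes allocatable; with wait == false
 * the block is handed back while the discard is still in flight.
 */
static void toy_release_block(struct toy_fs *fs, int blk, bool wait)
{
	assert(blk >= 0 && blk < NBLOCKS);
	fs->discard_pending[blk] = true;
	if (wait)
		toy_complete_discards(fs);
	fs->free_in_buddy[blk] = true;
}

/* Allocate the first free block and write value to it; -1 if none. */
static int toy_alloc_and_write(struct toy_fs *fs, unsigned char value)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (fs->free_in_buddy[i]) {
			fs->free_in_buddy[i] = false;
			fs->data[i] = value;
			return i;
		}
	}
	return -1;
}

/* Returns true if data written to the reallocated block survives. */
static bool toy_new_data_survives(bool wait)
{
	struct toy_fs fs;

	memset(&fs, 0, sizeof(fs));
	memset(fs.data, 0xAA, sizeof(fs.data));  /* all blocks in use */

	toy_release_block(&fs, 3, wait);
	int blk = toy_alloc_and_write(&fs, 0x5A);
	toy_complete_discards(&fs);  /* a late discard lands here, if any */

	return blk >= 0 && fs.data[blk] == 0x5A;
}
```

In this model toy_new_data_survives(true) holds, while
toy_new_data_survives(false) does not: the late discard wipes the
newly written block.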

There's another corollary to this; if you use blk-wbt, and you are
doing lots of deletes, and we move this all to a writeback thread,
this *significantly* increases the chance that the user will see
ENOSPC errors when running with a very full (close to 100% used) file
system, since the freed blocks can't be reused until their discards
have completed.
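
The same effect can be shown with a few counters.  Again, this is just
a sketch under assumed names (struct toy_space and the toy_* helpers
are hypothetical, not ext4 interfaces): deleted blocks sit in an
"awaiting discard" bucket and only become allocatable once the discard
is done.

```c
#include <errno.h>

/* Toy space accounting; all names here are hypothetical. */
struct toy_space {
	unsigned long free_blocks;       /* allocatable right now */
	unsigned long awaiting_discard;  /* deleted, discard not yet done */
};

/* Deleting n blocks does not make them allocatable immediately. */
static void toy_delete(struct toy_space *s, unsigned long n)
{
	s->awaiting_discard += n;
}

/* Up to n pending discards complete; those blocks become free. */
static void toy_discard_done(struct toy_space *s, unsigned long n)
{
	if (n > s->awaiting_discard)
		n = s->awaiting_discard;
	s->awaiting_discard -= n;
	s->free_blocks += n;
}

/* Allocate n blocks, or fail with -ENOSPC if too few are free. */
static int toy_alloc(struct toy_space *s, unsigned long n)
{
	if (n > s->free_blocks)
		return -ENOSPC;
	s->free_blocks -= n;
	return 0;
}
```

Starting with 10 free blocks, deleting 500 and then asking for 100
fails with -ENOSPC in this model until toy_discard_done() has run,
even though the user just freed far more space than was requested.
The longer the discards are throttled, the wider that window gets.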

I'd argue that this is a *really* good reason why using mount -o
discard is Just A Bad Idea if you are running with blk-wbt. If
discards are slow, using fstrim is a much better choice. It's also
the case that for most SSDs and workloads, doing frequent discards
doesn't actually help that much. The write endurance of the device is
not compromised that much if you only run fstrim and discard unused
blocks once a day, or even once a week --- I only recommend use of
mount -o discard in cases where the discard operation is effectively
free. (e.g., in cases where the FTL is implemented on the Host OS, or
you are running with super-fast flash which is PCIe or NVMe attached.)

Cheers,

- Ted