Re: [PATCH 1/1] ext4: fallback to complex scan if aligned scan doesn't work

From: Jan Kara
Date: Thu Jan 04 2024 - 10:27:32 EST


On Fri 15-12-23 16:49:50, Ojaswin Mujoo wrote:
> Currently in case the goal length is a multiple of stripe size we use
> ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
> In case we are not able to find any, we again go back to calling
> ext4_mb_choose_next_group() to search for a different suitable block
> group. However, since the linear search always begins from the start,
> most of the time we end up with the same BG and the cycle continues.
>
> With large filesystems, the CPU can be stuck in this loop for hours
> which can slow down the whole system. Hence, until we figure out a
> better way to continue the search (rather than starting from beginning)
> in ext4_mb_choose_next_group(), let's just fall back to
> ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
> more likely to find the needed blocks.
>
> Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>

If I understand the difference right, the problem is that while
ext4_mb_choose_next_group() guarantees a large enough free space extent for
the CR_GOAL_LEN_FAST or CR_BEST_AVAIL_LEN passes, it does not guarantee a
large enough *aligned* free space extent. Thus for non-aligned allocations
we can fail only due to a race with another allocating process, but with
aligned allocations we can consistently fail in ext4_mb_scan_aligned() and
thus livelock in the allocation loop.
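
To illustrate the difference, here is a minimal userspace sketch. It is not
the mballoc code: the in_use[] array and the toy_*() helpers are hypothetical
names made up for this example, and the group is modeled as a flat array of
32 blocks. The group has one free extent of 8 blocks starting at block 3, so
a "large enough free extent" check passes, yet no stripe-aligned start
(stripe of 8) has 8 free blocks behind it.

```c
#include <stdbool.h>

#define GROUP_BLOCKS 32

/* Hypothetical toy model of one block group: true = in use, false = free. */
static bool in_use[GROUP_BLOCKS];

/* Mark everything used except one free extent covering blocks 3..10. */
static void toy_setup(void)
{
	for (int i = 0; i < GROUP_BLOCKS; i++)
		in_use[i] = true;
	for (int i = 3; i < 11; i++)
		in_use[i] = false;
}

/*
 * Look for `len` free blocks starting at a multiple of `stripe`.
 * Returns the start block, or -1 if no aligned extent exists.
 */
static int toy_scan_aligned(int len, int stripe)
{
	if (stripe <= 0)
		return -1;

	for (int start = 0; start + len <= GROUP_BLOCKS; start += stripe) {
		int i = 0;

		while (i < len && !in_use[start + i])
			i++;
		if (i == len)
			return start;
	}
	return -1;
}

/* Look for `len` contiguous free blocks at any offset, or return -1. */
static int toy_scan_any(int len)
{
	int run = 0;

	for (int i = 0; i < GROUP_BLOCKS; i++) {
		run = in_use[i] ? 0 : run + 1;
		if (run == len)
			return i - len + 1;
	}
	return -1;
}

/*
 * Same shape as the patched logic: try the aligned scan when the length
 * is a multiple of the stripe, then fall back to the unaligned scan if
 * nothing was found.
 */
static int toy_allocate(int len, int stripe)
{
	int start = -1;

	if (stripe > 0 && len % stripe == 0)
		start = toy_scan_aligned(len, stripe);
	if (start < 0)
		start = toy_scan_any(len);
	return start;
}
```

After toy_setup(), toy_scan_aligned(8, 8) fails on every retry because the
group contents do not change, which is the consistent failure described
above. toy_allocate(8, 8) succeeds through the fallback, at the cost of
returning an unaligned extent.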

If my understanding is correct, feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Honza



> ---
> fs/ext4/mballoc.c | 21 +++++++++++++--------
> 1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d72b5e3c92ec..63f12ec02485 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2895,14 +2895,19 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> ac->ac_groups_scanned++;
> if (cr == CR_POWER2_ALIGNED)
> ext4_mb_simple_scan_group(ac, &e4b);
> - else if ((cr == CR_GOAL_LEN_FAST ||
> - cr == CR_BEST_AVAIL_LEN) &&
> - sbi->s_stripe &&
> - !(ac->ac_g_ex.fe_len %
> - EXT4_B2C(sbi, sbi->s_stripe)))
> - ext4_mb_scan_aligned(ac, &e4b);
> - else
> - ext4_mb_complex_scan_group(ac, &e4b);
> + else {
> + bool is_stripe_aligned = sbi->s_stripe &&
> + !(ac->ac_g_ex.fe_len %
> + EXT4_B2C(sbi, sbi->s_stripe));
> +
> + if ((cr == CR_GOAL_LEN_FAST ||
> + cr == CR_BEST_AVAIL_LEN) &&
> + is_stripe_aligned)
> + ext4_mb_scan_aligned(ac, &e4b);
> +
> + if (ac->ac_status == AC_STATUS_CONTINUE)
> + ext4_mb_complex_scan_group(ac, &e4b);
> + }
>
> ext4_unlock_group(sb, group);
> ext4_mb_unload_buddy(&e4b);
> --
> 2.39.3
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR