Re: [PATCH 3/4] ext4: avoid online resizing failures due to oversized flex bg

From: Jan Kara
Date: Thu Oct 19 2023 - 07:44:46 EST


On Wed 18-10-23 19:42:20, Baokun Li wrote:
> When we online resize an ext4 filesystem with a oversized flexbg_size,
>
> mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
> mount $dev $dir
> resize2fs $dev 16G
>
> the following WARN_ON is triggered:
> ==================================================================
> WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
> Modules linked in: sg(E)
> CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314
> RIP: 0010:__alloc_pages+0x411/0x550
> Call Trace:
> <TASK>
> __kmalloc_large_node+0xa2/0x200
> __kmalloc+0x16e/0x290
> ext4_resize_fs+0x481/0xd80
> __ext4_ioctl+0x1616/0x1d90
> ext4_ioctl+0x12/0x20
> __x64_sys_ioctl+0xf0/0x150
> do_syscall_64+0x3b/0x90
> ==================================================================
>
> This is because flexbg_size is too large and the size of the new_group_data
> array to be allocated exceeds MAX_ORDER. Currently, the minimum value of
> MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding
> maximum number of groups that can be allocated is:
>
> (PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845
>
> And the value that is down-aligned to the power of 2 is 16384. Therefore,
> this value is defined as MAX_RESIZE_BG, and the number of groups added
> each time does not exceed this value during resizing, and is added multiple
> times to complete the online resizing. The difference is that the metadata
> in a flex_bg may be more dispersed.
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Honza

> ---
> fs/ext4/resize.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 0a57b199883c..e168a9f59600 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -218,10 +218,17 @@ struct ext4_new_flex_group_data {
> in the flex group */
> __u16 *bg_flags; /* block group flags of groups
> in @groups */
> + ext4_group_t resize_bg; /* number of allocated
> + new_group_data */
> ext4_group_t count; /* number of groups in @groups
> */
> };
>
> +/*
> + * Avoiding memory allocation failures due to too many groups added each time.
> + */
> +#define MAX_RESIZE_BG 16384
> +
> /*
> * alloc_flex_gd() allocates a ext4_new_flex_group_data with size of
> * @flexbg_size.
> @@ -236,14 +243,18 @@ static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned int flexbg_size)
> if (flex_gd == NULL)
> goto out3;
>
> - flex_gd->count = flexbg_size;
> - flex_gd->groups = kmalloc_array(flexbg_size,
> + if (unlikely(flexbg_size > MAX_RESIZE_BG))
> + flex_gd->resize_bg = MAX_RESIZE_BG;
> + else
> + flex_gd->resize_bg = flexbg_size;
> +
> + flex_gd->groups = kmalloc_array(flex_gd->resize_bg,
> sizeof(struct ext4_new_group_data),
> GFP_NOFS);
> if (flex_gd->groups == NULL)
> goto out2;
>
> - flex_gd->bg_flags = kmalloc_array(flexbg_size, sizeof(__u16),
> + flex_gd->bg_flags = kmalloc_array(flex_gd->resize_bg, sizeof(__u16),
> GFP_NOFS);
> if (flex_gd->bg_flags == NULL)
> goto out1;
> @@ -1602,8 +1613,7 @@ static int ext4_flex_group_add(struct super_block *sb,
>
> static int ext4_setup_next_flex_gd(struct super_block *sb,
> struct ext4_new_flex_group_data *flex_gd,
> - ext4_fsblk_t n_blocks_count,
> - unsigned int flexbg_size)
> + ext4_fsblk_t n_blocks_count)
> {
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> struct ext4_super_block *es = sbi->s_es;
> @@ -1627,7 +1637,7 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
> BUG_ON(last);
> ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &last);
>
> - last_group = group | (flexbg_size - 1);
> + last_group = group | (flex_gd->resize_bg - 1);
> if (last_group > n_group)
> last_group = n_group;
>
> @@ -2130,8 +2140,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
> /* Add flex groups. Note that a regular group is a
> * flex group with 1 group.
> */
> - while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count,
> - flexbg_size)) {
> + while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count)) {
> if (time_is_before_jiffies(last_update_time + HZ * 10)) {
> if (last_update_time)
> ext4_msg(sb, KERN_INFO,
> --
> 2.31.1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR