On Sat 17-02-24 15:41:43, Baokun Li wrote:Well, I will put the next patch before this one in the next version.
On 2024/2/14 0:58, Jan Kara wrote:Yes, I think reordering would be good. Because I've read the convertion and
On Fri 26-01-24 16:57:13, Baokun Li wrote:Yes, we could put the next patch before this one, but using
We can easily trigger a BUG_ON by using the following commands:I don't think you need to change s_mb_group_prealloc here and then restrict
mount /dev/$disk /tmp/test
echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
kernel BUG at fs/ext4/mballoc.c:2029!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
RIP: 0010:mb_mark_used+0x358/0x370
[...]
Call Trace:
ext4_mb_use_best_found+0x56/0x140
ext4_mb_complex_scan_group+0x196/0x2f0
ext4_mb_regular_allocator+0xa92/0xf00
ext4_mb_new_blocks+0x302/0xbc0
ext4_ext_map_blocks+0x95a/0xef0
ext4_map_blocks+0x2b1/0x680
ext4_do_writepages+0x733/0xbd0
[...]
==================================================================
In ext4_mb_normalize_group_request():
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
value range of 0-INT_MAX to avoid the above problem. In addition to the
mb_group_prealloc sysfs interface, the following interfaces also have uint
to int conversions that result in overflows, and are also fixed.
err_ratelimit_burst
msg_ratelimit_burst
warning_ratelimit_burst
err_ratelimit_interval_ms
msg_ratelimit_interval_ms
warning_ratelimit_interval_ms
mb_best_avail_max_trim_order
CC: stable@xxxxxxxxxxxxxxx
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
it even further in the next patch. I'd just leave it alone here.
s_mb_group_prealloc as an example makes it easier to understand
why the attr_pointer_pi case is added here.There are several other
variables that don't have more convincing examples.
started wondering: "is this enough?"
Yes, this means that in CR1.5, in case the original request is satisfied,Well, s_mb_best_avail_max_trim_order is not about allocation passes butAlso I think that limiting mb_best_avail_max_trim_order to 64 instead ofI think it's enough here to make sure that mb_best_avail_max_trim_order
INT_MAX will make us more resilient to surprises in the future :) But I
don't really insist.
Honza
is a positive number, since we always make sure that min_order
is not less than 0, as follows:
order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
An oversized mb_best_avail_max_trim_order can be interpreted as
always being CR_ANY_FREE. 😄
about how many times are we willing to shorten the goal extent to half and
still use the advanced free blocks search.
And I agree that the mballocYes, we shouldn't allow storing rubbish values, otherwise it may
code is careful enough that large numbers don't matter there but still why
allowing storing garbage values? It is nicer to tell sysadmin he did
something wrong right away.
Honza