Re: block/badblocks.c warning in 6.7-rc2

From: Coly Li
Date: Wed Nov 29 2023 - 03:08:56 EST




> On Nov 29, 2023, at 07:47, Bagas Sanjaya <bagasdotme@xxxxxxxxx> wrote:
>
> Hi,
>
> I notice a regression report that is rather well-handled on Bugzilla [1].
> Quoting from it:
>
>>
>> when booting from 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
>> WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284)
>> The reason why isn't clear, but the stack trace points to an error in md error handling.
>> This bug didn't happen in 6.6
>> there are 3 commits in 6.7-rc2 which may cause them,
>> in attachment:
>> - decoded stacktrace of dmesg
>> - kernel .config
>
> The culprit author then replied:
>
>> The warning is from this line of code in _badblocks_check(),
>> 1284 WARN_ON(bb->shift < 0 || sectors == 0);
>>
>> It means the caller sent an invalid range to check. From the oops information,
>> "RDX: 0000000000000000" means parameter 'sectors' is 0.
>>
>> So the question is, why does md raid code send a 0-length range for badblocks check? Is this behavior on purpose, or improper?
>> ...
>> IMHO, it doesn't make sense for a caller to check a zero-length LBA range. The warning works as expected to detect an improper call to badblocks_check().
>
> See Bugzilla for the full thread and attached decoded dmesg and kernel config.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot introduced: 3ea3354cb9f03e https://bugzilla.kernel.org/show_bug.cgi?id=218184
> #regzbot title: badblocks_check regression (md error handling) on bcachefs volume
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=218184

It seems the improved bad blocks code caught a zero-size bio request from the upper layer; this improper behavior was silently ignored before. It might be too early, or too simplistic, to decide that this is a regression, especially since Janpieter has closed the report for now.
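
To illustrate the condition the warning tests, here is a minimal userspace sketch. Only the expression `bb->shift < 0 || sectors == 0` is taken from the report quoted above. The struct `bb_model`, the helpers `range_is_invalid()` and `check_range()`, and the parameter types are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's bad blocks structure.
 * Only the field read by the quoted check is modeled. */
struct bb_model {
    int shift;  /* assumption: a negative value means tracking is disabled */
};

/* Mirrors the quoted condition: true when the range cannot be checked,
 * either because tracking is disabled or because the range is empty. */
static bool range_is_invalid(const struct bb_model *bb, int sectors)
{
    return bb->shift < 0 || sectors == 0;
}

/* Hypothetical caller-side guard: report and bail out instead of
 * passing a zero-length range down to the lookup. */
static int check_range(const struct bb_model *bb,
                       unsigned long long start, int sectors)
{
    if (range_is_invalid(bb, sectors)) {
        fprintf(stderr, "invalid range: start=%llu sectors=%d shift=%d\n",
                start, sectors, bb->shift);
        return -1;
    }
    return 0;  /* placeholder: a real lookup would search the table here */
}
```

With `shift` set to 0, calling `check_range(&bb, 0, 0)` takes the error path, while `check_range(&bb, 0, 8)` passes the guard. Whether such a guard belongs in the caller or in the bad blocks code is exactly the open question in this thread.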

Thanks.

Coly Li