Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling

From: Ira Weiny
Date: Fri Dec 22 2023 - 13:58:18 EST


Ira Weiny wrote:
> Coly,
>
> Yesterday I noticed that a few of our nvdimm tests were failing. I bisected
> the problem to the following commit.
>
> aa511ff8218b ("badblocks: switch to the improved badblock handling code")
>
> Reverting this patch fixed our tests.
>
> I've also dug into the code a bit and I believe the algorithm for
> badblocks_check() is broken (not yet sure about the other calls). At the
> very least I see the bb->p pointer being indexed with '-1'. :-(
>
> I did notice that this work was due to a bug report in badblocks_set().
> Therefore, I'm not sure how the severity of that fix compares with a
> revert. But at this point I'm not seeing an easy fix, so I'm in favor of
> a revert.
>

Dan and I were discussing this and it occurred to us that it may be easy
for you to stand up the test environment I'm using.

For CXL we have a run_qemu.sh project[1] which stands up a qemu
environment with the ndctl[2] tests in it. Clone ndctl to ~/git/ndctl
so run_qemu.sh can find it. Then start run_qemu.sh from a kernel tree like
this:

$ <path_to_run_qemu>/run_qemu.sh --cxl --nfit-test --nfit-debug [-r img]

[-r img] is optional but useful if you have changed the ndctl tests.
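
In case it helps, the setup above could be scripted roughly as follows.
This is only a sketch: KERNEL_TREE, RUN_QEMU_DIR and DRY_RUN are names
made up for illustration, and ~/git/run_qemu is just an assumed place to
clone run_qemu (only the ~/git/ndctl location matters to run_qemu.sh). By
default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: prints the setup commands unless DRY_RUN=0 is set.
# KERNEL_TREE, RUN_QEMU_DIR and DRY_RUN are hypothetical names.
set -e

KERNEL_TREE="${KERNEL_TREE:-$PWD}"                  # kernel tree to boot (assumption)
RUN_QEMU_DIR="${RUN_QEMU_DIR:-$HOME/git/run_qemu}"  # where run_qemu is cloned (assumption)
NDCTL_DIR="$HOME/git/ndctl"                         # where run_qemu.sh looks for ndctl
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone the two projects only if they are not already present.
[ -d "$NDCTL_DIR" ] || run git clone https://github.com/pmem/ndctl "$NDCTL_DIR"
[ -d "$RUN_QEMU_DIR" ] || run git clone https://github.com/pmem/run_qemu "$RUN_QEMU_DIR"

# run_qemu.sh is started from inside the kernel tree.
cd "$KERNEL_TREE"
run "$RUN_QEMU_DIR/run_qemu.sh" --cxl --nfit-test --nfit-debug
```

Add "-r img" to the last command by hand if you have changed the ndctl
tests.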

Once booted you can run the test suite with meson:

$ cd ndctl && meson test -C build

I've been running just our clear.sh test which shows the error.

$ cd ndctl/build && meson test clear.sh

Hope this helps,
Ira

[1] https://github.com/pmem/run_qemu
[2] https://github.com/pmem/ndctl