Re: CVE-2023-52437: Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d" [resend]

From: Paolo Bonzini
Date: Wed Feb 21 2024 - 04:12:35 EST


Resending with LKML in Cc, since posting to the linux-cve-announce
mailing list is restricted to moderators.

On 2/20/24 19:34, Greg Kroah-Hartman wrote:
Description
===========

In the Linux kernel, the following vulnerability has been resolved:

Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"

This reverts commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74.

That commit introduced the following race and can cause system hung.

md_write_start:             raid5d:
 // mddev->in_sync == 1
 set "MD_SB_CHANGE_PENDING"
                            // running before md_write_start wakeup it
                             waiting "MD_SB_CHANGE_PENDING" cleared
                             >>>>>>>>> hung
 wakeup mddev->thread
 ...
 waiting "MD_SB_CHANGE_PENDING" cleared
 >>>> hung, raid5d should clear this flag
 but get hung by same flag.

The issue reverted commit fixing is fixed by last patch in a new way.
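
As an illustration only, the interleaving described in the commit message can be reduced to a toy model: one side sets a flag and waits for it to clear, while the only party able to clear it is waiting on the same flag. The sketch below is not kernel code; the function name `simulate` and its parameters are hypothetical, and the single boolean merely stands in for MD_SB_CHANGE_PENDING.

```python
def simulate(raid5d_waits_for_flag, max_rounds=5):
    """Toy model of the race in the commit message; NOT kernel code.

    Returns "progress" if the writer gets past its wait,
    "hung" if the pending flag is never cleared.
    """
    # md_write_start has set the flag (stand-in for MD_SB_CHANGE_PENDING).
    change_pending = True

    for _ in range(max_rounds):
        # raid5d's turn. In the reverted behaviour it first waits for the
        # flag to clear, so it never reaches the step that clears it.
        raid5d_blocked = raid5d_waits_for_flag and change_pending
        if not raid5d_blocked:
            # The step the commit message says raid5d should perform.
            change_pending = False

        # md_write_start's turn: it may only continue once the flag is clear.
        if not change_pending:
            return "progress"

    # Neither side could move in any round.
    return "hung"


print(simulate(raid5d_waits_for_flag=True))   # hung
print(simulate(raid5d_waits_for_flag=False))  # progress
```

With the wait in place neither side can ever move; without it the flag is cleared on the first round.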

Sometimes less than optimal descriptions end up even in Linux commit
messages, and I understand that you're "not going to be adding anything
additional to the report" other than the git commit message. [1] But
this description is not just "suboptimal" English, it also makes zero
sense since it refers to a "last patch" that does not exist.

There are dozens of distros, both commercial and non-commercial, whose
users need a *real* description of what is being fixed. By writing CVE
descriptions that make no sense, you're creating more work for everyone
involved, without putting in place a process to clarify these things
except through "the maintainers of the relevant subsystem
affected"---who are not CC'd to these messages and therefore might not
even know that the CVE announcement exists.

My suggestion is to CC the author of the fix and the maintainer, and if
possible even go through a pre-verification phase similar to what's done
for AUTOSEL patches. If some commit messages are irredeemable, or some
situations are just too complex, and no one is willing to put in the
work required to do this properly, the maintainer should be able to
NACK the creation of an unusable CVE entry like this one.

(Somewhat related to this, how are you going to handle patch
dependencies? Sasha's GSD updates have a separate entry for each commit,
the result being "vulnerabilities" with "no functional change" in their
description. Are they instead going to be rolled into a single entry
like this one now that you're actually creating CVEs?)

I am cautiously optimistic that this can be worked out and I agree with
you that lots of bug fixes going into stable have potential security
impact. But as this example shows, there are still more than a few kinks
to be ironed out.

The Linux kernel CVE team has assigned CVE-2023-52437 to this issue.


Affected and fixed versions
===========================

Issue introduced in 5.15.75 with commit 9e86dffd0b02 and fixed in 5.15.148 with commit 84c39986fe6d
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.74 with commit bed0acf330b2
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.75 with commit cfa468382858
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.13 with commit e16a0bbdb7e5
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.14 with commit aab69ef76970
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.1 with commit 0de40f76d567
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.2 with commit 87165c64fe1a

So which of these 6.{1,6,7}.y releases actually fixed the issue?
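
To make the ambiguity concrete, the entries can be grouped by stable branch. This is a minimal sketch that assumes the announcement lines are pasted verbatim; the helper name `fixes_by_branch` and the regular expression are made up for illustration and are not the CVE team's tooling.

```python
import re
from collections import defaultdict

# Entries copied from the "Affected and fixed versions" section.
ENTRIES = """\
Issue introduced in 5.15.75 with commit 9e86dffd0b02 and fixed in 5.15.148 with commit 84c39986fe6d
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.74 with commit bed0acf330b2
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.1.75 with commit cfa468382858
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.13 with commit e16a0bbdb7e5
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.6.14 with commit aab69ef76970
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.1 with commit 0de40f76d567
Issue introduced in 6.1 with commit 5e2cf333b7bd and fixed in 6.7.2 with commit 87165c64fe1a
"""

PATTERN = re.compile(
    r"Issue introduced in (\S+) with commit (\w+) "
    r"and fixed in (\S+) with commit (\w+)"
)


def fixes_by_branch(text):
    """Map each stable branch (e.g. '6.1.y') to its listed (version, commit) fixes."""
    branches = defaultdict(list)
    for match in PATTERN.finditer(text):
        _intro_ver, _intro_commit, fixed_ver, fixed_commit = match.groups()
        # '6.1.74' -> '6.1.y': keep only the major.minor part.
        branch = ".".join(fixed_ver.split(".")[:2]) + ".y"
        branches[branch].append((fixed_ver, fixed_commit))
    return dict(branches)


for branch, fixes in fixes_by_branch(ENTRIES).items():
    note = "  <-- more than one 'fix' listed" if len(fixes) > 1 else ""
    print(f"{branch}: {fixes}{note}")
```

Each of the 6.1.y, 6.6.y and 6.7.y branches ends up with two consecutive "fixed in" releases, which is exactly what the question is about.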

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes. Individual
changes are never tested alone, but rather are part of a larger kernel
release. Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all. If however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
https://git.kernel.org/stable/c/84c39986fe6dd77aa15f08712339f5d4eb7dbe27
https://git.kernel.org/stable/c/bed0acf330b2c50c688f6d9cfbcac2aa57a8e613
https://git.kernel.org/stable/c/cfa46838285814c3a27faacf7357f0a65bb5d152
https://git.kernel.org/stable/c/e16a0bbdb7e590a6607b0d82915add738c03c069
https://git.kernel.org/stable/c/aab69ef769707ad987ff905d79e0bd6591812580
https://git.kernel.org/stable/c/0de40f76d567133b871cd6ad46bb87afbce46983
https://git.kernel.org/stable/c/87165c64fe1a98bbab7280c58df3c83be2c98478
https://git.kernel.org/stable/c/bed9e27baf52a09b7ba2a3714f1e24e17ced386d

Half of these are reverting the revert. I understand that
"cherry-picking individual commits is not recommended" but it looks like
this is a bug in whatever scripts you are using. Are they public, so
that fixes can be developed in the open?

Also, commit 87165c64fe1a9 (the revert of the revert) was marked 5.19+
but 5.15.148 does have the original revert. Does that mean that 5.15.148
still has the "issue with raid5 with journal device" (another hang, see
https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@xxxxxxxx/)
mentioned in the commit message for 87165c64fe1a9? If so, that
contradicts the claim that updating to the latest release of a given LTS
branch is the best course of action, since for some users 5.15.147 might
be better than 5.15.148.

Paolo

[1] https://lwn.net/ml/linux-kernel/2024021518-stature-frightful-e7fc@gregkh/
[2] https://lwn.net/ml/linux-kernel/2024021430-blanching-spotter-c7c8@gregkh/