Re: [PATCH v3 0/5] Fix a whole host of nvmem registration/cleanup issues

From: Hector Martin
Date: Tue Jan 03 2023 - 13:12:58 EST


On 04/01/2023 01.58, Russell King (Oracle) wrote:
> Hi,
>
> This series fixes a whole host of nvmem registration/error cleanup
> issues that have been identified by both Hector and myself. It is a
> substantial rework of my original patch fixing the first problem.
>
> The first, most obvious problem is the race between nvmem registration
> and use, which leads to sporadic failures of drivers to probe at boot
> time.
>
> While fixing this, it has been noticed that a recent fix to check the
> return value of dev_set_name() introduced a new bug where wp_gpio was
> not being put in that newly introduced error path.
>
> Then there's a fix for a previous fix which itself purports to fix
> another bug, but results in the allocated ID being leaked. Fix for a
> fix for a fix is not good!
>
> Then there's an error in the docbook documentation for wp_gpio (it's
> listed as wp-gpio instead) but as nothing seems to set wp_gpio, we
> might as well get rid of it - which also solves the issue that we
> call gpiod_put() on this whether we own it or not.
>
> Lastly, there's a fix for yet another spurious white-space in this
> code, one of what seems to be a long history of past white-space
> fixes.
>
> These patches have been individually build-tested in the order of
> posting, but not run-time tested except for the entire series.
>
> drivers/nvmem/core.c | 51 ++++++++++++++++++------------------------
> include/linux/nvmem-provider.h | 2 --
> 2 files changed, 22 insertions(+), 31 deletions(-)
>

Uhh. The series itself looks fine as far as fixing the problems goes, but I
fail to see how this is any better than my attempt in terms of backporting
or commit atomicity. Patch #4 fixes the newer gpio leak bug *and*
half-fixes the race condition bug, then patch #5 completes the race
condition fix but now depends on #4. That leaves you with exactly the
same backporting mess, since you can't apply #5 to older kernels while
applying #4 only to newer ones. Splitting the commits like this buys you nothing.

I thought we were doing minimal backportable fixes to solve this, but
your commit message for #4 literally says "While a minimal fix for this
would be to add the gpiod_put() call, we can do better if we split
device_register() [...]"... and then that whole "let's do better" part
is what breaks the backportability again.

And then of course, even if you *do* manage to queue at least #4 for
backporting to a newer subset of stable trees, #3 certainly isn't going
to get backported itself (it just removes dead code and fixes no actual
bug, so it isn't eligible for stable). For those stable kernels you're
then left with the same
broken-on-paper-except-nobody-uses-it-anyway-so-it-doesn't-matter
situation my v2 left us in.

That said, thanks for identifying that nobody uses the functionality my
v2 supposedly regressed (in a tiny corner-case code path where it was
already broken anyway). So I didn't actually regress anything in
practice, and my v2 strictly fixed real bugs.

- Hector