Re: [PATCH v2] nvmem: core: Fix race in nvmem_register()

From: Hector Martin
Date: Tue Jan 03 2023 - 08:49:34 EST


On 03/01/2023 21.41, Srinivas Kandagatla wrote:
>
>
> On 03/01/2023 11:44, Hector Martin wrote:
>> nvmem_register() currently registers the device before adding the nvmem
>> cells, which creates a race window where consumers may find the nvmem
>> device (and not get PROBE_DEFERred), but then fail to find the cells and
>> error out.
>>
>> Move device registration to the end of nvmem_register(), to close the
>> race.
>>
>> Observed when the stars line up on Apple Silicon machines with the (not
>> yet upstream, but trivial) spmi nvmem driver and the macsmc-rtc client:
>>
>> [ 0.487375] macsmc-rtc macsmc-rtc: error -ENOENT: Failed to get rtc_offset NVMEM cell
>>
>> Fixes: eace75cfdcf7 ("nvmem: Add a simple NVMEM framework for nvmem providers")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Reviewed-by: Eric Curtin <ecurtin@xxxxxxxxxx>
>> Signed-off-by: Hector Martin <marcan@xxxxxxxxx>
>> ---
>
> What has changed since v1?

What you told me to. I'm trying to get a silly bug fixed after you
ignored my original patch until Russell independently discovered and
submitted a fix for the same thing. I think we've wasted enough
developer time here already.

>
>> drivers/nvmem/core.c | 32 +++++++++++++++++---------------
>> 1 file changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
>> index 321d7d63e068..606f428d6292 100644
>> --- a/drivers/nvmem/core.c
>> +++ b/drivers/nvmem/core.c
>> @@ -822,11 +822,8 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
>> break;
>> }
>>
>> - if (rval) {
>> - ida_free(&nvmem_ida, nvmem->id);
>> - kfree(nvmem);
>> - return ERR_PTR(rval);
>> - }
>> + if (rval)
>> + goto err_gpiod_put;
>
> Why was gpiod changes added to this patch, that should be a separate
> patch/discussion, as this is not relevant to the issue that you are
> reporting.

Because freeing the device also does a gpiod_put in the destructor, so
doing this is correct in every other instance below and maintains
existing behavior, and it just so happens that this instance converges
into the same codepath so it is correct to merge it, and it just so
happens that the gpiod put was missing in this path to begin with so
this becomes a drive-by bugfix.

If you don't like it I can remove it (i.e. reintroduce the bug for no
good reason) and you can submit this fix yourself, because I have no
incentive to waste time submitting a separate patch to fix a GPIO leak
in an error path corner case in a subsystem I don't own and I have much
bigger things to spend my (increasingly lower and lower) willingness to
fight for upstream submissions than this.

Seriously, what is wrong with y'all kernel people. No other open source
project wastes contributors' time with stupid nitpicks like this. I
found a bug, I fixed it, I then fixed the issues you pointed out, and I
don't have the time nor energy to fight over this kind of nonsense next.
Do you want bugs fixed or not?

>>
>> nvmem->read_only = device_property_present(config->dev, "read-only") ||
>> config->read_only || !nvmem->reg_write;
>> @@ -837,20 +834,16 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
>>
>> dev_dbg(&nvmem->dev, "Registering nvmem device %s\n", config->name);
>>
>> - rval = device_register(&nvmem->dev);
>> - if (rval)
>> - goto err_put_device;
>> -
>> if (nvmem->nkeepout) {
>> rval = nvmem_validate_keepouts(nvmem);
>> if (rval)
>> - goto err_device_del;
>> + goto err_gpiod_put;
>> }
>>
>> if (config->compat) {
>> rval = nvmem_sysfs_setup_compat(nvmem, config);
>> if (rval)
>> - goto err_device_del;
>> + goto err_gpiod_put;
>> }
>>
>> if (config->cells) {
>> @@ -867,6 +860,15 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
>> if (rval)
>> goto err_remove_cells;
>>
>> + rval = device_register(&nvmem->dev);
>> + if (rval) {
>> + nvmem_device_remove_all_cells(nvmem);
>> + if (config->compat)
>> + nvmem_sysfs_remove_compat(nvmem, config);
>> + put_device(&nvmem->dev);
>> + return ERR_PTR(rval);
>> + }
>> +
>> blocking_notifier_call_chain(&nvmem_notifier, NVMEM_ADD, nvmem);
>>
>> return nvmem;
>> @@ -876,10 +878,10 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
>> err_teardown_compat:
>> if (config->compat)
>> nvmem_sysfs_remove_compat(nvmem, config);
>> -err_device_del:
>> - device_del(&nvmem->dev);
>> -err_put_device:
>> - put_device(&nvmem->dev);
>> +err_gpiod_put:
>> + gpiod_put(nvmem->wp_gpio);
>> + ida_free(&nvmem_ida, nvmem->id);
>> + kfree(nvmem);
>>
>> return ERR_PTR(rval);
>> }
>> --
>> 2.35.1
>>
>


- Hector