Re: [PATCH] dax/hmem: Fix refcount leak in dax_hmem_probe()

From: Ira Weiny
Date: Fri Jun 02 2023 - 14:43:37 EST


Paul Cassella wrote:
> On Sat, 3 Dec 2022, Ira Weiny wrote:
> > On Sat, Dec 03, 2022 at 09:58:58AM +0000, Yongqiang Liu wrote:
>
> > > We should always call dax_region_put() whenever devm_create_dev_dax()
> > > succeed or fail to avoid refcount leak of dax_region. Move the return
> > > value check after dax_region_put().
>
> > I think dax_region_put is called from dax_region_unregister() automatically on
> > tear down.
>
> Hi Ira,
>
> Note the reference dax_region_unregister() will be putting is the one
> devm_create_dev_dax() takes by kref_get(&dax_region->kref). I think
> dax_hmem_probe() needs to put its reference in the error case, as in the
> successful case.

Looking at this again I'm inclined to agree that something is wrong. But
I'm not sure this patch fixes it. anything.

When you say:

> ... I think
> dax_hmem_probe() needs to put its reference in the error case, as in the
> successful case.

... which kref_get() is dax_hmem_probe() letting go?

Here is what I see with the current code.

dax_hmem_probe()
alloc_dax_region()
kref_get(&dax_region->kref)
devm_add_action_or_reset(... dax_region_unregister, ...)
... kref_put() later...

devm_create_dev_dax()
...may return error...
kref_get()
[dev_dax_release() set to call kref_put() later]
...may return error...

if not error
dax_region_put() => kref_put()

I think this is an extra unneeded put???

Dan I see this pattern repeated in cxl and pmem. Is the intent to remove
the need for dax_region_unregister() to be called when the platform device
unwinds because the platform device is not intended to own the dax_region
after success? If so it seems like the device managed call should be
removed too. Not just calling dax_region_put()? :-/

But wouldn't that cause an issue with the sysfs entries created?

>
> Consider, devm_create_dev_dax() has error paths that return without
> involving dax_region_unregister(), prior to kref_get() and device_add().
> dax_hmem_probe() is clearly responsible for freeing the region in those
> cases.

No the devm_add_action_or_reset(... dax_region_unregister, ...) will cause
dax_region_unregister() to release the reference when the platform device
unwinds.

devm_create_dev_dax() configures a reference release through the
dev_dax->type release. So when the dev_dax device is put the dax_region
will be released through dev_dax_release()->dax_region_put().

>
>
> dax_hmem_probe() drops its own reference in the successful case because
> (per the comment) "child dev_dax instances now own the lifetime of the
> dax_region". That ownership is the reference that the error-case
> dax_region_unregister() is dropping.

No dax_region_unregister() is not just an error case flow.

>
> dax_hmem_probe()'s initial reference
> also needs to be dropped in the error case, as in the successful case.
>

I don't follow this. Doesn't this now result in an invalid reference
release in dax_region_unregister() when the platform device is unwound?
Furthermore, that reference is required I think for the sysfs entries.

>
> > > Fixes: c01044cc8191 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
> >
> > I'm also not sure how this patch is related to this fix.
> >
> > Ira
> >
> > > Signed-off-by: Yongqiang Liu <liuyongqiang13@xxxxxxxxxx>
> > > ---
> > > drivers/dax/hmem/hmem.c | 5 ++---
> > > 1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
> > > index 1bf040dbc834..09f5cd7b6c8e 100644
> > > --- a/drivers/dax/hmem/hmem.c
> > > +++ b/drivers/dax/hmem/hmem.c
> > > @@ -36,12 +36,11 @@ static int dax_hmem_probe(struct platform_device *pdev)
> > > .size = region_idle ? 0 : resource_size(res),
> > > };
> > > dev_dax = devm_create_dev_dax(&data);
> > > - if (IS_ERR(dev_dax))
> > > - return PTR_ERR(dev_dax);
> > >
> > > /* child dev_dax instances now own the lifetime of the dax_region */
>
> This comment should probably be updated now. :)
>

I think removed...

Dan what do you think of this patch? Am I seriously off the rails here?
I worry about this code being around for so long. But since it is in tear
down perhaps it is just a race which has never been lost.

Ira

---- 8< ---------------------