Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

From: Nicolin Chen
Date: Thu Oct 08 2020 - 17:19:16 EST


On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote:
> On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote:
> > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote:
> > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote:
> > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote:
> > > > > 02.10.2020 09:08, Nicolin Chen пишет:
> > > > > > static int tegra_smmu_of_xlate(struct device *dev,
> > > > > > struct of_phandle_args *args)
> > > > > > {
> > > > > > + struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
> > > > > > + struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
> > > > > > u32 id = args->args[0];
> > > > > >
> > > > > > + of_node_put(args->np);
> > > > >
> > > > > of_find_device_by_node() takes device reference and not the np
> > > > > reference. This is a bug, please remove of_node_put().
> > > >
> > > > Looks like so. Replacing it with put_device(&iommu_pdev->dev);
> > >
> > > Putting the put_device() here is wrong, though. You need to make sure
> > > you keep a reference to it as long as you keep accessing the data that
> > > is owned by it.
> >
> > I am confused. You said in the other reply (to Dmitry) that we do
> > need to put_device(mc->dev), where mc->dev should be the same as
> > iommu_pdev->dev. But here your comments sounds that we should not
> > put_device at all since ->probe_device/group_device/attach_dev()
> > will use it later.
>
> You need to call put_device() at some point to release the reference
> that you acquired by calling of_find_device_by_node(). If you don't
> release it, you're leaking the reference and the kernel isn't going to
> know when it's safe to delete the device.
>
> So what I'm saying is that we either release it here, which isn't quite
> right because we do reference data relating to the device later on. And

I see. A small question here by the way: By looking at other IOMMU
drivers that are calling driver_find_device_by_fwnode() function,
I found that most of them put_device right after the function call,
and dev_get_drvdata() after putting the device..

Feels like they are doing it wrongly?

> because it isn't quite right there should be a reason to justify it,
> which is that the SMMU parent device is the same as the MC, so the
> reference count isn't strictly necessary. But that's not quite obvious,
> so highlighting it in a comment makes sense.
>
> The other alternative is to not call put_device() here and keep on to
> the reference as long as you keep using "mc". This might be difficult to
> implement because it may not be obvious where to release it. I think
> this is the better alternative, but if it's too complicated to implement
> it might not be worth it.

I feel so too. The dev is got at of_xlate() that does not have an
obvious counterpart function. So I'll just remove put_device() and
put a line of comments, as you suggested.

> > > Like I said earlier, this is a bit weird in this case because we're
> > > self-referencing, so iommu_pdev->dev is going to stay around as long as
> > > the SMMU is. However, it might be worth to properly track the lifetime
> > > anyway just so that the code can serve as a good example of how to do
> > > things.
> >
> > What's this "track-the-lifetime"?
>
> This basically just means that SMMU needs to ensure that MC stays alive
> (by holding a reference to it) as long as SMMU uses it. If the last
> reference to MC is dropped, then the mc pointer and potentially anything
> that it points to will become dangling. If you were to drop the last
> reference at this point, then on the next line the mc pointer could
> already be invalid.
>
> That's how it generally works, anyway. What's special about this use-
> case is that the SMMU and MC are the same device, so it should be safe
> to omit this additional tracking because the IOMMU tracking should take
> care of that already.

Okay.

> > > If you decide to go for the shortcut and not track this reference
> > > properly, then at least you need to add a comment as to why it is safe
> > > to do in this case. This ensures that readers are away of the
> > > circumstances and don't copy this bad code into a context where the
> > > circumstances are different.
> >
> > I don't quite get this "shortcut" here either...mind elaborating?
>
> The shortcut is taking advantage of the knowledge that the SMMU and the
> MC are the same device and therefore not properly track the MC object.
> Given that their code is located in different locations, this isn't
> obvious to the casual reader of the code, so they may assume that this
> is the normal way to do things. To avoid that, the code should have a
> comment explaining why that is.

Got it. Thanks!