Re: [PATCH v6 4/5] iommu/mediatek: Add mt8173 IOMMU driver

From: Robin Murphy
Date: Wed Dec 16 2015 - 07:48:51 EST


On 16/12/15 05:59, Yong Wu wrote:
On Tue, 2015-12-15 at 12:37 +0000, Robin Murphy wrote:
On 15/12/15 03:28, Yong Wu wrote:
On Mon, 2015-12-14 at 15:16 +0100, Joerg Roedel wrote:
On Tue, Dec 08, 2015 at 05:49:12PM +0800, Yong Wu wrote:
+static int mtk_iommu_attach_device(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+ struct mtk_iommu_client_priv *priv = dev->archdata.iommu;
+ struct mtk_iommu_data *data;
+ int ret;
+
+ if (!priv)
+ return -ENODEV;
+
+ data = dev_get_drvdata(priv->m4udev);
+ if (!data) {
+ /*
+ * The DMA core will run earlier than this probe, and it will
+ * create a default iommu domain for each a iommu device.
+ * But here there is only one domain called the m4u domain
+ * which all the multimedia HW share.
+ * The default domain isn't needed here.
+ */

The iommu core creates one domain per iommu-group. In your case this
means one default domain per iommu in the system.

Yes. The iommu core will create one domain per iommu-group.
see the next "if" here.

But the domain here is created by the current DMA64. It's from this
function do_iommu_attach which will be called too early and will help
create a default domain for each a iommu device.(my codebase is
v4.4-rc1).

I still don't really understand the problem here. The M4U device is
created early in mtk_iommu_init_fn, so it should be probed and ready to
go long before anything else. Indeed, for your mtk_iommu_device_group()
callback to work then everything must already be happening in the right
order - add_device will only call iommu_group_get_for_dev() if of_xlate
has populated dev->archdata.iommu, and of_xlate will only do that if the
M4U was up and running before the client device started probing.
Furthermore, if mtk_iommu_device_group() *does* work, then the IOMMU
core will go ahead and allocate the default domain there and then, which
the arch code should find and use later.

Thanks. This is very helpful.

I understand your confuse right now and your expectant flow.

Our IOMMU probe was PROBE_DEFER by our SMI device, so currently it probe
was delayed, then have to add the workaround code.

Aha, I hadn't twigged that there was a dependency on another device that could delay the M4U probe, thanks for the clarification. That'll be a good case to bear in mind when I eventually get back to the IOMMU probe deferral stuff.

Following your comment above, I test as below. Then the flows seems meet
the "best case" that the iommu core will help create default DMA domain.

@@ -664,19 +636,41 @@ static int mtk_iommu_probe(struct platform_device
*pdev)
for (i = 0; i < larb_nr; i++) {
struct device_node *larbnode;
struct platform_device *plarbdev;

larbnode = of_parse_phandle(dev->of_node, "mediatek,larbs", i);
if (!larbnode)
return -EINVAL;

plarbdev = of_find_device_by_node(larbnode);
of_node_put(larbnode);
- if (!plarbdev)
- return -EPROBE_DEFER;
+ if (!plarbdev) {
+ plarbdev = of_platform_device_create(larbnode,
NULL, platform_bus_type.dev_root);
+ if (IS_ERR(pdev))
+ return -EPROBE_DEFER;
+ }
}

I only add of_platform_device_create for the SMI local arbiter devices
here.

This is a big improvement for us. If this is ok, I will send a quick
next version for this.

In my opinion it's reasonable - we need the whole "IOMMU" to be ready, so if we already have to short-cut the creation of the M4U part it only seems fair to do the same for the SMI part. That said, would it work to just unconditionally poke the larbs in mtk_iommu_init_fn() before you poke the M4U itself? It would be nice to keep all that stuff together in the same place.

The potential issue I *do* see, looking more closely, is that
iommu_group_get_for_dev() is setting group->domain but not calling the
attach_dev callback, which looks wrong...

This is the backtrace,

(151216_09:58:05.207)Call trace:
(151216_09:58:05.207)[<ffffffc000400668>] mtk_iommu_attach_device
+0xb8/0x178
(151216_09:58:05.207)[<ffffffc0003fc55c>] iommu_group_add_device
+0x1d8/0x31c
(151216_09:58:05.207)[<ffffffc0003fc988>] iommu_group_get_for_dev
+0x88/0x108
(151216_09:58:05.207)[<ffffffc0003ffcfc>] mtk_iommu_add_device+0x14/0x34
(151216_09:58:05.207)[<ffffffc0003fb280>] add_iommu_group+0x20/0x44
(151216_09:58:05.207)[<ffffffc000406cec>] bus_for_each_dev+0x58/0x98
(151216_09:58:05.207)[<ffffffc0003fbe8c>] bus_set_iommu+0x9c/0xf8

Urgh, now I recall that this isn't even the first time I've been confused by the attach being hidden elsewhere. Oh well, problem averted!

If I change like above, I will delete the workaround code..



//=====the next "if"===========
} else if (!data->m4u_dom) {
/*
* While a device is added into a iommu group, the iommu core
* will create a default domain for each a iommu group.
* This default domain is reserved as the m4u domain and is
* initiated here.
*/
data->m4u_dom = dom;
if (domain->type == IOMMU_DOMAIN_DMA) {
ret = iommu_dma_init_domain(domain, 0,
DMA_BIT_MASK(32));
if (ret)
goto err_uninit_dom;
}

ret = mtk_iommu_domain_finalise(data);
if (ret)
goto err_uninit_dom;
}
//======================


+ iommu_domain_free(domain);

This function is not supposed to free the domain passed to it.

As above this domain is created in the do_iommu_attach which will help
create a default domain for each a iommu device.
We don't need this default domain!

If we don't free it here, there will be a memory leak.

From Robin's comment, He will improve the sequence of the
__iommu_setup_dma_ops in the future.

That already happened. The final version of the arm64 code which was
merged makes sure that the IOMMU driver always sees the callbacks in the
desired of_xlate -> add_device -> attach_dev order. The whole point of
the comment below is that the driver itself *doesn't* have to care about
the awkward way in which that is currently achieved.

/*
* TODO: Right now __iommu_setup_dma_ops() gets called too early to do
* everything it needs to - the device is only partially created and the
* IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
* need this delayed attachment dance. Once IOMMU probe ordering is
sorted
* to move the arch_setup_dma_ops() call later, all the notifier bits
below
* become unnecessary, and will go away.
*/

/*
* Best case: The device is either part of a group which was
* already attached to a domain in a previous call, or it's
* been put in a default DMA domain by the IOMMU core.
*/

That was before Joerg made the device_group changes which enabled proper
default domains for platform devices - with those, we should be now be
hitting the "best case" behaviour every time. In fact I think the "fake
default domain" workaround shouldn't be needed at all any more, so I
will add removing it to my giant list of things to do.

But there is no this patch currently, so I add iommu_domain_free
here.

"free the domain" here looks really not good. Then I delete the
iommu_domain_free here(allow this memory leak right now), is it ok?
(It will also works after Robin's change in the future.)


+static int mtk_iommu_add_device(struct device *dev)
+{
+ struct iommu_group *group;
+
+ if (!dev->archdata.iommu) /* Not a iommu client device */
+ return -ENODEV;
+
+ group = iommu_group_get_for_dev(dev);
+ if (IS_ERR(group))
+ return PTR_ERR(group);
+
+ iommu_group_put(group);
+ return 0;
+}

[...]

+static struct iommu_group *mtk_iommu_device_group(struct device *dev)
+{
+ struct mtk_iommu_data *data;
+ struct mtk_iommu_client_priv *priv;
+
+ priv = dev->archdata.iommu;
+ if (!priv)
+ return ERR_PTR(-ENODEV);
+
+ /* All the client devices are in the same m4u iommu-group */
+ data = dev_get_drvdata(priv->m4udev);
+ if (!data->m4u_group) {
+ data->m4u_group = iommu_group_alloc();
+ if (IS_ERR(data->m4u_group))
+ dev_err(dev, "Failed to allocate M4U IOMMU group\n");
+ }
+ return data->m4u_group;
+}

As long as this works as expected, then AFAICS you shouldn't have to
have *any* special-case behaviour or tracking of domains at all.

We only need one iommu-group, one iommu domain here.

What's the special-case behavior, how can we track of domains.
Could you help give me a example?

The beauty of it is that you don't need to. If you guarantee all of an IOMMU's client devices are in the same group, you know you've only got one thing which can be attached to that IOMMU's domains. Therefore, you can freely allow as many domains as you like to *exist*, because there can never be more than one *active* at any given time - the core code enforces that the group is detached from one domain before being attached to another, and the driver's attach and detach calls just become responsible for switching the given domain's page table in and out of the actual hardware. I think it's pretty neat.

Robin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/