Re: [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap

From: Robin Murphy
Date: Wed Oct 05 2022 - 05:54:04 EST


On 2022-10-04 17:13, Niklas Schnelle wrote:
On Tue, 2022-10-04 at 16:31 +0100, Robin Murphy wrote:
On 2022-10-04 16:12, Matthew Rosato wrote:
On 10/4/22 11:02 AM, Robin Murphy wrote:
On 2022-10-04 13:07, Niklas Schnelle wrote:
The .pgsize_bitmap property of struct iommu_ops is not a page mask but
rather has a bit set for each page size the IOMMU supports. As the
comment correctly pointed out, at this moment the code only supports 4K
pages, so simply use SZ_4K here.
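
For illustration of the bitmap semantics (the 2M bit below is hypothetical,
s390 only advertises 4K): a driver that also supported 2M mappings would
simply set both bits, and the core would then only ever hand the driver one
of the advertised sizes:

	/* one bit per supported page size */
	.pgsize_bitmap	= SZ_4K | SZ_2M,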

Unless it's already been done somewhere else, you'll want to switch over to the {map,unmap}_pages() interfaces as well to avoid taking a hit on efficiency here. The "page mask" thing was an old hack to trick the core API into making fewer map/unmap calls where the driver could map arbitrary numbers of pages at once anyway. The multi-page interfaces now do that more honestly and generally better (since they work for non-power-of-two sizes as well).
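
For reference, the multi-page callbacks in struct iommu_domain_ops take an explicit page size and count instead of a raw byte length; the prototypes look roughly like this (a sketch based on include/linux/iommu.h around this kernel version, check the header for the exact form):

	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
			 int prot, gfp_t gfp, size_t *mapped);
	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
			      size_t pgsize, size_t pgcount,
			      struct iommu_iotlb_gather *iotlb_gather);

The core guarantees pgsize is one of the advertised sizes and takes care of splitting the region, so a driver that can map arbitrary runs of 4K pages just works on pgsize * pgcount.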

Thanks for the heads-up -- Niklas has some additional series coming soon, as described here:

https://lore.kernel.org/linux-iommu/a10424adbe01a0fd40372cbd0736d11e517951a1.camel@xxxxxxxxxxxxx/

So implementing the _pages() interfaces is coming up soon on the roadmap. But given what you say, I wonder if this patch should just wait for the series that implements {map,unmap}_pages().

Perhaps, although the full change should be trivial enough that there's
probably just as much argument for doing the whole thing in its own
right for the sake of this cleanup. The main point is that
S390_IOMMU_PGSIZES is not incorrect as such, it's just not spelling out
the deliberate trick that it's achieving - everyone copied it from
intel-iommu, but since that got converted to the new interfaces the
original explanation is now gone. The only effect of "fixing" it in
isolation right now will be to make large VFIO mappings slower.

Robin.

The patch changing to map_pages()/unmap_pages() is currently part of a
larger series of improvements, some of which are less trivial. So I'm
planning to send those as an RFC first. Those include changing the
spin_lock-protected list to RCU so that map/unmap can parallelize
better. Another one is atomic updates to the IOMMU tables to do away
with locks in map/unmap. So I think pulling that whole series into this
one isn't ideal. I could pull just the map_pages()/unmap_pages() change
though.

Yeah, literally just updating the s390_iommu_{map,unmap} function prototypes and replacing "size" with "pgsize * count" within is all that's needed to clean up this hack properly. That can (and probably should) be completely independent of other improvements deeper down.
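
A minimal sketch of that conversion (shown here as a wrapper around the existing s390_iommu_map() for brevity; the suggestion above is to change the prototype in place, but the net effect is the same, and the analogous change applies to unmap):

	static int s390_iommu_map_pages(struct iommu_domain *domain,
					unsigned long iova, phys_addr_t paddr,
					size_t pgsize, size_t pgcount,
					int prot, gfp_t gfp, size_t *mapped)
	{
		/* what used to arrive as the "size" argument */
		size_t size = pgsize * pgcount;
		int rc = s390_iommu_map(domain, iova, paddr, size, prot, gfp);

		if (!rc)
			*mapped = size;
		return rc;
	}

That, plus wiring the new functions up as .map_pages/.unmap_pages in the domain ops, is the whole cleanup.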

Thanks,
Robin.



Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
---
drivers/iommu/s390-iommu.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 94c444b909bd..6bf23e7830a2 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -12,13 +12,6 @@
#include <linux/sizes.h>
#include <asm/pci_dma.h>
-/*
- * Physically contiguous memory regions can be mapped with 4 KiB alignment,
- * we allow all page sizes that are an order of 4KiB (no special large page
- * support so far).
- */
-#define S390_IOMMU_PGSIZES (~0xFFFUL)
-
static const struct iommu_ops s390_iommu_ops;
struct s390_domain {
@@ -350,7 +343,7 @@ static const struct iommu_ops s390_iommu_ops = {
.probe_device = s390_iommu_probe_device,
.release_device = s390_iommu_release_device,
.device_group = generic_device_group,
- .pgsize_bitmap = S390_IOMMU_PGSIZES,
+ .pgsize_bitmap = SZ_4K,
.get_resv_regions = s390_iommu_get_resv_regions,
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = s390_iommu_attach_device,