Re: [PATCH v2 1/1] iommu/arm-smmu-v3: Fix error case of range command

From: Will Deacon
Date: Wed Aug 09 2023 - 07:23:27 EST


On Wed, Aug 09, 2023 at 05:22:06PM +0800, zhurui wrote:
> On 2023/8/9 0:43, Robin Murphy wrote:
> > On 08/08/2023 5:24 pm, Will Deacon wrote:
> >> On Mon, Aug 07, 2023 at 08:20:45PM +0100, Robin Murphy wrote:
> >>> Yeah, I'd rather not downgrade to a non-range invalidate since that
> >>> complicates the reasoning for the errata affecting those. If the size of the
> >>> invalidation is equal to TG then it can only represent a single last-level
> >>> page, i.e. TTL=3, thus if it does warrant handling here then indeed
> >>> rearranging to base the condition on num_pages as well ought to suffice.
> >>> However, this is all still begging the question of where and why we're doing
> >>> a *non-leaf* invalidation that isn't aligned to the size of a table, because
> >>> that in itself doesn't make a whole heap of sense - my hunch is that that
> >>> wants figuring out and could probably be fixed at the source.
> >>
> >> Isn't that described above because we're using CMDQ_TLBI_RANGE_NUM_MAX
> >> to break up the range into separate commands?
> >
> > Not really, because if we're doing a genuine non-leaf invalidation of a
> > table then it should be a block-aligned range that ought to fit in a
> > single command and should certainly never involve a single-granule
> > remainder. If we're doing non-leaf invalidations of things that
> > logically don't need to be non-leaf, making them leaf would be the even
> > better option.
> >
>
> I agree with Robin that if the caller is doing a genuine non-leaf invalidation
> of a table, it should not involve a single-granule tlbi. It seems that the
> caller only filter the block size, but not the address aligned or not maybe.

There's only one caller though, right? That's the
io_pgtable_tlb_flush_walk() call in io-pgtable-arm.c which shouldn't trigger
this problem.

Do you have a backtrace for the case when we generate the illegal command?

> >> Do you mind if I queue the patch as-is for now? I don't think the driver
> >> should be emitting illegal commands, and v2 of the patch does seem like
> >> the obvious thing to do.
> >
> > TBH I'd rather you just drop my patch if it's proven problematic, and
> > I'll take another crack at it soon. The potential problems we introduce
> > by using non-range invalidates on errata-affected MMU-700 revisions are
> > worse than the almost-entirely-theoretical one I was trying to address.
> >
>
> If you all agree to roll back the problematic code, is the first patch be OK?
> Should I need to add some more descriptions to clarify this?

I can just go and revert Robin's original patch, but I'd like to see your
backtrace first so that we understand how this is occurring.

Thanks,

Will