Re: [PATCH v2 1/1] iommu/arm-smmu-v3: Fix error case of range command

From: zhurui
Date: Wed Aug 09 2023 - 05:22:16 EST


On 2023/8/9 0:43, Robin Murphy wrote:
> On 08/08/2023 5:24 pm, Will Deacon wrote:
>> Hi Robin,
>>
>> On Mon, Aug 07, 2023 at 08:20:45PM +0100, Robin Murphy wrote:
>>> On 2023-08-06 06:28, zhurui wrote:
>>>> On 2023/8/5 2:30, Nicolin Chen wrote:
>>>>> On Fri, Aug 04, 2023 at 05:52:25PM +0100, Will Deacon wrote:
>>>>>> On Fri, Aug 04, 2023 at 05:31:20PM +0800, zhurui wrote:
>>>>>>> When tg != 0 but ttl, scale, num all 0 in a range tlbi command, it
>>>>>>> is reserved and will cause the CERROR_ILL error. This case means
>>>>>>> that the size to be invalidated is only one page size, and the
>>>>>>> range invalidation is meaningless here. So we set tg to 0 in this
>>>>>>> case to do an non-range invalidation instead.
>>>>>
>>>>>>> @@ -1930,6 +1927,12 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>>>>                           num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX;
>>>>>>>                           cmd->tlbi.num = num - 1;
>>>>>>>
>>>>>>> +                       /* Prevent error caused by one page tlbi with leaf 0 */
>>>>>>> +                       if (scale == 0 && num == 1 && cmd->tlbi.leaf == 0)
>>>>>>> +                               cmd->tlbi.tg = 0;
>>>>>>
>>>>>> This should only be true for the last iteration, right (i.e. when num_pages
>>>>>> == 1)? In which case, I'd prefer to leave the old code as-is and just add:
>>>>>>
>>>>>>           /* Single-page leaf invalidation requires a TG field of 0 */
>>>>>>           if (num_pages == 1 && !cmd->tlbi.leaf)
>>>>>>                   cmd->tlbi.tg = 0;
>>>>
>>>> To Will and Nicolin,
>>>>
>>>> Not only the last iteration, it's the result of __ffs function. For example, if
>>>> numpages is 33, then the value of __ffs(num_pages) is 0, so the value of scale
>>>> is also 0. The value of num depends on CMDQ_TLBI_RANGE_NUM_MAX. That is, the
>>>> maximum value of num is 31. Therefore, the final value of num is 1.
>>>> So, if consider CMDQ_TLBI_RANGE_NUM_MAX, there will be some case not the last
>>>> one page but the beginning pages. That's why I use scale and num as conditions,
>>>> not num_pages. Then I should reassign tg based on the result.
>>>
>>> Yeah, I'd rather not downgrade to a non-range invalidate since that
>>> complicates the reasoning for the errata affecting those. If the size of the
>>> invalidation is equal to TG then it can only represent a single last-level
>>> page, i.e. TTL=3, thus if it does warrant handling here then indeed
>>> rearranging to base the condition on num_pages as well ought to suffice.
>>> However, this is all still begging the question of where and why we're doing
>>> a *non-leaf* invalidation that isn't aligned to the size of a table, because
>>> that in itself doesn't make a whole heap of sense - my hunch is that that
>>> wants figuring out and could probably be fixed at the source.
>>
>> Isn't that described above because we're using CMDQ_TLBI_RANGE_NUM_MAX
>> to break up the range into separate commands?
>
> Not really, because if we're doing a genuine non-leaf invalidation of a table then it should be a block-aligned range that ought to fit in a single command and should certainly never involve a single-granule remainder. If we're doing non-leaf invalidations of things that logically don't need to be non-leaf, making them leaf would be the even better option.
>

I agree with Robin that if the caller is doing a genuine non-leaf invalidation
of a table, it should not involve a single-granule TLBI. It seems that the
caller only checks the block size, and perhaps not whether the address is
aligned.
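
To illustrate how a single-granule command can appear before the last
iteration, here is a minimal user-space sketch of the scale/num arithmetic
from my earlier reply. The helper tlbi_next_chunk() and struct tlbi_chunk are
hypothetical names, __builtin_ctzl() (GCC/Clang) stands in for the kernel's
__ffs(), and CMDQ_TLBI_RANGE_NUM_MAX is assumed to be 31 as stated above. It
is not the driver code itself.

```c
#include <assert.h>

/* Assumption: matches the "maximum value of num is 31" stated above. */
#define CMDQ_TLBI_RANGE_NUM_MAX 31UL

/* Hypothetical container for the values computed in one loop iteration. */
struct tlbi_chunk {
	unsigned long scale;	/* log2 of the chunk size, in pages */
	unsigned long num;	/* number of 2^scale chunks, before the "- 1" encoding */
};

/*
 * Hypothetical helper: compute scale/num for one iteration and subtract
 * the pages covered by this command from *num_pages.
 */
static struct tlbi_chunk tlbi_next_chunk(unsigned long *num_pages)
{
	struct tlbi_chunk c;

	assert(*num_pages != 0);	/* count-trailing-zeros of 0 is undefined */

	/* Lowest set bit of num_pages; user-space stand-in for __ffs(). */
	c.scale = (unsigned long)__builtin_ctzl(*num_pages);

	/* Number of 2^scale chunks, limited by the width of the NUM field. */
	c.num = (*num_pages >> c.scale) & CMDQ_TLBI_RANGE_NUM_MAX;

	/* Pages left over for the next iteration. */
	*num_pages -= c.num << c.scale;

	return c;
}
```

Starting from num_pages = 33, the first call gives scale = 0 and num = 1
(leaving 32 pages), so the single-page command is emitted first, not last.
The second call gives scale = 5 and num = 1.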

>> Do you mind if I queue the patch as-is for now? I don't think the driver
>> should be emitting illegal commands, and v2 of the patch does seem like
>> the obvious thing to do.
>
> TBH I'd rather you just drop my patch if it's proven problematic, and I'll take another crack at it soon. The potential problems we introduce by using non-range invalidates on errata-affected MMU-700 revisions are worse than the almost-entirely-theoretical one I was trying to address.
>

If you all agree to roll back the problematic code, would the first patch be
OK? Should I add more description to clarify this?

Thanks,
Zhurui.