Re: [PATCH v12 13/16] iommu: Improve iopf_queue_remove_device()

From: Vasant Hegde
Date: Thu Feb 08 2024 - 00:07:14 EST




On 2/8/2024 7:02 AM, Baolu Lu wrote:
> On 2024/2/8 1:59, Vasant Hegde wrote:
>> Hi Baolu,
>>
>> On 2/7/2024 5:59 PM, Baolu Lu wrote:
>>> On 2024/2/7 10:50, Tian, Kevin wrote:
>>>>> From: Lu Baolu<baolu.lu@xxxxxxxxxxxxxxx>
>>>>> Sent: Wednesday, February 7, 2024 9:33 AM
>>>>>
>>>>> Convert iopf_queue_remove_device() to return void instead of an error code,
>>>>> as the return value is never used. This removal helper is designed to be
>>>>> never-failed, so there's no need for error handling.
>>>>>
>>>>> Ack all outstanding page requests from the device with the response code of
>>>>> IOMMU_PAGE_RESP_INVALID, indicating device should not attempt any retry.
>>>>>
>>>>> Add comments to this helper explaining the steps involved in removing a
>>>>> device from the iopf queue and disabling its PRI. The individual drivers
>>>>> are expected to be adjusted accordingly. Here we just define the expected
>>>>> behaviors of the individual iommu driver from the core's perspective.
>>>>>
>>>>> Suggested-by: Jason Gunthorpe<jgg@xxxxxxxxxx>
>>>>> Signed-off-by: Lu Baolu<baolu.lu@xxxxxxxxxxxxxxx>
>>>>> Reviewed-by: Jason Gunthorpe<jgg@xxxxxxxxxx>
>>>>> Tested-by: Yan Zhao<yan.y.zhao@xxxxxxxxx>
>>>> Reviewed-by: Kevin Tian<kevin.tian@xxxxxxxxx>, with one nit:
>>>>
>>>>> + * Removing a device from an iopf_queue. It's recommended to follow
>>>>> these
>>>>> + * steps when removing a device:
>>>>>     *
>>>>> - * Return: 0 on success and <0 on error.
>>>>> + * - Disable new PRI reception: Turn off PRI generation in the IOMMU
>>>>> hardware
>>>>> + *   and flush any hardware page request queues. This should be done
>>>>> before
>>>>> + *   calling into this helper.
>>>>> + * - Acknowledge all outstanding PRQs to the device: Respond to all
>>>>> outstanding
>>>>> + *   page requests with IOMMU_PAGE_RESP_INVALID, indicating the device
>>>>> should
>>>>> + *   not retry. This helper function handles this.
>>>> this implies calling iopf_queue_remove_device() here.
>>>>
>>>>> + * - Disable PRI on the device: After calling this helper, the caller could
>>>>> + *   then disable PRI on the device.
>>>>> + * - Call iopf_queue_remove_device(): Calling iopf_queue_remove_device()
>>>>> + *   essentially disassociates the device. The fault_param might still exist,
>>>>> + *   but iommu_page_response() will do nothing. The device fault parameter
>>>>> + *   reference count has been properly passed from
>>>>> iommu_report_device_fault()
>>>>> + *   to the fault handling work, and will eventually be released after
>>>>> + *   iommu_page_response().
>>>>>     */
>>>> but here it suggests calling iopf_queue_remove_device() again. If the comment
>>>> is just about to detail the behavior with that invocation shouldn't it be
>>>> merged
>>>> with the previous one instead of pretending to be the final step for driver
>>>> to call?
>>>
>>> Above just explains the behavior of calling iopf_queue_remove_device().
>>
>> Can you please leave a line -OR- move this to previous para? Otherwise we will
>> get confused.
>
> Sure. I will make it look like below.
>
> /**
>  * iopf_queue_remove_device - Remove producer from fault queue
>  * @queue: IOPF queue
>  * @dev: device to remove
>  *
>  * Removing a device from an iopf_queue. It's recommended to follow these
>  * steps when removing a device:
>  *
>  * - Disable new PRI reception: Turn off PRI generation in the IOMMU hardware
>  *   and flush any hardware page request queues. This should be done before
>  *   calling into this helper.
>  * - Acknowledge all outstanding PRQs to the device: Respond to all outstanding
>  *   page requests with IOMMU_PAGE_RESP_INVALID, indicating the device should
>  *   not retry. This helper function handles this.
>  * - Disable PRI on the device: After calling this helper, the caller could
>  *   then disable PRI on the device.
>  *
>  * Calling iopf_queue_remove_device() essentially disassociates the device.
>  * The fault_param might still exist, but iommu_page_response() will do
>  * nothing. The device fault parameter reference count has been properly
>  * passed from iommu_report_device_fault() to the fault handling work, and
>  * will eventually be released after iommu_page_response().
>  */

Looks good. Thank you.

-Vasant