Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

From: Rob Clark
Date: Mon Nov 27 2017 - 18:43:36 EST


On Mon, Nov 27, 2017 at 5:22 PM, Stephen Boyd <sboyd@xxxxxxxxxxxxxx> wrote:
> On 11/15, Vivek Gautam wrote:
>> Hi,
>>
>>
>> On Mon, Aug 7, 2017 at 5:59 PM, Rob Clark <robdclark@xxxxxxxxx> wrote:
>> > On Mon, Aug 7, 2017 at 4:27 AM, Vivek Gautam
>> > <vivek.gautam@xxxxxxxxxxxxxx> wrote:
>> >> On Thu, Jul 13, 2017 at 5:20 PM, Rob Clark <robdclark@xxxxxxxxx> wrote:
>> >>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R <sricharan@xxxxxxxxxxxxxx> wrote:
>> >>>> Hi Vivek,
>> >>>>
>> >>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> >>>>> Hi Stephen,
>> >>>>>
>> >>>>>
>> >>>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>> >>>>>> On 07/06, Vivek Gautam wrote:
>> >>>>>>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>> static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>> >>>>>>> size_t size)
>> >>>>>>> {
>> >>>>>>> - struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> >>>>>>> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> >>>>>>> + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> >>>>>>> + size_t ret;
>> >>>>>>> if (!ops)
>> >>>>>>> return 0;
>> >>>>>>> - return ops->unmap(ops, iova, size);
>> >>>>>>> + pm_runtime_get_sync(smmu_domain->smmu->dev);
>> >>>>>> Can these map/unmap ops be called from an atomic context? I seem
>> >>>>>> to recall that being a problem before.
>> >>>>>
>> >>>>> That's something which was dropped in the following patch merged in master:
>> >>>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>> >>>>>
>> >>>>> Looks like we don't need locks here anymore?
>> >>>>
>> >>>> Apart from the locking, I wonder why an explicit pm_runtime call is needed
>> >>>> from unmap. It looks like some path in the master using that
>> >>>> should already have enabled the pm?
>> >>>>
>> >>>
>> >>> Yes, there are a bunch of scenarios where unmap can happen with
>> >>> disabled master (but not in atomic context).
>> >>
>> >> I would like to understand whether there is a situation where an unmap is
>> >> called in atomic context without an enabled master?
>> >>
>> >> Let's say we have the case where all the unmap calls in atomic context happen
>> >> only from the master's context (in which case the device link should take
>> >> care of the pm state of smmu), and the only unmaps that happen in non-atomic
>> >> context are the ones with master disabled. In such a case, doesn't it make
>> >> sense to distinguish the atomic/non-atomic context and add
>> >> pm_runtime_get_sync()/put_sync() only for the non-atomic context, since that
>> >> would be the one with master disabled?
>> >>
>> >
>> > At least drm/msm needs to hold obj->lock (a mutex) in unmap, so it
>> > won't unmap anything in atomic ctx (but it can unmap w/ master
>> > disabled). I can't really comment about other non-gpu drivers. It
>> > seems like a reasonable constraint that either master is enabled or
>> > not in atomic ctx.
>> >
>> > Currently we actually wrap unmap w/ pm_runtime_get/put_sync(), but I'd
>> > like to drop that to avoid powering up the gpu.
>>
>> Since deferring the TLB maintenance doesn't look like the best approach [1],
>> how about we power up only the smmu from the different client devices,
>> such as the GPU, in the unmap path? Then we won't need to add
>> pm_runtime_get/put() calls in arm_smmu_unmap().
>>
>> The client device can use something like - pm_runtime_get_supplier() since
>> we already have the device link in place with this patch series. This should
>> power-on the supplier (which is smmu) without turning on the consumer
>> (such as GPU).
>>
>> pm_runtime_get_supplier(), however, is not exported at this moment.
>> Would it be useful to export this API and use it in the drivers?
>>
>
> I'm not sure pm_runtime_get_supplier() is correct either. That
> feels like we're relying on the GPU driver knowing the internal
> details of how the device links are configured.
>

What does pm_runtime_get_supplier() do if the IOMMU driver hasn't set up
a device-link? If it is a no-op, then I guess the GPU driver calling
pm_runtime_get_supplier() seems reasonable, and less annoying than
having special cases in the pm_resume path. I don't feel too bad about
having "just in case" get/put_supplier() calls in the unmap path.

Also, presumably we still want to avoid powering up the GPU even if we
short-circuit the firmware loading and the rest of "booting up the GPU",
since presumably the GPU draws somewhat more power than the IOMMU.
Having the pm_resume/suspend path know about the difference between
waking up / suspending the iommu and itself doesn't really feel less bad
than just doing "just in case" get/put_supplier() calls.
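
To illustrate the "just in case" bracket, here is a rough sketch as a toy
userspace model, not kernel code. Every name in it (toy_dev,
toy_get_supplier(), toy_put_supplier(), toy_unmap()) is made up for
illustration, and it simply assumes the answer to the question above is
"no-op when no device-link exists". The point is only that the consumer's
own usage count is never touched.

```c
/* Toy userspace model, NOT kernel code: all names here are made up. */
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	const char *name;
	int usage;			/* runtime-PM style usage count */
	struct toy_dev *supplier;	/* NULL if no device link was set up */
};

/* Take a reference on the supplier only; assumed no-op without a link. */
static void toy_get_supplier(struct toy_dev *consumer)
{
	if (consumer->supplier)
		consumer->supplier->usage++;
}

/* Drop the reference taken above; also a no-op without a link. */
static void toy_put_supplier(struct toy_dev *consumer)
{
	if (!consumer->supplier)
		return;
	assert(consumer->supplier->usage > 0);	/* unbalanced put is a bug */
	consumer->supplier->usage--;
}

/*
 * "Just in case" bracket around an unmap. Returns the supplier's usage
 * count while the unmap is in flight, so a caller can see whether the
 * IOMMU was held powered (0 means no link, nothing was touched).
 */
static int toy_unmap(struct toy_dev *gpu)
{
	int held;

	toy_get_supplier(gpu);
	held = gpu->supplier ? gpu->supplier->usage : 0;
	/* ... page table teardown and TLB invalidation would go here ... */
	toy_put_supplier(gpu);

	return held;
}
```

With a linked supplier, toy_unmap() holds the supplier for the duration
of the call and leaves the GPU's own count at zero; with no supplier it
does nothing at all.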

BR,
-R

> Is there some way to have the GPU driver know in its runtime PM
> resume hook that it doesn't need to be powered on because it
> isn't actively drawing anything or processing commands? I'm
> thinking of the code calling pm_runtime_get() as proposed around
> the IOMMU unmap path in the GPU driver and then having the
> runtime PM resume hook in the GPU driver return some special
> value to indicate that it didn't really resume because it didn't
> need to and to treat the device as runtime suspended but not
> return an error. Then the runtime PM core can keep track of that
> and try to power the GPU on again when another pm_runtime_get()
> is called on the GPU device.
>
> This keeps the consumer API the same, always pm_runtime_get(),
> but leaves the device driver logic of what to do when the GPU
> doesn't need to power on to the runtime PM hook where the driver
> has all the information.
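
For comparison, the resume-hook idea quoted above might be modelled roughly
as follows. Again this is a toy userspace sketch, not kernel code: the
TOY_RESUME_SKIPPED value, the toy_gpu struct and the toy_get()/toy_put()
helpers are all hypothetical, and the "special value" is an assumption
made purely to show the bookkeeping the core would need.

```c
/* Toy userspace model, NOT kernel code: all names here are made up. */
#include <assert.h>
#include <stdbool.h>

#define TOY_RESUME_SKIPPED 1	/* hypothetical "did not really resume" value */

struct toy_gpu {
	int usage;		/* references taken via toy_get() */
	bool powered;		/* is the hardware actually on? */
	bool has_work;		/* driver-private: commands queued? */
};

/* Driver resume hook: only the driver knows if power is really needed. */
static int toy_gpu_resume(struct toy_gpu *gpu)
{
	if (!gpu->has_work)
		return TOY_RESUME_SKIPPED;
	gpu->powered = true;
	return 0;
}

/*
 * Core side: call the hook whenever the device is not powered, so that
 * a skipped resume is retried on the next get.
 */
static int toy_get(struct toy_gpu *gpu)
{
	int ret = 0;

	gpu->usage++;
	if (!gpu->powered)
		ret = toy_gpu_resume(gpu);
	return ret < 0 ? ret : 0;	/* skipped is not an error */
}

/* Drop a reference; the last put powers the device down. */
static void toy_put(struct toy_gpu *gpu)
{
	assert(gpu->usage > 0);	/* unbalanced put is a bug */
	if (--gpu->usage == 0)
		gpu->powered = false;
}
```

In this model a toy_get() with no queued work succeeds but leaves the GPU
off, and a later toy_get() with work pending powers it on. The consumer
API stays the same, but the core has to carry the extra "skipped" state.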
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project