Re: [PATCH v6 3/5] iommu/arm-smmu: introduction of ACTLR for custom prefetcher settings

From: Dmitry Baryshkov
Date: Thu Dec 21 2023 - 05:34:05 EST


On Thu, 21 Dec 2023 at 12:02, Bibek Kumar Patro
<quic_bibekkum@xxxxxxxxxxx> wrote:
>
>
>
> On 12/21/2023 6:06 AM, Dmitry Baryshkov wrote:
> > On Wed, 20 Dec 2023 at 15:39, Bibek Kumar Patro
> > <quic_bibekkum@xxxxxxxxxxx> wrote:
> >>
> >> Currently in Qualcomm SoCs the default prefetch is set to 1 which allows
> >> the TLB to fetch just the next page table. MMU-500 features ACTLR
> >> register which is implementation defined and is used for Qualcomm SoCs
> >> to have a custom prefetch setting enabling TLB to prefetch the next set
> >> of page tables accordingly allowing for faster translations.
> >>
> >> ACTLR value is unique for each SMR (Stream matching register) and stored
> >> in a pre-populated table. This value is set to the register during
> >> context bank initialisation.
> >>
> >> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@xxxxxxxxxxx>
> >> ---
> >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 45 ++++++++++++++++++++++
> >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h | 6 ++-
> >> drivers/iommu/arm/arm-smmu/arm-smmu.c | 5 ++-
> >> drivers/iommu/arm/arm-smmu/arm-smmu.h | 5 +++
> >> 4 files changed, 58 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >> index 20c9836d859b..1cefdd0ca110 100644
> >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >> @@ -24,6 +24,12 @@
> >> #define CPRE (1 << 1)
> >> #define CMTLB (1 << 0)
> >>
> >> +struct actlr_config {
> >> + u16 sid;
> >> + u16 mask;
> >> + u32 actlr;
> >> +};
> >> +
> >> static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
> >> {
> >> return container_of(smmu, struct qcom_smmu, smmu);
> >> @@ -215,9 +221,38 @@ static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
> >> return true;
> >> }
> >>
> >> +static void qcom_smmu_set_actlr(struct device *dev, struct arm_smmu_device *smmu, int cbndx,
> >> + const struct actlr_config *actlrcfg)
> >> +{
> >> + struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
> >> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> >> + struct arm_smmu_smr *smr;
> >> + u16 mask;
> >> + int idx;
> >> + u16 id;
> >> + int i;
> >> +
> >> + for (; actlrcfg->sid || actlrcfg->mask || actlrcfg->actlr; actlrcfg++) {
> >> + id = actlrcfg->sid;
> >> + mask = actlrcfg->mask;
> >> +
> >> + for_each_cfg_sme(cfg, fwspec, i, idx) {
> >> + smr = &smmu->smrs[idx];
> >> + if (smr_is_subset(smr, id, mask)) {
> >> + arm_smmu_cb_write(smmu, cbndx, ARM_SMMU_CB_ACTLR,
> >> + actlrcfg->actlr);
> >> + break;
> >> + }
> >> + }
> >> + }
> >> +}
> >> +
> >> static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >> struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> >> {
> >> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> >> + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> >> + int cbndx = smmu_domain->cfg.cbndx;
> >> struct adreno_smmu_priv *priv;
> >>
> >> smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
> >> @@ -248,6 +283,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >> priv->set_stall = qcom_adreno_smmu_set_stall;
> >> priv->resume_translation = qcom_adreno_smmu_resume_translation;
> >>
> >> + if (qsmmu->data->actlrcfg_gfx)
> >> + qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg_gfx);
> >
> > There was a feedback point against v4 that there can be more than two
> > (apps + gpu) SMMU devices. No, we can not use additional compat
> > strings, the SMMU units are compatible with each other.
>
> Just to understand better, did you mean if in the below check
> [inside arm-smmu-qcom.c file during qcom_smmu_create()], "else" has two
> things? (Currently adreno_impl for gpu smmu, else for only
> apps smmu)

qcom,adreno-smmu is quite unique here, this is the only distinctive
substring. We do not have such compat strings for any other of SMMU
nodes.

>
> if (np && of_device_is_compatible(np, "qcom,adreno-smmu"))
> impl = data->adreno_impl;
> else
> impl = data->impl;
>
> > Please add
> > matching between the smmu and particular actlr table using the IO
> > address of the SMMU block.
> >
>
> The ACTLR table for each smmu will have A IO address attached, so based
> on IO address we can apply ACTLR.
> Is this your proposal((IMO hardcoding IO in driver won't be viable,
> isn't it?), or in smmu DT we would need to set the IO?

Unfortunately, I meant exactly that: hardcoding addresses of the SMMU
register spaces. see drivers/gpu/drm/msm/dsi_cfg.c
Then during device probe the driver can match the IO address to the
list of the per-platform ACTLR tables and select the correct one.
Then you don't even need a special actlrcfg_gfx. The GFX will fall
into the main schema.

>
>
> Thanks & regards,
> Bibek
>
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -274,6 +312,13 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> >> static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> >> struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> >> {
> >> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> >> + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> >> + int cbndx = smmu_domain->cfg.cbndx;
> >> +
> >> + if (qsmmu->data->actlrcfg)
> >> + qcom_smmu_set_actlr(dev, smmu, cbndx, qsmmu->data->actlrcfg);
> >> +
> >
> > One issue occured to me, while I was reviewing the patchset. The ACTLR
> > settings are related to the whole SMMU setup, but we are applying them
> > each time there is an SMMU context init (in other words, one per each
> > domain). Is that correct? Or it's just that there is no better place
> > for initialising the global register set? Would it be better to
> > reprogram the ACTLR registers which are related just to this
> > particular domain?
> >
> >> smmu_domain->cfg.flush_walk_prefer_tlbiasid = true;
> >>
> >> return 0;
> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
> >> index f3b91963e234..cb4cb402c202 100644
> >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
> >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
> >> @@ -1,6 +1,6 @@
> >> /* SPDX-License-Identifier: GPL-2.0-only */
> >> /*
> >> - * Copyright (c) 2022, Qualcomm Innovation Center, Inc. All rights reserved.
> >> + * Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.
> >> */
> >>
> >> #ifndef _ARM_SMMU_QCOM_H
> >> @@ -24,7 +24,11 @@ struct qcom_smmu_config {
> >> const u32 *reg_offset;
> >> };
> >>
> >> +struct actlr_config;
> >> +
> >> struct qcom_smmu_match_data {
> >> + const struct actlr_config *actlrcfg;
> >> + const struct actlr_config *actlrcfg_gfx;
> >> const struct qcom_smmu_config *cfg;
> >> const struct arm_smmu_impl *impl;
> >> const struct arm_smmu_impl *adreno_impl;
> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >> index d6d1a2a55cc0..0c7f700b27dd 100644
> >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >> @@ -990,9 +990,10 @@ static int arm_smmu_find_sme(struct arm_smmu_device *smmu, u16 id, u16 mask)
> >> * expect simply identical entries for this case, but there's
> >> * no harm in accommodating the generalisation.
> >> */
> >> - if ((mask & smrs[i].mask) == mask &&
> >> - !((id ^ smrs[i].id) & ~smrs[i].mask))
> >> +
> >> + if (smr_is_subset(&smrs[i], id, mask))
> >> return i;
> >> +
> >> /*
> >> * If the new entry has any other overlap with an existing one,
> >> * though, then there always exists at least one stream ID
> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >> index 703fd5817ec1..2e4f65412c6b 100644
> >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >> @@ -501,6 +501,11 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
> >> writeq_relaxed(val, arm_smmu_page(smmu, page) + offset);
> >> }
> >>
> >> +static inline bool smr_is_subset(struct arm_smmu_smr *smrs, u16 id, u16 mask)
> >> +{
> >> + return (mask & smrs->mask) == mask && !((id ^ smrs->id) & ~smrs->mask);
> >> +}
> >> +
> >> #define ARM_SMMU_GR0 0
> >> #define ARM_SMMU_GR1 1
> >> #define ARM_SMMU_CB(s, n) ((s)->numpage + (n))
> >> --
> >> 2.17.1
> >>
> >
> >



--
With best wishes
Dmitry