Re: [PATCH v2] mmc: mtk-sd: reduce CIT for better performance

From: Wenbin Mei (梅文彬)
Date: Sun Jun 04 2023 - 21:38:35 EST


On Thu, 2023-06-01 at 14:21 +0200, AngeloGioacchino Del Regno wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> Il 01/06/23 14:08, Wenbin Mei (梅文彬) ha scritto:
> > On Thu, 2023-06-01 at 12:00 +0200, AngeloGioacchino Del Regno
> wrote:
> >>
> >> External email : Please do not click links or open attachments
> until
> >> you have verified the sender or the content.
> >> Il 01/06/23 05:16, Wenbin Mei (梅文彬) ha scritto:
> >>> On Wed, 2023-05-31 at 10:18 +0200, AngeloGioacchino Del Regno
> >> wrote:
> >>> External email : Please do not click links or open attachments
> >> until you have verified the sender or the content.
> >>>
> >>> Il 31/05/23 09:32, Wenbin Mei (梅文彬) ha scritto:
> >>>
> >>>> On Thu, 2023-05-18 at 11:13 +0200, AngeloGioacchino Del Regno
> >> wrote:
> >>>
> >>>>> External email : Please do not click links or open attachments
> >> until
> >>>
> >>>>> you have verified the sender or the content.
> >>>
> >>>>>
> >>>
> >>>>>
> >>>
> >>>>> Il 10/05/23 03:58, Wenbin Mei ha scritto:
> >>>
> >>>>>> CQHCI_SSC1 indicates to CQE the polling period to use when
> using
> >>>
> >>>>>> periodic
> >>>
> >>>>>> SEND_QUEUE_STATUS(CMD13) polling.
> >>>
> >>>>>> The default value 0x1000 that corresponds to 150us, let's
> >> decrease
> >>>
> >>>>>> it to
> >>>
> >>>>>
> >>>
> >>>>> The default value 0x1000 (4096) corresponds to 4096 * 52.08uS =
> >>>
> >>>>> 231.33uS
> >>>
> >>>>> ...so the default is not 150uS.
> >>>
> >>>>>
> >>>
> >>>>> If I'm wrong, this means that the CQCAP field is not 0, which
> >> would
> >>>
> >>>>> mean
> >>>
> >>>>> that the expected 3uS would be wrong.
> >>>
> >>>>>
> >>>
> >>>>> Also, since the calculation can be done dynamically, this is
> what
> >> we
> >>>
> >>>>> should
> >>>
> >>>>> actually do in the driver, as this gives information to the
> next
> >>>
> >>>>> engineer
> >>>
> >>>>> checking this piece of code.
> >>>
> >>>>>
> >>>
> >>>>> Apart from this, by just writing 0x40 to the CQHCI_SSC1
> register,
> >> you
> >>>
> >>>>> are
> >>>
> >>>>> assuming that the CQCAP value requirement is fullfilled, but
> you
> >>>
> >>>>> cannot
> >>>
> >>>>> assume that the bootloader has set the CQCAP's ITCFVAL and
> >> ITCFMUL
> >>>
> >>>>> fields
> >>>
> >>>>> as you expect on all platforms: this means that implementing
> this
> >>>
> >>>>> takes
> >>>
> >>>>> a little more effort.
> >>>
> >>>>>
> >>>
> >>>>> You have two ways to implement this:
> >>>
> >>>>> *** First ***
> >>>
> >>>>> 1. Read ITCFMUL and ITCFVAL, then:
> >>>
> >>>>> tclk_mul = itcfmul_to_mhz(ITCFMUL); /* pseudo function
> >>>
> >>>>> interprets reg value*/
> >>>
> >>>>> tclk = ITCFVAL * tclk_mul;
> >>>
> >>>>>
> >>>
> >>>>> 2. Set SSC1 so that we get 3nS:
> >>>
> >>>>> #define CQHCI_SSC1_CIT GENMASK(15, 0)
> >>>
> >>>>> poll_time = cit_time_ns_to_regval(3);
> >>>
> >>>>> sscit = FIELD_PREP(CQHCI_SSC1_CIT, poll_time)
> >>>
> >>>>> cqhci_writel( ... )
> >>>
> >>>>>
> >>>
> >>>>> *** Second **
> >>>
> >>>>>
> >>>
> >>>>> 1. Pre-set ITCFMUL and ITCFVAL to
> >>>
> >>>>> ITCFVAL = 192 (decimal)
> >>>
> >>>>> ITCFMUL = 2 (where 2 == 0.1MHz)
> >>>
> >>>>>
> >>>
> >>>>> 2. Set SSC1 so that we get 3nS:
> >>>
> >>>>> #define CQHCI_SSC1_CIT GENMASK(15, 0)
> >>>
> >>>>> poll_time = cit_time_ns_to_regval(3);
> >>>
> >>>>> sscit = FIELD_PREP(CQHCI_SSC1_CIT, poll_time)
> >>>
> >>>>> cqhci_writel( ... )
> >>>
> >>>>>
> >>>
> >>>>> I would implement the first way, as it paves the way to extend
> >> this
> >>>
> >>>>> to different
> >>>
> >>>>> tclk values if needed in the future.
> >>>
> >>>>>
> >>>
> >>>>> Regards,
> >>>
> >>>>> Angelo
> >>>
> >>>> Hi Angelo,
> >>>
> >>>>
> >>>
> >>>> Sorry for lately reply.
> >>>
> >>>>
> >>>
> >>>> For Mediatek mmc host IP, ITCFMUL is 0x2(0x1MHz), ITVFVAL
> reports
> >> 182,
> >>>
> >>>> and these fields are the same and are readonly for all IC, but
> >> since
> >>>
> >>>> Mediatek CQE uses msdc_hclk(273MHz), CMD13'interval calculation
> >> driver
> >>>
> >>>> should use 273MHz to get the actual time, so the actual clock is
> >>>
> >>>> 27.3MHz.
> >>>
> >>>>
> >>>
> >>>
> >>> You're right, I've misread the datasheet, just rechecked and it
> >> reports RO.
> >>>
> >>>
> >>>> If CIT is 0x1000 by default, CMD idle time: 0x1000 * 1 / 27.3MHz
> =
> >>>
> >>>> around 150us.
> >>>
> >>>>
> >>>
> >>>> In addition the bootloader will not set the CQCAP's ITCFVAL and
> >> ITCFMUL
> >>>
> >>>> fields, because these fields of CQCAP register is RO(readonly),
> so
> >> we
> >>>
> >>>> can ignore the change for the CQCAP's ITCFVAL and ITCFMUL
> fields.
> >>>
> >>>>
> >>>
> >>>
> >>> Yes, that's right, again - this means that you should go for the
> >> first
> >>>
> >>> proposed implementation, as future MediaTek SoCs may (or may not)
> >> change
> >>>
> >>> that: if you implement as proposed, this is going to be a one-
> time
> >> thing
> >>>
> >>> and future SoCs won't need specific changes.
> >>>
> >>>
> >>> That implementation also documents the flow about how we're
> getting
> >> to
> >>>
> >>> the actual value, which is important for community people reading
> >> this
> >>>
> >>> driver in the future for debugging purposes.
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Angelo
> >>>
> >>>
> >>>
> >>> Thanks for your proposal.
> >>>
> >>>
> >>> I have discussed with our designer, and this fields of CQCAP's
> >> ITCFVAL and ITCFMUL will not change.
> >>> If we add more code for it, these codes will also affect the
> >> execution efficiency, even if it has a very
> >>> small effect.
> >>> I think if it's just for reading convenience, we can add mode
> >> comments to make it easier to read the code.
> >>> Do you think it's okay to add more comments?
> >>>
> >>
> >> This isn't a performance path, but anyway, if you think that it
> will
> >> be at some
> >> point, you can read the two registers at probe time as part of the
> >> MMC_CAP2_CQE
> >> if branch, and then cache the invariable values to `struct
> >> msdc_host`: this
> >> will make you able to never perform register reads for
> ITCFVAL/FMUL
> >> in
> >> msdc_cqe_enable(), resolving the efficiency issue.
> >>
> >> Even better, instead of caching ITCFVAL/FMUL to two variables,
> since
> >> the idle
> >> timer value likely won't ever change during runtime, you can
> directly
> >> perform
> >> the calculation for SSC1 at probe time and cache that value
> instead,
> >> so that
> >> in msdc_cqe_enable() you will have something like...
> >>
> >> /* Set the send status command idle timer */
> >> cqhci_writel(cq_host, host->cq_ssc1_time, CQHCI_SSC1);
> >>
> >> where cq_ssc1_time is
> >> struct msdc_host {
> >> .......
> >> u32 cq_ssc1_time;
> >> ....
> >> }
> >>
> >> and where your probe function is
> >>
> >> static int msdc_drv_probe(struct platform_device *pdev)
> >> {
> >> ......
> >>
> >> if (mmc->caps2 & MMC_CAP2_CQE) {
> >> host->cq_host = ......
> >> ........
> >> read itcfval;
> >> read itcfmul;
> >> host->cq_ssc1_time = calculated-value;
> >> ........
> >> }
> >>
> >> .......
> >> }
> >>
> > Yes, I think it's okay for me.
> > Another problem, ITCFVAL reports 182 for MediaTek SoCs, but we can
> not
> > use it to calculate, as i said earlier, since our CQE uses
> > msdc_hclk(273MHz), CMD13' interval calculation drivers should use
> > 273MHz to get the actual time, not 182MHz.
> > If we use ITCFVAL, we will get a wrong value.
> > So I think it's meaningless.
>
> clk_get_rate(msdc_hclk) gives you the current msdc_hclk clock rate:
> use it
> in place of reading ITCFVAL, that's your solution.
>
> I would imagine that *at least* ITCFMUL is correct on MediaTek SoCs,
> so you
> can use that one as it is.
Thanks for your review.
I will implement it in the next version.

Begards,
Wenbin
>
> Regards,
> Angelo
>
> >
> > Begards,
> > Wenbin
> >> Regards,
> >> Angelo
> >>
> >>
> >>> Begards,
> >>> Wenbin
> >>>
> >>>> Thanks
> >>>
> >>>> Wenbin
> >>>
> >>>>>
> >>>
> >>>>>> 0x40 that corresponds to 3us, which can improve the
> performance
> >> of
> >>>
> >>>>>> some
> >>>
> >>>>>> eMMC devices.
> >>>
> >>>>>>
> >>>
> >>>>>> Signed-off-by: Wenbin Mei <wenbin.mei@xxxxxxxxxxxx>
> >>>
> >>>>>> ---
> >>>
> >>>>>> drivers/mmc/host/mtk-sd.c | 4 ++++
> >>>
> >>>>>> 1 file changed, 4 insertions(+)
> >>>
> >>>>>>
> >>>
> >>>>>> diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-
> >> sd.c
> >>>
> >>>>>> index edade0e54a0c..ffeccddcd028 100644
> >>>
> >>>>>> --- a/drivers/mmc/host/mtk-sd.c
> >>>
> >>>>>> +++ b/drivers/mmc/host/mtk-sd.c
> >>>
> >>>>>> @@ -2453,6 +2453,7 @@ static void
> >> msdc_hs400_enhanced_strobe(struct
> >>>
> >>>>>> mmc_host *mmc,
> >>>
> >>>>>> static void msdc_cqe_enable(struct mmc_host *mmc)
> >>>
> >>>>>> {
> >>>
> >>>>>> struct msdc_host *host = mmc_priv(mmc);
> >>>
> >>>>>> + struct cqhci_host *cq_host = mmc->cqe_private;
> >>>
> >>>>>>
> >>>
> >>>>>> /* enable cmdq irq */
> >>>
> >>>>>> writel(MSDC_INT_CMDQ, host->base + MSDC_INTEN);
> >>>
> >>>>>> @@ -2462,6 +2463,9 @@ static void msdc_cqe_enable(struct
> >> mmc_host
> >>>
> >>>>>> *mmc)
> >>>
> >>>>>> msdc_set_busy_timeout(host, 20 * 1000000000ULL, 0);
> >>>
> >>>>>> /* default read data timeout 1s */
> >>>
> >>>>>> msdc_set_timeout(host, 1000000000ULL, 0);
> >>>
> >>>>>> +
> >>>
> >>>>>> + /* decrease the send status command idle timer to 3us */
> >>>
> >>>>>> + cqhci_writel(cq_host, 0x40, CQHCI_SSC1);
> >>>
> >>>>>> }
> >>>
> >>>>>>
> >>>
> >>>>>> static void msdc_cqe_disable(struct mmc_host *mmc, bool
> >> recovery)
> >>>
> >>>>>
> >>>
> >>>>>
> >>>
> >>>
> >>>
> >>
>