Re: [PATCH v4 05/14] soc: mediatek: mtk-svs: use svs clk control APIs

From: Matthias Brugger
Date: Mon Feb 06 2023 - 07:10:18 EST




On 06/02/2023 03:01, Roger Lu (陸瑞傑) wrote:
Hi Matthias Sir,


On Thu, 2023-02-02 at 11:29 +0100, Matthias Brugger wrote:
你好,

I got shock and thought someone used your name to reply. However,
your email account helps me clear my mind. Haha.. Nice and warm to see mandarin
on patchwork. It's so fresh and exciting :-).


谢谢。 I'm learning mainland Chinese for a few month now, I also learned that you use different symbols in Taiwan, which I don't know. 对不起。


On 01/02/2023 13:28, Roger Lu (陸瑞傑) wrote:
Hi Matthias Sir,

On Tue, 2023-01-31 at 14:19 +0100, Matthias Brugger wrote:

On 11/01/2023 08:45, Roger Lu wrote:
In MediaTek HW design, svs and thermal both use the same clk source.
It means that svs clk reference count from CCF includes thermal control
counts. That makes svs driver confuse whether it disabled svs's main clk
or not from CCF's perspective and lead to turn off their shared clk
unexpectedly. Therefore, we add svs clk control APIs to make sure svs's
main clk is controlled well by svs driver itself.

Here is a NG example. Rely on CCF's reference count and cause problem.

thermal probe (clk ref = 1)
-> svs probe (clk ref = 2)
-> svs suspend (clk ref = 1)
-> thermal suspend (clk ref = 0)
-> thermal resume (clk ref = 1)
-> svs resume (encounter error, clk ref = 1)
-> svs suspend (clk ref = 0)
-> thermal suspend (Fail here, thermal HW control w/o clk)

Fixes: a825d72f74a3 ("soc: mediatek: fix missing clk_disable_unprepare()
on
err in svs_resume()")
Signed-off-by: Roger Lu <roger.lu@xxxxxxxxxxxx>

That looks wrong. Although I don't out of my mind, there should be a way
to
tell
the clock framework that this clock is shared between several devices.

I wonder if using clk_enable and clk_disable in svs_resume/suspend
wouldn't
be
enough.

Oh yes, Common Clock Framework (CCF) knows the clock shared between several
devices and maintains clock "on/off" by reference count.


The thing is if you use clk_prepare_enable then the clock framework check's
if
the clock is already prepared, which could happen like you described in the
svs_resume (encount error) case in the commit message. The question is, can't
we
just use clk_enable and clk_disable in resume/suspend and only prepare the
clock
in the probe function?

We'll think if this can fix the problem. Thanks for the advice very much.


We concern how to stop running svs_suspend() when svs clk is already
disabled by
svs_resume(). Take an example as below, if we refers to __clk_is_enabled()
result for knowing svs clk status, it will return "true" all the time
because
thermal clk is still on. This causes the problem mentioned in commit
message.


I would expect that the kernel takes care that we can't enter a resume path
for
a device before the suspend path has finished. Honestly I don't really
understand the problem here. It seems something different then what you
described in the commit message.

Please help me understand better.

I see. This patch title needs to be changed to "avoid turning off svs clk twice
unexpectedly" for pointing out the problem precisely. We saw a loophole that svs
clk might be turned off in svs_resume() first and in svs_suspend() again without
enabling svs clk during these the process. Therefore, we try to fix it by this
patch. Below is our thinking process to explain how we got here.

1. (abandoned) We add __clk_is_enabled() check in svs_suspend() to prevent svs
clk from being turned off twice when svs_resume() turned off svs clk in the
error-handling process. Nonetheless, we met the NG case in the commit message.
2. (current patch) We add svs clk control hint to understand if we need to run
svs_suspend() or not if svs_resume() turned off svs clk before.


Did you had a look on the dev_pm_ops? Maybe we can use suspend_late, resume_early to make sure there is no race condition. I wonder also if we can't make sure that this does not happen using device links. Sorry, I can't give better guidance on how to use this technologies, but I have the feeling we can fix this with existing infrastructure.

再见。

Matthias


谢谢,再见

:-)


Sincerely,
Roger Lu