Re: [PATCH V2 1/4] cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()

From: Lukasz Luba
Date: Thu Sep 24 2020 - 05:25:53 EST


Hi Rafael,

On 9/23/20 2:48 PM, Rafael J. Wysocki wrote:
On Wed, Sep 16, 2020 at 8:45 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:

In order to prepare for lock-less stats update, add support to defer any
updates to it until cpufreq_stats_record_transition() is called.

This is a bit devoid of details.

I guess you mean reset in particular, but that's not clear from the above.

Also, it would be useful to describe the design somewhat.

Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
---
drivers/cpufreq/cpufreq_stats.c | 75 ++++++++++++++++++++++++---------
1 file changed, 56 insertions(+), 19 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 94d959a8e954..3e7eee29ee86 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -22,17 +22,22 @@ struct cpufreq_stats {
spinlock_t lock;
unsigned int *freq_table;
unsigned int *trans_table;
+
+ /* Deferred reset */
+ unsigned int reset_pending;
+ unsigned long long reset_time;
};

-static void cpufreq_stats_update(struct cpufreq_stats *stats)
+static void cpufreq_stats_update(struct cpufreq_stats *stats,
+ unsigned long long time)
{
unsigned long long cur_time = get_jiffies_64();

- stats->time_in_state[stats->last_index] += cur_time - stats->last_time;
+ stats->time_in_state[stats->last_index] += cur_time - time;
stats->last_time = cur_time;
}

-static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
+static void cpufreq_stats_reset_table(struct cpufreq_stats *stats)
{
unsigned int count = stats->max_state;

@@ -41,42 +46,67 @@ static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
memset(stats->trans_table, 0, count * count * sizeof(int));
stats->last_time = get_jiffies_64();
stats->total_trans = 0;
+
+ /* Adjust for the time elapsed since reset was requested */
+ WRITE_ONCE(stats->reset_pending, 0);

What if this runs in parallel with store_reset()?

The latter may update reset_pending to 1 before the below runs.
Conversely, this may clear reset_pending right after store_reset() has
set it to 1, but before it manages to set reset_time. Is that not a
problem?

I wonder if we could just drop the reset feature. Is there a tool
which uses this file? The 'reset' sysfs would probably have to stay
forever, but an empty implementation is not an option?
The documentation states:
'This can be useful for evaluating system behaviour under different
governors without the need for a reboot.'
With the scenario of fast-switch this resetting complicates the
implementation and the justification of having it just for experiments
avoiding reboot is IMO weak. The real production code would have to pay
extra cycles every time. Also, we would probably not experiment with
cpufreq different governors, since the SchedUtil is considered the best
option.

Regards,
Lukasz