RE: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A

From: Souvik Chakravarty
Date: Mon Apr 26 2021 - 06:11:07 EST


Hi Sowjanya,

> From: Sowjanya Komatineni
> Sent: Friday, April 23, 2021 11:25 PM
>
> On 4/23/21 1:16 PM, Lukasz Luba wrote:
> > Hi Sowjanya,
> >
> > On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
> >> Tegra194 and Tegra186 platforms use separate MCE firmware for CPUs
> >> which is in charge of deciding on state transition based on target
> >> state, state idle time, and some other Tegra CPU core cluster states
> >> information.
> >>
> >> The current PSCI specification doesn't define a function for passing
> >> the runtime state idle time predicted by the governor (based on next
> >> events and state target residency) to ARM trusted firmware.
> >
> > Do you have some numbers from experiments showing that these idle
> > governor prediction values, which are passed from kernel to MCE
> > firmware, are making a good 'guess'?
> > How much precision (1us? 1ms?) in the values do you need there?
>
> It could also be a few ms, depending on when the next CPU event/activity
> might happen, which is not visible to MCE firmware.
>
> >
> > IIRC (probably Rafael's presentations) predicting in the kernel
> > something like CPU idle time residency is not a trivial thing.
> >
> > Another idea (depending on DT structure and PSCI bits):
> > Could this be solved differently, but just having a knowledge that if
> > the governor requested some C-state, this means governor 'predicted'
> > an idle residency to be greater than min_residency attached to this
> > C-state?
> > Then, when that request shows up in your FW, you know that it must be
> > at least min_residency because of this C-state id.
>
> C6 is the only deep idle state that we support for the Tegra194 Carmel CPU,
> in addition to the C1 (WFI) idle state.
>
> MCE firmware gets the state crossover thresholds for the C1 to C6
> transition from TF-A and uses them, along with the state idle time, to
> decide on C6 state entry based on its background work.
>
> Assuming for now that we use min_residency, which is a static value from
> DT, as the state idle time, the firmware always enters the deepest state,
> C6, because the min_residency value we use is always higher than the state
> crossover threshold.
>
> But MCE firmware is not aware of when the next CPU event can happen, so it
> cannot predict whether the next event will take longer than the state's
> min_residency time.
>
> Using min_residency in such a case is very conservative: MCE firmware
> exits the C6 state early, so we may not get better power savings.
>
> But if MCE firmware is aware of when the next event can happen, it can use
> that to stay in the C6 state without an early exit, for better power
> savings.

This part confuses me. Are you saying that the firmware will forcefully wake up
the core, even if no wakeups are pending, when min residency for C6 expires?

Regards,
Souvik

>
> > It would depend on the number of available states, max_residency, and
> > the scale that you would choose when assigning values from
> > [0, max_residency] to each state.
> > IIRC there can be many state IDs for idle, so it would depend on the
> > number of bits encoding this state, and your needs. Example of a linear
> > scale:
> > 4-bits encoding idle state and max predicted residency 10msec, that
> > means 10000us / 16 states = 625us/state.
> > The max_residency might be split differently, using a non-linear
> > function, to make some ranges more precise.
> >
> > An open question is whether these idle states must all be represented
> > in DT, or whether there is a way of describing a 'set of idle states'
> > automatically.
>
> We only support the C6 state through DT, as C6 is the only deep idle state
> for the Tegra194 Carmel CPU. The WFI idle state is handled entirely by the
> kernel and does not require MCE sequences for entry/exit.
> >
> > Regards,
> > Lukasz
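
For reference, the linear-scale arithmetic in Lukasz's example above (4 bits
of state ID, 10 ms maximum predicted residency) can be sketched as follows.
This is only an illustration under stated assumptions: the function names are
hypothetical, nothing here is part of PSCI, TF-A, or the MCE firmware
interface, and the clamping behaviour is a choice made for the sketch.

```python
# Illustrative sketch only: all names are hypothetical and not taken from
# PSCI, TF-A, or the Tegra MCE firmware interface.

def bucket_width_us(state_bits: int, max_residency_us: int) -> int:
    """Width of one residency bucket on a linear scale."""
    num_states = 1 << state_bits  # e.g. 4 bits -> 16 state IDs
    return max_residency_us // num_states


def residency_to_state_id(predicted_us: int, state_bits: int,
                          max_residency_us: int) -> int:
    """Map a governor-predicted idle time to a state ID.

    Predictions beyond max_residency_us are clamped to the highest ID
    (an assumption made for this sketch).
    """
    if predicted_us < 0:
        raise ValueError("predicted_us must be non-negative")
    width = bucket_width_us(state_bits, max_residency_us)
    max_id = (1 << state_bits) - 1
    return min(predicted_us // width, max_id)


def state_id_to_min_residency_us(state_id: int, state_bits: int,
                                 max_residency_us: int) -> int:
    """Lower bound on idle time that firmware could infer from a state ID."""
    return state_id * bucket_width_us(state_bits, max_residency_us)


if __name__ == "__main__":
    # Values from the example in the thread: 4 bits, 10 ms.
    print(bucket_width_us(4, 10000))                  # 625
    print(residency_to_state_id(3000, 4, 10000))      # 4
    print(state_id_to_min_residency_us(4, 4, 10000))  # 2500
```

With these numbers, a predicted idle time of 3000 us maps to state ID 4, from
which the receiver could only infer "at least 2500 us". That is the precision
trade-off of the encoding, and the reason a non-linear split might be
preferred for some ranges.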