Re: [PATCH RFC 1/1] arm64: Use PSCI calls for CPU stop when hotplug is supported

From: Scott Branden
Date: Wed Jan 23 2019 - 12:33:26 EST


Hi Sudeep,

On 2019-01-23 9:21 a.m., Sudeep Holla wrote:
On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote:
Hi Mark,

Hopefully I can shed some light on the use case inline.

On 2019-01-23 8:48 a.m., Mark Rutland wrote:
On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote:
On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@xxxxxxxxxxxx>
wrote:

The need comes from a specific use case where an accelerator card (SoC) is
plugged into a server over a PCIe interface. On any power loss, the card is
supplied by a battery that can provide very little power for a very short
time. Once the card switches to battery, it has to reduce its power
consumption to the lowest possible level and back up the DDR contents as soon
as possible, before the battery is fully drained.
In this example is Linux running on the server, or on the accelerator?
Accelerator
What precisely are you trying to back up from DDR, and why?
Data in DDR is being written to disk at this time (the disk is connected to
the accelerator).
What is responsible for backing up those contents?
A low-power M-class processor and a DMA engine, which continue the operations
needed to transfer the DDR memory contents to disk.

The high-power processors on the accelerator running Linux need to be
halted ASAP on this power-loss event so that the M0 can take over. A graceful
shutdown of Linux and the other peripherals is unnecessary (and we don't have
the power to do one).

It may be unnecessary for your use case, but it is not recommended.
No choice - we don't have the time/power for a graceful shutdown.

Since the battery can provide only limited power for a very short time, we
need to transition to the lowest power state. As part of the transition
process, the CPUs' power domain has to be switched off, but before that the
CPUs need to flush their contents to system memory (L3) so that the contents
can be backed up by an MCU, a controller consuming very little power. Since we
cannot afford to hot-unplug each CPU individually in sequence, ipi_cpu_stop is
used for all other CPUs, which ultimately switch into ATF to flush all of the
CPUs' caches and leave the coherency domain so that their power rails can be
switched off.
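For illustration only (this is not the RFC patch itself), the decision named in the subject line, "use PSCI calls for CPU stop when hotplug is supported", can be modelled as a small userspace C sketch. Every name below (`stop_this_cpu`, `stop_other_cpus`, `fake_psci_call`, the `cpu_state` array) is hypothetical, and the firmware call is simulated rather than issued as an SMC. The only real constant is the PSCI 0.2+ CPU_OFF function ID (0x84000002, SMC32); on real hardware CPU_OFF does not return, and the firmware (ATF in this thread) is expected to clean the core's caches and take it out of coherency before removing power.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4
/* PSCI 0.2+ CPU_OFF function ID, SMC32 calling convention. */
#define PSCI_0_2_FN_CPU_OFF 0x84000002u

/* Hypothetical per-CPU state, for this simulation only. */
enum cpu_state { CPU_RUNNING, CPU_PARKED, CPU_POWERED_OFF };

static enum cpu_state cpu_state[NR_CPUS];   /* zero-initialised: CPU_RUNNING */
static uint32_t last_psci_fn[NR_CPUS];      /* last simulated PSCI call per CPU */

/*
 * Hypothetical stand-in for an SMC into firmware. Real CPU_OFF never
 * returns to the caller; here we just record the call and mark the CPU off.
 */
static void fake_psci_call(unsigned int cpu, uint32_t fn)
{
    last_psci_fn[cpu] = fn;
    if (fn == PSCI_0_2_FN_CPU_OFF)
        cpu_state[cpu] = CPU_POWERED_OFF;
}

/*
 * Hypothetical model of the stop-IPI handler on one CPU: power the core
 * off through PSCI when hotplug (CPU_OFF) is available, otherwise leave
 * it parked (real code would spin in a low-power wait loop).
 */
static void stop_this_cpu(unsigned int cpu, bool hotplug_supported)
{
    assert(cpu < NR_CPUS);
    if (hotplug_supported)
        fake_psci_call(cpu, PSCI_0_2_FN_CPU_OFF);
    else
        cpu_state[cpu] = CPU_PARKED;
}

/* Stop every CPU except the caller, as a broadcast stop IPI would. */
static void stop_other_cpus(unsigned int self, bool hotplug_supported)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu != self)
            stop_this_cpu(cpu, hotplug_supported);
}
```

Calling `stop_other_cpus(0, true)` leaves CPU 0 running and records a CPU_OFF call for CPUs 1 to 3. The parked fallback shows why the choice matters for this use case: a parked core still draws power and keeps dirty lines in its caches, which is what the battery budget described above cannot afford.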
If you're stopping CPUs from completely arbitrary states, what is the
benefit of saving the RAM contents?
Some of the RAM contains data that was in the process of being written to
disk by the accelerator.

This data must be saved to disk and the high power CPUs consume too much
power to continue performing this operation.

Why will suspend to RAM or idle not work? It will power off the secondaries,
which is what this patch is trying to achieve, but in a saner way, so that no
data/state is lost or corrupted, as I stated earlier.
We need to take over control of the disk write operations. There is no time/power available to leave Linux running.

CPUs might be running with IRQs disabled for an arbitrarily long time,
In an embedded Linux system we control everything that is running.

By which I assume you have patches to do all sorts of things to make this
work and this patch standalone is of no use :)

I believe the other patch is in a standalone driver that is to be upstreamed.

Remainder of code is on a standalone M0 processor not running linux.


I don't like this, as it's not scalable to big systems: this is in the
same code path as system off/reset.
Many things are not used by big systems and vice versa. What do you suggest be done instead?

--
Regards,
Sudeep

Regards,

Scott