Re: [RFC PATCH 3/9] Documentation/scrub-configure.rst: Add documentation for scrub driver

From: David Hildenbrand
Date: Mon Sep 18 2023 - 08:36:58 EST


On 18.09.23 14:28, Jonathan Cameron wrote:
On Mon, 18 Sep 2023 14:15:33 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 18.09.23 12:25, Shiju Jose wrote:
Hi David,

Thanks for looking into this.
-----Original Message-----
From: David Hildenbrand <david@xxxxxxxxxx>
Sent: 18 September 2023 08:24
To: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-acpi@xxxxxxxxxxxxxxx; linux-
mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Cc: rafael@xxxxxxxxxx; lenb@xxxxxxxxxx; naoya.horiguchi@xxxxxxx;
tony.luck@xxxxxxxxx; james.morse@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
jiaqiyan@xxxxxxxxxx; jthoughton@xxxxxxxxxx; somasundaram.a@xxxxxxx;
erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx; rientjes@xxxxxxxxxx;
duenwen@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; mike.malvestuto@xxxxxxxxx;
gthelen@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; Jonathan Cameron
<jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
Subject: Re: [RFC PATCH 3/9] Documentation/scrub-configure.rst: Add
documentation for scrub driver

On 15.09.23 19:28, shiju.jose@xxxxxxxxxx wrote:
From: Shiju Jose <shiju.jose@xxxxxxxxxx>

Add documentation for scrub driver, supports configure scrub
parameters, in Documentation/scrub-configure.rst

Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
---
Documentation/scrub-configure.rst | 55
+++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
create mode 100644 Documentation/scrub-configure.rst

diff --git a/Documentation/scrub-configure.rst
b/Documentation/scrub-configure.rst
new file mode 100644
index 000000000000..9f8581b88788
--- /dev/null
+++ b/Documentation/scrub-configure.rst
@@ -0,0 +1,55 @@
+==========================
+Scrub subsystem driver
+==========================
+
+Copyright (c) 2023 HiSilicon Limited.
+
+:Author: Shiju Jose <shiju.jose@xxxxxxxxxx>
+:License: The GNU Free Documentation License, Version 1.2
+ (dual licensed under the GPL v2) :Original Reviewers:
+
+- Written for: 6.7
+- Updated for:
+
+Introduction
+------------
+The scrub subsystem driver provides the interface for configure the

"... interface for configuring memory scrubbers in the system."

are we only configuring firmware/hw-based memory scrubbing? I assume so.
The scrub control could be used for the SW based memory scrubbing too.

Okay, looks like there is not too much hw/firmware specific in there
(besides these weird range changes).
[...]

+-------
+
+ The usage takes the form shown in this example::
+
+ # echo 0x300000 > /sys/class/scrub/scrub0/region0/addr_base
+ # echo 0x100000 > /sys/class/scrub/scrub0/region0/addr_size
+ # cat /sys/class/scrub/scrub0/region0/speed_available
+ # 1-60
+ # echo 25 > /sys/class/scrub/scrub0/region0/speed
+ # echo 1 > /sys/class/scrub/scrub0/region0/enable
+
+ # cat /sys/class/scrub/scrub0/region0/speed
+ # 0x19

Is it reasonable to return the speed as hex? You set it as dec.
Presently return speed as hex to reduce the number of callback function needed
for reading the hex/dec data because the values for the address range
need to be in hex.

If speed_available returns dec, speed better also return dec IMHO.

+ # cat /sys/class/scrub/scrub0/region0/addr_base
+ # 0x100000

But didn't we set it to 0x300000 ...
This is an emulated example for testing the RASF/RAS2 definition.
According to the RASF & RAS2 definition, the actual address range in the
platform could vary from the requested address range for the patrol scrubbing.
"The platform calculates the nearest patrol scrub boundary address
from where it can start". The platform returns the actual address range
in response to GET_PATROL_PARAMETERS command to the firmware.
Please see section 5.2.21.2.1 Hardware-based Memory Scrubbing ,
Table 5.87: Parameter Block Structure for PATROL_SCRUB in the
ACPI 6.5 specification.

So you configure [0x300000 - 0x400000] and you get [0x100000 - 0x300000]

How does that make any sense? :)

Shouldn't we rather return an error when setting a range that is
impossible, instead of the hardware deciding to scrub something
completely different (as can be seen in the example)?


A broader scrub is probably reasonable, but agreed that scrubbing narrower
is 'interesting' as not scrubbing the memory requeseted.

It's not even narrower. Both ranges don't even intersect! (sorry to say, but this configuration interface doesn't make any sense if hardware just does *something* else).

If you can't configure it properly, fail with an error.

It's really annoying that neither ACPI table provides any proper
discoverability. Whilst we can fix that long term, we are stuck with
a clunky poke it and see interface in the meantime.

Can't you set it, briefly enable it, and read the values back? Then, you can complain to the user that the configured range is impossible.

--
Cheers,

David / dhildenb