[PATCH] einj: Documentation text corrections and streamlining

From: Borislav Petkov
Date: Thu Jan 29 2015 - 12:51:08 EST


From: Borislav Petkov <bp@xxxxxxx>

Streamline and simplify formulations, improve formatting and extend the
injection example in the error injection write up for users which we
carry in Documentation/.

Signed-off-by: Borislav Petkov <bp@xxxxxxx>
---
Documentation/acpi/apei/einj.txt | 159 ++++++++++++++++++++++-----------------
1 file changed, 89 insertions(+), 70 deletions(-)

diff --git a/Documentation/acpi/apei/einj.txt b/Documentation/acpi/apei/einj.txt
index f51861bcb07b..026e8913339c 100644
--- a/Documentation/acpi/apei/einj.txt
+++ b/Documentation/acpi/apei/einj.txt
@@ -1,98 +1,107 @@
APEI Error INJection
~~~~~~~~~~~~~~~~~~~~

-EINJ provides a hardware error injection mechanism
-It is very useful for debugging and testing of other APEI and RAS features.
+EINJ provides a hardware error injection mechanism. It is very useful
+for debugging and testing APEI and RAS features in general.

-To use EINJ, make sure the following are enabled in your kernel
+To use EINJ, make sure the following are options enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

-The user interface of EINJ is debug file system, under the
-directory apei/einj. The following files are provided.
+The EINJ user interface is in <debugfs mount point>/apei/einj.
+
+The following files belong to it:

- available_error_type
- Reading this file returns the error injection capability of the
- platform, that is, which error types are supported. The error type
- definition is as follow, the left field is the error type value, the
- right field is error description.
-
- 0x00000001 Processor Correctable
- 0x00000002 Processor Uncorrectable non-fatal
- 0x00000004 Processor Uncorrectable fatal
- 0x00000008 Memory Correctable
- 0x00000010 Memory Uncorrectable non-fatal
- 0x00000020 Memory Uncorrectable fatal
- 0x00000040 PCI Express Correctable
- 0x00000080 PCI Express Uncorrectable fatal
- 0x00000100 PCI Express Uncorrectable non-fatal
- 0x00000200 Platform Correctable
- 0x00000400 Platform Uncorrectable non-fatal
- 0x00000800 Platform Uncorrectable fatal
-
- The format of file contents are as above, except there are only the
- available error type lines.
+
+ This file shows which error types are supported:
+
+ Error Type Value Error Description
+ ================ =================
+ 0x00000001 Processor Correctable
+ 0x00000002 Processor Uncorrectable non-fatal
+ 0x00000004 Processor Uncorrectable fatal
+ 0x00000008 Memory Correctable
+ 0x00000010 Memory Uncorrectable non-fatal
+ 0x00000020 Memory Uncorrectable fatal
+ 0x00000040 PCI Express Correctable
+ 0x00000080 PCI Express Uncorrectable fatal
+ 0x00000100 PCI Express Uncorrectable non-fatal
+ 0x00000200 Platform Correctable
+ 0x00000400 Platform Uncorrectable non-fatal
+ 0x00000800 Platform Uncorrectable fatal
+
+ The format of the file contents are as above, except present are only
+ the available error types.

- error_type
- This file is used to set the error type value. The error type value
- is defined in "available_error_type" description.
+
+ Set the value of the error type being injected. Possible error types
+ are defined in the file available_error_type above.

- error_inject
- Write any integer to this file to trigger the error
- injection. Before this, please specify all necessary error
- parameters.
+
+ Write any integer to this file to trigger the error injection. Make
+ sure you have specified all necessary error parameters, i.e. this
+ write should be the last step when injecting errors.

- flags
- Present for kernel version 3.13 and above. Used to specify which
- of param{1..4} are valid and should be used by BIOS during injection.
- Value is a bitmask as specified in ACPI5.0 spec for the
+
+ Present for kernel versions 3.13 and above. Used to specify which
+ of param{1..4} are valid and should be used by the firmware during
+ injection. Value is a bitmask as specified in ACPI5.0 spec for the
SET_ERROR_TYPE_WITH_ADDRESS data structure:
- Bit 0 - Processor APIC field valid (see param3 below)
- Bit 1 - Memory address and mask valid (param1 and param2)
- Bit 2 - PCIe (seg,bus,dev,fn) valid (param4 below)
- If set to zero, legacy behaviour is used where the type of injection
- specifies just one bit set, and param1 is multiplexed.
+
+ Bit 0 - Processor APIC field valid (see param3 below).
+ Bit 1 - Memory address and mask valid (param1 and param2).
+ Bit 2 - PCIe (seg,bus,dev,fn) valid (see param4 below).
+
+ If set to zero, legacy behavior is mimicked where the type of
+ injection specifies just one bit set, and param1 is multiplexed.

- param1
- This file is used to set the first error parameter value. Effect of
- parameter depends on error_type specified. For example, if error
- type is memory related type, the param1 should be a valid physical
- memory address. [Unless "flag" is set - see above]
+
+ This file is used to set the first error parameter value. Its effect
+ depends on the error type specified in error_type. For example, if
+ error type is memory related type, the param1 should be a valid
+ physical memory address. [Unless "flag" is set - see above]

- param2
- This file is used to set the second error parameter value. Effect of
- parameter depends on error_type specified. For example, if error
- type is memory related type, the param2 should be a physical memory
- address mask. Linux requires page or narrower granularity, say,
- 0xfffffffffffff000.
+
+ Same use as param1 above. For example, if error type is of memory
+ related type, then param2 should be a physical memory address mask.
+ Linux requires page or narrower granularity, say, 0xfffffffffffff000.

- param3
- Used when the 0x1 bit is set in "flag" to specify the APIC id
+
+ Used when the 0x1 bit is set in "flags" to specify the APIC id

- param4
- Used when the 0x4 bit is set in "flag" to specify target PCIe device
+ Used when the 0x4 bit is set in "flags" to specify target PCIe device

- notrigger
- The EINJ mechanism is a two step process. First inject the error, then
- perform some actions to trigger it. Setting "notrigger" to 1 skips the
- trigger phase, which *may* allow the user to cause the error in some other
- context by a simple access to the cpu, memory location, or device that is
- the target of the error injection. Whether this actually works depends
- on what operations the BIOS actually includes in the trigger phase.
-
-BIOS versions based in the ACPI 4.0 specification have limited options
-to control where the errors are injected. Your BIOS may support an
-extension (enabled with the param_extension=1 module parameter, or
-boot command line einj.param_extension=1). This allows the address
-and mask for memory injections to be specified by the param1 and
-param2 files in apei/einj.
-
-BIOS versions using the ACPI 5.0 specification have more control over
-the target of the injection. For processor related errors (type 0x1,
-0x2 and 0x4) the APICID of the target should be provided using the
+
+ The error injection mechanism is a two-step process. First inject the
+ error, then perform some actions to trigger it. Setting "notrigger"
+ to 1 skips the trigger phase, which *may* allow the user to cause the
+ error in some other context by a simple access to the CPU, memory
+ location, or device that is the target of the error injection. Whether
+ this actually works depends on what operations the BIOS actually
+ includes in the trigger phase.
+
+BIOS versions based on the ACPI 4.0 specification have limited options
+in controlling where the errors are injected. Your BIOS may support an
+extension (enabled with the param_extension=1 module parameter, or boot
+command line einj.param_extension=1). This allows the address and mask
+for memory injections to be specified by the param1 and param2 files in
+apei/einj.
+
+BIOS versions based on the ACPI 5.0 specification have more control over
+the target of the injection. For processor-related errors (type 0x1,
+0x2 and 0x4) the APIC ID of the target should be provided using the
param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
the address is set using param1 with a mask in param2 (0x0 is equivalent
to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the
@@ -103,27 +112,37 @@ segment, bus, device and function are specified using param1:
| segment | bus | device | function | reserved |
+-------------------------------------------------+

-An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
+An ACPI 5.0 BIOS may also allow vendor-specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
-the vendor specific extension to tell that they are running on a BIOS
+the vendor-specific extension to tell that they are running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors
creativity in using this feature expands beyond our expectations).

-Example:
+
+An error injection example:
+
# cd /sys/kernel/debug/apei/einj
# cat available_error_type # See which errors can be injected
0x00000002 Processor Uncorrectable non-fatal
0x00000008 Memory Correctable
0x00000010 Memory Uncorrectable non-fatal
# echo 0x12345000 > param1 # Set memory address for injection
-# echo 0xfffffffffffff000 > param2 # Mask - anywhere in this page
+# echo $((-1 << 12)) > param2 # Mask 0xfffffffffffff000 - anywhere in this page
# echo 0x8 > error_type # Choose correctable memory error
# echo 1 > error_inject # Inject now

+You should see something like this in dmesg:
+
+[22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
+[22715.834759] EDAC sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
+[22715.834759] EDAC sbridge MC3: TSC 0
+[22715.834759] EDAC sbridge MC3: ADDR 12345000 EDAC sbridge MC3: MISC 144780c86
+[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
+[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
--
2.2.0.33.gc18b867

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/