[RFC][PATCH 3/4] tracing: Add documentation for hwlat_detector tracer

From: Steven Rostedt
Date: Thu Apr 23 2015 - 15:18:11 EST


From: Jon Masters <jcm@xxxxxxxxxx>

Added the documentation on how to use the hwlat_detector.

Signed-off-by: Jon Masters <jcm@xxxxxxxxxx>
[ Updated to show move from module to tracer ]
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
---
Documentation/trace/hwlat_detector.txt | 61 ++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
create mode 100644 Documentation/trace/hwlat_detector.txt

diff --git a/Documentation/trace/hwlat_detector.txt b/Documentation/trace/hwlat_detector.txt
new file mode 100644
index 000000000000..db98dd1fa4ed
--- /dev/null
+++ b/Documentation/trace/hwlat_detector.txt
@@ -0,0 +1,61 @@
+Introduction:
+-------------
+
+The tracer hwlat_detector is a special purpose tracer that is used to
+detect large system latencies induced by the behavior of certain underlying
+hardware or firmware, independent of Linux itself. The code was developed
+originally to detect SMIs (System Management Interrupts) on x86 systems;
+however, there is nothing x86 specific about this patchset. It was
+originally written for use by the "RT" patch since the Real Time
+kernel is highly latency sensitive.
+
+SMIs are usually not serviced by the Linux kernel, which typically does not
+even know that they are occurring. SMIs are instead set up by BIOS code
+and are serviced by BIOS code, usually for "critical" events such as
+management of thermal sensors and fans. Sometimes though, SMIs are used for
+other tasks and those tasks can spend an inordinate amount of time in the
+handler (sometimes measured in milliseconds). Obviously this is a problem if
+you are trying to keep event service latencies down in the microsecond range.
+
+The hardware latency detector works by hogging all of the CPUs for configurable
+amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
+for some period, then looking for gaps in the TSC data. Any gap indicates a
+time when the polling was interrupted, and since interrupts are disabled,
+the only thing that could do that would be an SMI.
+
+Note that the hardware latency detector should *NEVER* be used in a production
+environment. It is intended to be run manually to determine if the hardware
+platform has a problem with long system firmware service routines.
+
+Usage:
+------
+
+Write the ASCII text "hwlat_detector" into the current_tracer file of the
+tracing system (mounted at /sys/kernel/debug/tracing). The threshold above
+which latency spikes are recorded can be changed by writing a value in
+microseconds (us) to the tracing_thresh file.
+
+Example:
+
+ # echo hwlat_detector > /sys/kernel/debug/tracing/current_tracer
+ # echo 100 > /sys/kernel/debug/tracing/tracing_thresh
+
+The /sys/kernel/debug/tracing/hwlat_detector interface contains the following files:
+
+count - number of latency spikes observed since last reset
+width - time period to sample with CPUs held (usecs)
+ must be less than the total window size (enforced)
+window - total period of sampling, width being inside (usecs)
+
+By default we will set width to 500,000 and window to 1,000,000, meaning that
+we will sample every 1,000,000 usecs (1s) for 500,000 usecs (0.5s). If we
+observe any latencies that exceed the threshold (initially 100 usecs),
+then we write to a global sample ring buffer of 8K samples, which is
+consumed by reading from the "sample" (pipe) debugfs file interface.
+
+Also the following tracing directory files are used by the hwlat_detector:
+
+in /sys/kernel/debug/tracing:
+
+tracing_thresh - minimum latency value to be considered (usecs)
+tracing_max_latency - maximum hardware latency actually observed (usecs)
--
2.1.4
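
For anyone trying the tracer out, the width/window relationship described in
the documentation above can be sketched as a small shell helper. This is only
an illustration, not part of the patch: the function names duty_cycle and
configure_hwlat are hypothetical, and it assumes the width and window files
under hwlat_detector/ accept plain decimal microsecond values written as root.

```shell
#!/bin/sh
# Sketch only. Paths are the ones named in the documentation; the helper
# names and the write order are assumptions made for illustration.
TRACING=${TRACING:-/sys/kernel/debug/tracing}

# Print the percentage of each window spent sampling (integer arithmetic).
duty_cycle() {
    width=$1
    window=$2
    # The documentation states that width must be less than window.
    if [ "$width" -ge "$window" ]; then
        echo "width must be less than window" >&2
        return 1
    fi
    echo $(( width * 100 / window ))
}

# Write width, window and threshold (all in usecs), then select the tracer.
# Fails early if the hwlat_detector directory is not present.
configure_hwlat() {
    if [ ! -d "$TRACING/hwlat_detector" ]; then
        echo "hwlat_detector not available under $TRACING" >&2
        return 1
    fi
    echo "$2" > "$TRACING/hwlat_detector/window"
    echo "$1" > "$TRACING/hwlat_detector/width"
    echo "$3" > "$TRACING/tracing_thresh"
    echo hwlat_detector > "$TRACING/current_tracer"
}

# Documented defaults: 500,000 usecs sampled out of every 1,000,000 usecs.
duty_cycle 500000 1000000
```

With the documented defaults the CPUs are held for half of every window,
which is why the tracer is unsuitable for production use. configure_hwlat is
defined but not called here, since it needs root and a kernel with the tracer.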

