RE: [PATCH 5/7 v6] trace, RAS: Add eMCA trace event interface

From: Luck, Tony
Date: Mon Jun 02 2014 - 12:22:48 EST


>> All of this stuff only applies to server systems - so quibbling over
>> a handful of *bytes* in an error record on a system that has tens,
>> hundreds or even thousands of *gigabytes* of memory seems
>> a bit pointless.
>
> But there's still only a limited number of bytes in the ring buffer no
> matter what the system, thus we still need to quibble over it.

To which I'll counter that the trace ring buffer can handle tracing of
events like page faults and context switches (can't it?) that happen
at a rate of thousands per second. Our eMCA records will normally
happen at a rate of X per month (where X may well be less than one).
If there is a storm of errors, we disable CMCI interrupts and revert
to polling. We declare a "storm" at just 15 events in a second. Once we
switch to polling, we won't poll faster than once per second.

So the worst case is a steady flow of events that doesn't quite
trigger the storm detector: about 14 events per second.
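
To make the threshold arithmetic concrete, here is a minimal sketch in C.
It is not the kernel's CMCI code: the names (storm_detector,
storm_note_event, storm_trips_at_rate) are made up for illustration, and
the fixed one-second window with evenly spaced events is a simplifying
assumption. It only shows that a steady 14 events per second stays under
a 15-per-second threshold while 15 per second trips it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Figures taken from the discussion above; everything else is assumed. */
#define STORM_THRESHOLD  15u    /* events within one window that count as a storm */
#define STORM_WINDOW_MS  1000u  /* window length: one second */

/* Hypothetical per-CPU detector state (not a kernel structure). */
struct storm_detector {
	uint64_t window_start_ms; /* start of the current counting window */
	unsigned int count;       /* events seen in the current window */
	bool polling;             /* true once a storm has been declared */
};

/*
 * Record one event at time now_ms.
 * Returns true if the detector is (now) in storm/polling mode.
 */
static bool storm_note_event(struct storm_detector *d, uint64_t now_ms)
{
	/* Start a fresh window once the current one has fully elapsed. */
	if (now_ms - d->window_start_ms >= STORM_WINDOW_MS) {
		d->window_start_ms = now_ms;
		d->count = 0;
	}

	d->count++;
	if (d->count >= STORM_THRESHOLD)
		d->polling = true;

	return d->polling;
}

/*
 * Feed `rate` evenly spaced events per second for `seconds` seconds and
 * report whether the detector ever declared a storm.
 */
static bool storm_trips_at_rate(unsigned int rate, unsigned int seconds)
{
	struct storm_detector d = { 0, 0, false };

	for (unsigned int s = 0; s < seconds; s++) {
		for (unsigned int i = 0; i < rate; i++) {
			uint64_t t = (uint64_t)s * 1000u + (uint64_t)i * 1000u / rate;

			storm_note_event(&d, t);
		}
	}
	return d.polling;
}
```

With these assumptions, storm_trips_at_rate(14, 10) stays false and
storm_trips_at_rate(15, 10) becomes true. A bursty pattern that straddles
a window boundary could behave differently, so treat 14 per second as the
steady-state bound only.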

-Tony