Re: using mce_inject I get: RIP 10:<ffffffffa012c909> {ttm_bo_unref+0xf/0x45[ttm]}
From: Justin P. Mattock
Date: Tue Aug 30 2011 - 11:38:43 EST
On 08/29/2011 06:07 PM, huang ying wrote:
On Sat, Aug 27, 2011 at 11:03 PM, Justin P. Mattock
<justinmattock@xxxxxxxxx> wrote:
On 08/23/2011 01:15 PM, Luck, Tony wrote:
It's easily fixable, but I'm not sure it's a good idea while bisect is
going through commits (I'm afraid I might go astray with the bisect if I
add any patches).
Rather than fixing a bad build - you can try moving to a nearby commit
(use "gitk" to get a view of the structure around the commit that git
bisect suggested). In the early stages of a bisection, it doesn't really
matter much if you build the mid-point that bisect provided, or some
nearby one - just be sure to mark as good/bad the commit you actually built.
-Tony
Well, after bisecting (with no results), I found that something in my
.config was causing this. After looking through it, I found that having
X86_MCE_INJECT=y causes the pauses when the timeouts occur.
Let me know if I need to supply any info.
Which test case causes the pause? Some test cases with "timeout" in
their name may cause a timeout between CPUs. Or you can try booting the
system with the kernel parameter "mce=3,0", which will disable the timeout.
Best Regards,
Huang Ying
Cool, thanks for the info.
I went and used mce=3,0 on the command line, and then ran the mce-test
suite. Unfortunately, the pause still occurs.
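Before rerunning the suite it may be worth confirming that the option really reached the kernel. The following is only a sketch: it assumes the numeric form mce=TOLERANT[,MONARCH_TIMEOUT] and ignores keyword forms of the option; the function name parse_mce_option and the sample command line are made up for illustration.

```python
def parse_mce_option(cmdline):
    """Return (tolerant, monarch_timeout) from a numeric mce= option.

    Returns None when no numeric mce= option is present. The timeout is
    None when only the tolerance level was given.
    """
    for token in cmdline.split():
        if not token.startswith("mce="):
            continue
        parts = token[len("mce="):].split(",")
        if not parts[0].isdigit():
            continue  # keyword forms of mce= are not handled in this sketch
        tolerant = int(parts[0])
        timeout = None
        if len(parts) > 1 and parts[1].isdigit():
            timeout = int(parts[1])
        return tolerant, timeout
    return None


# Hypothetical command line; on a running system the string would
# typically be read from /proc/cmdline instead.
sample = "root=/dev/sda1 ro mce=3,0 quiet"
print(parse_mce_option(sample))
```

A timeout value of 0 in the result corresponds to the "disable timeout" setting suggested above.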
As for which timeouts: basically the pause happens when any of the timeouts occur.
Here is what the verbose output looks like:
`/home/kernel/mce-inject/mce-test'
./drivers/simple/driver.sh simple.conf
soft-inj/non-panic/corrected:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_hold:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/non-panic/corrected_over:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/panic/fatal:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_eipv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_irq:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_no_en:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Machine check from unknown source
soft-inj/panic/fatal_over:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_timeout_ripv:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
soft-inj/panic/fatal_userspace:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: Fatal Machine check
Failed: uncorrected MCE exp, expected: Processor context corrupt
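A listing like the one above is easier to compare between runs once it is grouped per test case. The sketch below assumes the layout shown here (a case name ending in ":" followed by "Passed:" / "Failed:" lines); the function name summarize is hypothetical and is not part of mce-test.

```python
def summarize(report):
    """Group Passed/Failed messages under the test case they belong to."""
    results = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Passed:"):
            if current is not None:
                results[current]["passed"].append(line[len("Passed:"):].strip())
        elif line.startswith("Failed:"):
            if current is not None:
                results[current]["failed"].append(line[len("Failed:"):].strip())
        elif line.endswith(":") and "/" in line:
            # A line such as "soft-inj/panic/fatal:" starts a new test case.
            current = line[:-1]
            results[current] = {"passed": [], "failed": []}
    return results


# Two cases copied from the output above.
sample = """\
soft-inj/non-panic/corrected:
Failed: can not get gcov graph
Passed: MCE log is ok
Passed: No kernel warning or bug
soft-inj/panic/fatal_timeout:
Failed: can not get gcov graph
Failed: MCE log is different from input
Passed: No kernel warning or bug
Failed: uncorrect panic, expected: : Fatal machine check on current CPU
Failed: no timeout detected
Failed: uncorrected MCE exp, expected: Processor context corrupt
"""

for name, res in summarize(sample).items():
    print(name, len(res["passed"]), "passed,", len(res["failed"]), "failed")
```

Since "can not get gcov graph" fails for every case in this run, it can be filtered out of the "failed" lists to make the remaining differences stand out.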
in dmesg I see:
[ 102.491609] Starting machine check poll CPU 1
[ 102.492077] [Hardware Error]: Machine check events logged
[ 102.492086] Machine check poll done on CPU 1
[ 123.537575] Triggering MCE exception on CPU 0
[ 123.537584] Disabling lock debugging due to kernel taint
[ 123.537594] [Hardware Error]: Machine check events logged
[ 123.537597] MCE exception done on CPU 0
[ 129.779850] Triggering MCE exception on CPU 1
[ 129.779879] MCE exception done on CPU 1
[ 137.030085] Triggering MCE exception on CPU 0
[ 137.030108] MCE exception done on CPU 0
[ 143.286096] Triggering MCE exception on CPU 0
[ 143.286110] MCE exception done on CPU 0
[ 149.541391] Triggering MCE exception on CPU 0
[ 149.541409] MCE exception done on CPU 0
[ 156.785580] Triggering MCE exception on CPU 1
[ 156.785602] MCE exception done on CPU 1
[ 164.011576] Triggering MCE exception on CPU 0
[ 164.012558] mce_notify_irq: 4 callbacks suppressed
[ 164.012558] [Hardware Error]: Machine check events logged
[ 166.795340] MCE exception done on CPU 0
[ 173.088624] Triggering MCE exception on CPU 0
[ 173.089600] [Hardware Error]: Machine check events logged
[ 177.119421] MCE exception done on CPU 0
[ 184.373355] Triggering MCE exception on CPU 1
[ 184.373372] MCE exception done on CPU 1
[ 190.741030] Triggering MCE exception on CPU 1
[ 190.741047] MCE exception done on CPU 1
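To see which injections correspond to the pause, the "Triggering" and "done" timestamps can be paired per CPU. This is a rough sketch that assumes the dmesg message format shown above; the function name injection_durations and the one-second threshold are arbitrary choices for illustration.

```python
import re

TRIGGER = re.compile(r"\[\s*(\d+\.\d+)\] Triggering MCE exception on CPU (\d+)")
DONE = re.compile(r"\[\s*(\d+\.\d+)\] MCE exception done on CPU (\d+)")


def injection_durations(log):
    """Return (start_time, cpu, seconds) for each trigger/done pair."""
    pending = {}  # cpu number -> timestamp of the last unmatched trigger
    durations = []
    for line in log.splitlines():
        match = TRIGGER.search(line)
        if match:
            pending[int(match.group(2))] = float(match.group(1))
            continue
        match = DONE.search(line)
        if match:
            cpu = int(match.group(2))
            start = pending.pop(cpu, None)
            if start is not None:
                durations.append((start, cpu, float(match.group(1)) - start))
    return durations


# Lines copied from the dmesg output above.
sample = """\
[ 164.011576] Triggering MCE exception on CPU 0
[ 164.012558] mce_notify_irq: 4 callbacks suppressed
[ 164.012558] [Hardware Error]: Machine check events logged
[ 166.795340] MCE exception done on CPU 0
[ 173.088624] Triggering MCE exception on CPU 0
[ 173.089600] [Hardware Error]: Machine check events logged
[ 177.119421] MCE exception done on CPU 0
[ 184.373355] Triggering MCE exception on CPU 1
[ 184.373372] MCE exception done on CPU 1
"""

for start, cpu, seconds in injection_durations(sample):
    flag = "SLOW" if seconds > 1.0 else "ok"
    print(f"{start:.6f} CPU {cpu}: {seconds:.6f} s {flag}")
```

In the log above, only the injections starting at 164.011576 and 173.088624 take seconds to complete (roughly 2.8 s and 4.0 s); all the others finish within microseconds.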
let me know if you need more info.
Justin P. Mattock