Re: [PATCH 5/5] mce: recover from "action required" errors reported in data path in usermode

From: Minskey Guo
Date: Thu Sep 08 2011 - 05:27:16 EST


On 09/08/2011 01:16 PM, Luck, Tony wrote:
__memory_failure() handling calls some routines, such
as is_free_buddy_page(), which need to acquire the spin
lock, zone->lock. How can we guarantee that other CPUs
haven't acquired the lock when receiving the #mc broadcast
and entering their #mc handlers?
By the time I call __memory_failure() - the other cpus have
been released from mce handler - so they are back executing
normal code.
Oh, yes, I just realized that mce_end() released other
cpus. So, printk/lock is not an issue here.


But Chen Gong's earlier comments made me look again at entry_64.S
code - and I realized that I missed seeing code in the return
path from do_machine_check() that switched from MCE stack to
regular kernel stack before processing TIF_MCE_NOTIFY.

I may go back and re-visit a path that I looked at to change
do_machine_check from "void" return to "unsigned long" and have
it return the address for the "AR" case and "0" otherwise.
Then we could switch out of machine check stack to non-mce
context to call __memory_failure(). When I looked at this
before, the entry_64.S path looked plausible. The 32-bit
path looked to be painful (too many macros in entry_32.S).
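
The change described above could be sketched roughly as follows. This
is a user-space model, not kernel code: every sketch_* name is
hypothetical, the on_mce_stack flag merely stands in for the stack
switch done in the entry code, and sketch_memory_failure() stands in
for __memory_failure().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical severity levels, not the kernel's real severity constants. */
enum sketch_severity { SKETCH_SEV_NONE, SKETCH_SEV_AR };

/* Hypothetical, stripped-down record of one machine check event. */
struct sketch_mce {
    enum sketch_severity severity;
    unsigned long addr;            /* address reported with the error */
};

static bool on_mce_stack;            /* models "running on the MCE stack" */
static unsigned long recovered_addr; /* last address passed to recovery */

/* Stand-in for __memory_failure(): only records the address it was given. */
static void sketch_memory_failure(unsigned long addr)
{
    /* The proposal is to call this only after leaving the MCE stack. */
    assert(!on_mce_stack);
    recovered_addr = addr;
}

/*
 * Stand-in for do_machine_check() with the proposed return type:
 * the address for the "AR" case and 0 otherwise.
 */
static unsigned long sketch_do_machine_check(const struct sketch_mce *m)
{
    return m->severity == SKETCH_SEV_AR ? m->addr : 0;
}

/* Stand-in for the entry path that wraps the handler. */
static unsigned long sketch_mce_entry(const struct sketch_mce *m)
{
    unsigned long addr;

    on_mce_stack = true;
    addr = sketch_do_machine_check(m);
    on_mce_stack = false;          /* models the switch to the regular stack */

    if (addr)
        sketch_memory_failure(addr);
    return addr;
}
```

In this model, an "AR" event passed to sketch_mce_entry() reaches
sketch_memory_failure() only after the flag is cleared, and any other
event returns 0 without invoking recovery. Note that a return value of
0 cannot distinguish "no AR error" from an error at address 0.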

Why do you plan to switch out of the machine check stack
when calling __memory_failure() from do_machine_check()?
What are the benefits?

thanks
-minskey


-Tony
