Re: [RFC][PATCH] HWPOISON: only early kill processes who installed SIGBUS handler

From: Nick Piggin
Date: Wed Jun 17 2009 - 04:04:40 EST


On Wed, Jun 17, 2009 at 02:37:02PM +0800, Wu Fengguang wrote:
> On Mon, Jun 15, 2009 at 10:22:25PM +0800, Wu Fengguang wrote:
> > On Mon, Jun 15, 2009 at 08:25:28PM +0800, Nick Piggin wrote:
> > > On Mon, Jun 15, 2009 at 08:10:01PM +0800, Wu Fengguang wrote:
> > > > On Mon, Jun 15, 2009 at 03:19:07PM +0800, Nick Piggin wrote:
> > > > > > For KVM you need early kill, for the others it remains to be seen.
> > > > >
> > > > > Right. It's almost like you need to do a per-process thing, and
> > > > > those that can handle things (such as the new SIGBUS or the new
> > > > > EIO) could get those, and others could be killed.
> > > >
> > > > To send early SIGBUS kills to processes that have called
> > > > sigaction(SIGBUS, ...)? KVM will surely do that. For other apps we
> > > > don't mind whether they can understand that signal at all.
> > >
> > > For apps that hook into SIGBUS for some other means and
> >
> > Yes I was referring to the sigaction(SIGBUS) apps, others will
> > be late killed anyway.
> >
> > > do not understand the new type of SIGBUS signal? What about
> > > those?
> >
> > We introduced two new SIGBUS codes:
> > BUS_MCEERR_AO=5 for early kill
> > BUS_MCEERR_AR=4 for late kill
> > I'd assume a legacy application will handle them in the same way (both
> > are unexpected codes to the application).
> >
> > We don't care whether the application can be killed by BUS_MCEERR_AO
> > or BUS_MCEERR_AR depending on its SIGBUS handler implementation.
> > But (in the rare case) if the handler
> > - refuses to die on BUS_MCEERR_AR, it may create a busy loop and a
> >   flood of SIGBUS signals, which is a bug in the application.
> >   BUS_MCEERR_AO is one-time and won't lead to busy loops.
> > - does something that hurts itself (i.e. data safety) on BUS_MCEERR_AO,
> >   it may well hurt itself the same way on BUS_MCEERR_AR. The latter is
> >   unavoidable, so the application must be fixed anyway.
>
> This patch materializes the automatic early kill idea.
> It aims to remove the vm.memory_failure_early_kill sysctl parameter.
>
> This is mainly a policy change, please comment.

Well then you can still early-kill random apps that did not
want it, and you may still cause problems if their SIGBUS
handler does something nontrivial.

Can you use a prctl or something so it can explicitly
register interest in this?
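
As a rough sketch of what such an explicit opt-in could look like from
userspace: the process asks for early kill via prctl and installs a
SIGBUS handler that tells the two new codes apart. Assumptions, none of
which are in this patch: the PR_MCE_KILL / PR_MCE_KILL_SET /
PR_MCE_KILL_EARLY names and values are the ones later mainline kernels
expose, and classify_sigbus(), sigbus_handler() and
register_early_kill() are hypothetical helper names. The BUS_MCEERR_*
values are the ones quoted above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values as quoted in this thread. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4   /* action required: late kill */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5   /* action optional: early kill */
#endif

/* Assumption: prctl interface as found in later kernels, not this patch. */
#ifndef PR_MCE_KILL
#define PR_MCE_KILL       33
#define PR_MCE_KILL_SET   1
#define PR_MCE_KILL_EARLY 1
#endif

enum poison_kind { POISON_NONE, POISON_EARLY, POISON_LATE };

/* Map a SIGBUS si_code onto the kind of hwpoison event, if any. */
static enum poison_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO: return POISON_EARLY;
    case BUS_MCEERR_AR: return POISON_LATE;
    default:            return POISON_NONE;
    }
}

static volatile sig_atomic_t poisoned_pages;

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;

    if (classify_sigbus(info->si_code) == POISON_EARLY) {
        /* Advisory and one-time: record it and carry on. */
        poisoned_pages++;
        return;
    }
    /*
     * BUS_MCEERR_AR or an unrelated SIGBUS: returning would just
     * re-fault on the same access, so give up. _exit() is
     * async-signal-safe.
     */
    _exit(1);
}

/* Opt in to early kill, then install the handler. Returns 0 on success. */
static int register_early_kill(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

An app that cares would call register_early_kill() once at startup;
everything that never makes the call keeps the default behaviour and
is not early-killed just because it happens to have a SIGBUS handler.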
