[PATCH] mm: hwpoison: use do_send_sig_info() instead of force_sig() (Re: PMEM error-handling forces SIGKILL causes kernel panic)

From: Naoya Horiguchi
Date: Wed Jan 16 2019 - 04:43:49 EST


[ CCed Andrew and linux-mm ]

On Fri, Jan 11, 2019 at 08:14:02AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> Hi Dan, Jane,
>
> Thanks for the report.
>
> On Wed, Jan 09, 2019 at 03:49:32PM -0800, Dan Williams wrote:
> > [ switch to text mail, add lkml and Naoya ]
> >
> > On Wed, Jan 9, 2019 at 12:19 PM Jane Chu <jane.chu@xxxxxxxxxx> wrote:
> ...
> > > 3. The hardware consists the latest revision CPU and Intel NVDIMM, we suspected
> > > the CPU faulty because it generated MCE over PMEM UE in a unlikely high
> > > rate for any reasonable NVDIMM (like a few per 24hours).
> > >
> > > After swapping the CPU, the problem stopped reproducing.
> > >
> > > But one could argue that perhaps the faulty CPU exposed a small race window
> > > from collect_procs() to unmap_mapping_range() and to kill_procs(), hence
> > > caught the kernel PMEM error handler off guard.
> >
> > There's definitely a race, and the implementation is buggy as can be
> > seen in __exit_signal:
> >
> > sighand = rcu_dereference_check(tsk->sighand,
> > lockdep_tasklist_lock_is_held());
> > spin_lock(&sighand->siglock);
> >
> > ...the memory-failure path needs to hold the proper locks before it
> > can assume that de-referencing tsk->sighand is valid.
> >
> > > Also note, the same workload on the same faulty CPU were run on Linux prior to
> > > the 4.19 PMEM error handling and did not encounter kernel crash, probably because
> > > the prior HWPOISON handler did not force SIGKILL?
> >
> > Before 4.19 this test should result in a machine-check reboot, not
> > much better than a kernel crash.
> >
> > > Should we not to force the SIGKILL, or find a way to close the race window?
> >
> > The race should be closed by holding the proper tasklist and rcu read lock(s).
>
> This reasoning and proposal sound right to me. I'm trying to reproduce
> this race (for non-pmem case,) but no luck for now. I'll investigate more.

I wrote/tested a patch for this issue.
I think that switching signal API effectively does proper locking.

Thanks,
Naoya Horiguchi
---