Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Jun 15 2022 - 03:39:26 EST


On Wed, Jun 15, 2022 at 07:52:06AM +0200, Oscar Salvador wrote:
> On Wed, Jun 15, 2022 at 01:18:23PM +0800, zhenwei pi wrote:
> > Hi,
> >
> > Because memory_failure() can be triggered by a hardware error at any time,
> > hw_memory_failure should be protected by mf_mutex to avoid this case:
> > int unpoison_memory(unsigned long pfn)
> > {
> > 	...
> > 	if (hw_memory_failure) {
> > 	}
> > 	... --> memory_failure() happens and sets hw_memory_failure to true
> > 	mutex_lock(&mf_mutex);

I think this race can cause the reported problem (hw_memory_failure is
unreliable when read outside mf_mutex), so we need to put the check inside
mf_mutex for a proper fix.
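
To illustrate the ordering I have in mind, here is a rough userspace sketch,
not the actual patch. A pthread mutex stands in for the kernel's mf_mutex, the
*_sketch() function names are hypothetical, and the -EOPNOTSUPP return value is
an assumption made for the example. The only point is that the flag is read
after the lock is taken, so a concurrent memory_failure() cannot slip in
between the check and the lock.

```c
/* Userspace sketch only: a pthread mutex stands in for the kernel mf_mutex. */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mf_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool hw_memory_failure;	/* read and written only under mf_mutex */

/* Hypothetical stand-in for memory_failure(): record a real hardware error. */
static void memory_failure_sketch(void)
{
	pthread_mutex_lock(&mf_mutex);
	hw_memory_failure = true;
	pthread_mutex_unlock(&mf_mutex);
}

/* Hypothetical stand-in for unpoison_memory(): test the flag under the lock. */
static int unpoison_memory_sketch(void)
{
	int ret = 0;

	pthread_mutex_lock(&mf_mutex);
	if (hw_memory_failure) {
		/* Assumed error code; unpoison is refused after a hardware error. */
		ret = -EOPNOTSUPP;
		goto unlock;
	}
	/* ... the actual unpoison work would go here, still under the lock ... */
unlock:
	pthread_mutex_unlock(&mf_mutex);
	return ret;
}
```

With this ordering, any unpoison attempt that takes the lock after a hardware
error has been recorded is guaranteed to see the flag and bail out.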

Thanks,
Naoya Horiguchi

>
> Yeah, I am aware of that.
> But once memory_failure() sets hw_memory_failure to true, it does not really matter
> whether unpoison_memory() checks it with or without the lock held, does it?
>
> Note that it does not really matter in the end, but I am just curious whether
> there is any strong impediment to that.
>
>
> --
> Oscar Salvador
> SUSE Labs