Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

From: Tim Chen
Date: Tue Feb 12 2019 - 12:58:15 EST


On 2/11/19 10:47 PM, Huang, Ying wrote:
> Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx> writes:
>
>>>> + if (!si)
>>>> + goto bad_nofile;
>>>> +
>>>> + preempt_disable();
>>>> + if (!(si->flags & SWP_VALID))
>>>> + goto unlock_out;
>>>
>>> After Hugh alluded to barriers, it seems the read of SWP_VALID could be
>>> reordered with the write in preempt_disable at runtime. Without smp_mb()
>>> between the two, couldn't this happen, however unlikely a race it is?
>>>
>>> CPU0 CPU1
>>>
>>> __swap_duplicate()
>>> get_swap_device()
>>> // sees SWP_VALID set
>>> swapoff
>>> p->flags &= ~SWP_VALID;
>>> spin_unlock(&p->lock); // pair w/ smp_mb
>>> ...
>>> stop_machine(...)
>>> p->swap_map = NULL;
>>> preempt_disable()
>>> read NULL p->swap_map
>>
>>
>> I don't think that that smp_mb() is necessary. I elaborate:
>>
>> An important piece of information, I think, that is missing in the
>> diagram above is the stopper thread which executes the work queued
>> by stop_machine(). We have two cases to consider, that is,
>>
>> 1) the stopper is "executed before" the preempt-disable section
>>
>> CPU0
>>
>> cpu_stopper_thread()
>> ...
>> preempt_disable()
>> ...
>> preempt_enable()
>>
>> 2) the stopper is "executed after" the preempt-disable section
>>
>> CPU0
>>
>> preempt_disable()
>> ...
>> preempt_enable()
>> ...
>> cpu_stopper_thread()
>>
>> Notice that the reads from p->flags and p->swap_map in CPU0 cannot
>> cross cpu_stopper_thread(). The claim is that CPU0 sees SWP_VALID
>> unset in (1) and that it sees a non-NULL p->swap_map in (2).
>>
>> I consider the two cases separately:
>>
>> 1) CPU1 unsets SPW_VALID, it locks the stopper's lock, and it
>> queues the stopper work; CPU0 locks the stopper's lock, it
>> dequeues this work, and it reads from p->flags.
>>
>> Diagrammatically, we have the following MP-like pattern:
>>
>> CPU0 CPU1
>>
>> lock(stopper->lock) p->flags &= ~SPW_VALID
>> get @work lock(stopper->lock)
>> unlock(stopper->lock) add @work
>> reads p->flags unlock(stopper->lock)
>>
>> where CPU0 must see SPW_VALID unset (if CPU0 sees the work
>> added by CPU1).
>>
>> 2) CPU0 reads from p->swap_map, it locks the completion lock,
>> and it signals completion; CPU1 locks the completion lock,
>> it checks for completion, and it writes to p->swap_map.
>>
>> (If CPU0 doesn't signal the completion, or CPU1 doesn't see
>> the completion, then CPU1 will have to iterate the read and
>> to postpone the control-dependent write to p->swap_map.)
>>
>> Diagrammatically, we have the following LB-like pattern:
>>
>> CPU0 CPU1
>>
>> reads p->swap_map lock(completion)
>> lock(completion) read completion->done
>> completion->done++ unlock(completion)
>> unlock(completion) p->swap_map = NULL
>>
>> where CPU0 must see a non-NULL p->swap_map if CPU1 sees the
>> completion from CPU0.
>>
>> Does this make sense?
>
> Thanks a lot for detailed explanation!

This is certainly a non-trivial explanation of why memory barrier is not
needed. Can we put it in the commit log and mention something in
comments on why we don't need memory barrier?

Thanks.

Tim