Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition

From: Minchan Kim
Date: Mon Jan 18 2016 - 09:07:44 EST


On Mon, Jan 18, 2016 at 01:18:31PM +0100, Vlastimil Babka wrote:
> On 01/18/2016 09:20 AM, Minchan Kim wrote:
> >On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
> >>On 18.1.2016 8:39, Sergey Senozhatsky wrote:
> >>>On (01/18/16 16:11), Minchan Kim wrote:
> >>>[..]
> >>>>>so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
> >>>>>barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
> >>>>>so I don't think WRITE_ONCE() will help the compiler, or am I missing
> >>>>>something?
> >>>>
> >>>>We need two things
> >>>>1. compiler barrier.
> >>>>2. memory barrier.
> >>>>
> >>>>As compiler barrier, WRITE_ONCE works to prevent store tearing here
> >>>>by compiler.
> >>>>However, if we omit unpin_tag here, we lose the memory barrier (e.g., smp_mb)
> >>>>so another CPU could see stale data caused by CPU memory reordering.
> >>>
> >>>oh... good find! lost release semantic of unpin_tag()...
> >>
> >>Ah, release semantic, good point indeed. OK then we need the v2 approach again,
> >>with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
> >>release semantic, which would be a bit more effective, but I guess migration is
> >>not that critical path to be worth introducing it.
> >
> >WRITE_ONCE in record_obj would add more memory operations in obj_malloc
>
> A simple WRITE_ONCE would just add a compiler barrier. What you
> suggest below does indeed add more operations, which are actually
> needed just in the migration. What I suggested is the v2 approach of
> adding the PIN bit before calling record_obj, *and* simply doing a
> WRITE_ONCE in record_obj() to make sure the PIN bit is indeed
> applied *before* writing to the handle, and not as two separate
> writes.
>
> >but I don't feel it's too heavy in this phase so,
>
> I'm afraid it's dangerous for the usage of record_obj() in
> zs_malloc() where the handle is freshly allocated by alloc_handle().
> Are we sure the bit is not set?
>
> The code in alloc_handle() is:
> 	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
> 		pool->flags & ~__GFP_HIGHMEM);
>
> There's no explicit __GFP_ZERO, so the handles are not guaranteed to
> be allocated empty? And expecting all zpool users to include
> __GFP_ZERO in flags would be too subtle and error prone.

True.
Let's go with this. I hope it's the last.
Thanks, guys.
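
For anyone following the thread, here is a rough userspace sketch of the two
ideas discussed above, written with C11 atomics rather than the kernel
primitives. It is not the zsmalloc code: handle_slot_t, record_obj_pinned(),
record_obj_release(), try_pin(), unpin() and handle_to_obj() are hypothetical
names, and using bit 0 of the handle word as the pin tag is an assumption made
for illustration. The relaxed store only approximates what WRITE_ONCE() gives
(one untorn write of value plus PIN bit), and the release operations stand in
for the ordering that unpin_tag() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: bit 0 of the handle word is the pin tag. */
#define HANDLE_PIN_BIT  0
#define HANDLE_PIN_MASK ((uintptr_t)1 << HANDLE_PIN_BIT)

/* Hypothetical userspace stand-in for the word a handle points to. */
typedef _Atomic uintptr_t handle_slot_t;

/*
 * v2-style idea: write the new object value together with the PIN bit
 * as a single store, so no observer can see the value without the bit.
 * The relaxed atomic store roughly models WRITE_ONCE(): no tearing,
 * but no ordering against other memory accesses either.
 */
static void record_obj_pinned(handle_slot_t *slot, uintptr_t obj)
{
    assert((obj & HANDLE_PIN_MASK) == 0); /* value must leave the tag bit free */
    atomic_store_explicit(slot, obj | HANDLE_PIN_MASK, memory_order_relaxed);
}

/*
 * "record_obj_release()" idea: publish the value and drop the pin in one
 * store with release ordering, so earlier writes (e.g. the copied object
 * data) are visible before the new value can be observed.
 */
static void record_obj_release(handle_slot_t *slot, uintptr_t obj)
{
    assert((obj & HANDLE_PIN_MASK) == 0);
    atomic_store_explicit(slot, obj, memory_order_release);
}

/* Take the pin; returns true only if it was not already held. */
static bool try_pin(handle_slot_t *slot)
{
    uintptr_t old = atomic_fetch_or_explicit(slot, HANDLE_PIN_MASK,
                                             memory_order_acquire);
    return (old & HANDLE_PIN_MASK) == 0;
}

/* Drop the pin with release ordering, pairing with try_pin()'s acquire. */
static void unpin(handle_slot_t *slot)
{
    atomic_fetch_and_explicit(slot, ~HANDLE_PIN_MASK, memory_order_release);
}

/* Read the stored object value with the tag bit masked off. */
static uintptr_t handle_to_obj(handle_slot_t *slot)
{
    return atomic_load_explicit(slot, memory_order_relaxed) & ~HANDLE_PIN_MASK;
}
```

Note that the sketch only behaves if the slot starts out with the pin bit
clear (e.g. atomic_init(&slot, 0)), which is the same concern raised above
about handles coming from alloc_handle() without __GFP_ZERO.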