Re: [PATCH 4/4] mm/rmap.c: move anon_vma initialization code intoanon_vma_ctor

From: Yuanhan Liu
Date: Sun Nov 03 2013 - 22:37:45 EST


On Fri, Nov 01, 2013 at 11:04:40AM -0700, Linus Torvalds wrote:
> On Fri, Nov 1, 2013 at 12:54 AM, Yuanhan Liu
> <yuanhan.liu@xxxxxxxxxxxxxxx> wrote:
> > @@ -67,19 +67,7 @@ static struct kmem_cache *anon_vma_chain_cachep;
> >
> > static inline struct anon_vma *anon_vma_alloc(void)
> > {
> > - struct anon_vma *anon_vma;
> > -
> > - anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
> > - if (anon_vma) {
> > - atomic_set(&anon_vma->refcount, 1);
> > - /*
> > - * Initialise the anon_vma root to point to itself. If called
> > - * from fork, the root will be reset to the parents anon_vma.
> > - */
> > - anon_vma->root = anon_vma;
> > - }
> > -
> > - return anon_vma;
> > + return kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
> > }
> >
> > static inline void anon_vma_free(struct anon_vma *anon_vma)
> > @@ -293,8 +281,15 @@ static void anon_vma_ctor(void *data)
> > struct anon_vma *anon_vma = data;
> >
> > rwlock_init(&anon_vma->rwlock);
> > - atomic_set(&anon_vma->refcount, 0);
> > anon_vma->rb_root = RB_ROOT;
> > +
> > + atomic_set(&anon_vma->refcount, 1);
> > + /*
> > + * Initialise the anon_vma root to point to itself. If called
> > + * from fork, the root will be reset to the parents anon_vma.
> > + */
> > + anon_vma->root = anon_vma;
> > +
> > }
>
> This looks totally invalid.
>
> The slab constructor is *not* called on every allocation.

Sorry, I didn't know that :(

And thanks for the detailed info very much!

--yliu

> Quite the
> reverse. Constructors are called when the underlying allocation is
> initially done, and then *not* done again, even if that particular
> object may be allocated and free'd many times.
>
> So the reason we can do
>
> atomic_set(&anon_vma->refcount, 0);
>
> in a constructor is that anybody who frees that allocation will do so
> only when the refcount goes back down to zero, so zero is "valid
> state" while the slab entry stays on some percpu freelist.
>
> But the same is ABSOLUTELY NOT TRUE of the value "1", nor is it true
> of the anon_vma->root. When the anonvma gets free'd, those values will
> *not* be the same (the refcount has been decremented to zero, and the
> root will have been set to whatever the root was.
>
> So the rule about constructors is that the values they construct
> absolutely *have* to be the ones they get free'd with. With one
> special case.
>
> Using slab constructors is almost always a mistake. The original
> Sun/Solaris argument for them was to avoid initialization costs in
> allocators, and that was pure and utter bullshit (initializing a whole
> cacheline is generally cheaper than not initializing it and having to
> fetch it from L3 caches, but it does hide the cost so that it is now
> spread out in the users rather than in the allocator).
>
> So the _original_ reason for slab is pure and utter BS, and we've
> removed pretty much all uses of the constructors.
>
> In fact, the only valid reason for using them any more is the special
> case: locks and RCU.
>
> The reason we still have constructors is that sometimes we want to
> keep certain data structures "alive" across allocations together with
> SLAB_DESTROY_BY_RCU (which delays the actual *page* destroying by RCU,
> but the allocation can go to the free-list and get re-allocated
> without a RCU grace-period).
>
> But because allocations can now "stay active" over a
> alloc/free/alloc-again sequence, that means that the allocation
> sequence MUST NOT re-initialize the lock, because some RCU user may
> still be looking at those fields (and in particular, unlocking an
> allocation that in the meantime got free'd and re-allocated).
>
> So these days, the *only* valid pattern for slab constructors is
> together with SLAB_DESTROY_BY_RCU, and making sure that the fields
> that RCU readers look at (and in particular, change) are "stable" over
> such re-allocations.
>
> Your patch is horribly wrong.
>
> Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/