Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle management

From: Waiman Long
Date: Tue May 21 2019 - 15:38:33 EST


On 5/21/19 3:23 PM, Roman Gushchin wrote:
> On Tue, May 21, 2019 at 02:39:50PM -0400, Waiman Long wrote:
>> On 5/14/19 8:06 PM, Shakeel Butt wrote:
>>>> @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
>>>> struct mem_cgroup *memcg;
>>>> struct kmem_cache *memcg_cachep;
>>>> int kmemcg_id;
>>>> + struct memcg_cache_array *arr;
>>>>
>>>> VM_BUG_ON(!is_root_cache(cachep));
>>>>
>>>> if (memcg_kmem_bypass())
>>>> return cachep;
>>>>
>>>> - memcg = get_mem_cgroup_from_current();
>>>> + rcu_read_lock();
>>>> +
>>>> + if (unlikely(current->active_memcg))
>>>> + memcg = current->active_memcg;
>>>> + else
>>>> + memcg = mem_cgroup_from_task(current);
>>>> +
>>>> + if (!memcg || memcg == root_mem_cgroup)
>>>> + goto out_unlock;
>>>> +
>>>> kmemcg_id = READ_ONCE(memcg->kmemcg_id);
>>>> if (kmemcg_id < 0)
>>>> - goto out;
>>>> + goto out_unlock;
>>>>
>>>> - memcg_cachep = cache_from_memcg_idx(cachep, kmemcg_id);
>>>> - if (likely(memcg_cachep))
>>>> - return memcg_cachep;
>>>> + arr = rcu_dereference(cachep->memcg_params.memcg_caches);
>>>> +
>>>> + /*
>>>> + * Make sure we will access the up-to-date value. The code updating
>>>> + * memcg_caches issues a write barrier to match this (see
>>>> + * memcg_create_kmem_cache()).
>>>> + */
>>>> + memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);
>>>>
>>>> /*
>>>> * If we are in a safe context (can wait, and not in interrupt
>>>> @@ -2677,10 +2693,20 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
>>>> * memcg_create_kmem_cache, this means no further allocation
>>>> * could happen with the slab_mutex held. So it's better to
>>>> * defer everything.
>>>> + *
>>>> + * If the memcg is dying or memcg_cache is about to be released,
>>>> + * don't bother creating new kmem_caches. Because memcg_cachep
>>>> + * is ZEROed as the first step of kmem offlining, we don't need
>>>> + * percpu_ref_tryget() here. css_tryget_online() check in
>>> *percpu_ref_tryget_live()
>>>
>>>> + * memcg_schedule_kmem_cache_create() will prevent us from
>>>> + * creation of a new kmem_cache.
>>>> */
>>>> - memcg_schedule_kmem_cache_create(memcg, cachep);
>>>> -out:
>>>> - css_put(&memcg->css);
>>>> + if (unlikely(!memcg_cachep))
>>>> + memcg_schedule_kmem_cache_create(memcg, cachep);
>>>> + else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
>>>> + cachep = memcg_cachep;
>>>> +out_unlock:
>>>> + rcu_read_lock();
>> There is one more bug that causes the kernel to panic on bootup when
>> debugging options are turned on.
>>
>> [   49.871437] =============================
>> [   49.875452] WARNING: suspicious RCU usage
>> [   49.879476] 5.2.0-rc1.bz1699202_memcg_test+ #2 Not tainted
>> [   49.884967] -----------------------------
>> [   49.888991] include/linux/rcupdate.h:268 Illegal context switch in
>> RCU read-side critical section!
>> [   49.897950]
>> [   49.897950] other info that might help us debug this:
>> [   49.897950]
>> [   49.905958]
>> [   49.905958] rcu_scheduler_active = 2, debug_locks = 1
>> [   49.912492] 3 locks held by systemd/1:
>> [   49.916252]  #0: 00000000633673c5 (&type->i_mutex_dir_key#5){.+.+},
>> at: lookup_slow+0x42/0x70
>> [   49.924788]  #1: 0000000029fa8c75 (rcu_read_lock){....}, at:
>> memcg_kmem_get_cache+0x12b/0x910
>> [   49.933316]  #2: 0000000029fa8c75 (rcu_read_lock){....}, at:
>> memcg_kmem_get_cache+0x3da/0x910
>>
>> It should be "rcu_read_unlock();" at the end.
> Oops. Good catch, thanks Waiman!
>
> I'm somewhat surprised it didn't come up in my tests, and that none of
> the test bots caught it either. Anyway, I'll fix it and send v5.

In a non-preemptible kernel, rcu_read_lock() is almost a no-op, so you
probably won't see any ill effects from this bug.
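
To illustrate, here is a toy userspace model of the nesting behaviour.
It is only a sketch, not kernel code: every toy_* name is hypothetical,
and the toy_track_nesting flag is my stand-in (an assumption) for a
kernel configuration in which the read-side nesting depth is actually
tracked, such as a preemptible or lockdep-enabled build.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: all toy_* names are hypothetical, not kernel APIs. */
static int toy_track_nesting = 1; /* 0 models a build where lock/unlock do nothing */
static int toy_rcu_nesting;       /* models the read-side nesting depth */

static void toy_rcu_read_lock(void)
{
	if (toy_track_nesting)
		toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	if (toy_track_nesting) {
		assert(toy_rcu_nesting > 0); /* an unlock must pair with a lock */
		toy_rcu_nesting--;
	}
}

/* Models a debug check at a point where the task may sleep; returns 1 on warning. */
static int toy_might_sleep(void)
{
	if (toy_rcu_nesting > 0) {
		printf("WARNING: context switch at read-side depth %d\n",
		       toy_rcu_nesting);
		return 1;
	}
	return 0;
}

/* Exit path as posted in v4: a second lock instead of an unlock. */
static void toy_get_cache_buggy(void)
{
	toy_rcu_read_lock();
	/* ... lookup would happen here ... */
	toy_rcu_read_lock(); /* bug: should be toy_rcu_read_unlock() */
}

/* Exit path with the one-line fix applied. */
static void toy_get_cache_fixed(void)
{
	toy_rcu_read_lock();
	/* ... lookup would happen here ... */
	toy_rcu_read_unlock();
}
```

With tracking on, one call to toy_get_cache_buggy() leaves the depth at
2, which lines up with the two rcu_read_lock entries in the lockdep
report, and the next toy_might_sleep() warns. With tracking off, the
depth never moves and the same bug goes unnoticed.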

>
> Does the rest of the patchset look sane to you?

I haven't done a full review of the patchset, but it looks sane to me
from a cursory look. We hit a similar problem at Red Hat, which is why
I am looking at your patch. Looking forward to your v5.

Cheers,
Longman