Re: [PATCH v3] mm: fix race between kmem_cache destroy, create and deactivate

From: Shakeel Butt
Date: Thu May 31 2018 - 20:48:44 EST


On Thu, May 31, 2018 at 5:18 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 29 May 2018 17:12:04 -0700 Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
>> The memcg kmem cache creation and deactivation (SLUB only) is
>> asynchronous. If a root kmem cache is destroyed while one of its memcg
>> caches is still being created or deactivated, the kernel may crash.
>>
>> Example of one such crash:
>> general protection fault: 0000 [#1] SMP PTI
>> CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
>> ...
>> Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
>> RIP: 0010:has_cpu_slab
>> ...
>> Call Trace:
>> ? on_each_cpu_cond
>> __kmem_cache_shrink
>> kmemcg_cache_deact_after_rcu
>> kmemcg_deactivate_workfn
>> process_one_work
>> worker_thread
>> kthread
>> ret_from_fork+0x35/0x40
>>
>> To fix this race, on root kmem cache destruction, mark the cache as
>> dying and flush the workqueue used for memcg kmem cache creation and
>> deactivation.
>>
>> Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
>> ---
>> Changelog since v2:
>> - Instead of refcount, flush the workqueue
>
> This one-liner doesn't appear to fully describe the difference between
> v2 and v3, which is rather large:
>

Sorry about that, I should have explained more. The diff between v2
and v3 is large because v3 is a complete rewrite, so the diff amounts
to a revert of v2 followed by the v3 patch. If you drop all the
previous versions and keep only v3, the change will be smaller.
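For anyone following the fix itself, the approach in the description
(mark the root cache as dying, then flush the pending async work before
tearing it down) can be illustrated with a minimal user-space sketch.
This is a pthreads analogy only, not the patch: `struct root_cache`,
`queue_job()`, `cache_create()`, `cache_destroy()` and `MAX_JOBS` are
hypothetical names, and joining every queued thread stands in for
flushing the memcg kmem cache workqueue.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_JOBS 16 /* hypothetical cap on queued async jobs */

/* Hypothetical stand-in for a root cache, NOT the kernel's struct kmem_cache. */
struct root_cache {
	pthread_mutex_t lock;
	bool dying;               /* set once destruction has started */
	pthread_t jobs[MAX_JOBS]; /* async work queued against this cache */
	int njobs;
	int work_done;            /* number of jobs that ran to completion */
};

/* Async job, standing in for memcg cache creation/deactivation work. */
static void *job_fn(void *arg)
{
	struct root_cache *c = arg;

	pthread_mutex_lock(&c->lock);
	c->work_done++; /* safe: c is not freed until this thread is joined */
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Queue async work; refused once the cache has been marked dying. */
static bool queue_job(struct root_cache *c)
{
	bool ok = false;

	pthread_mutex_lock(&c->lock);
	if (!c->dying && c->njobs < MAX_JOBS &&
	    pthread_create(&c->jobs[c->njobs], NULL, job_fn, c) == 0) {
		c->njobs++;
		ok = true;
	}
	pthread_mutex_unlock(&c->lock);
	return ok;
}

static struct root_cache *cache_create(void)
{
	struct root_cache *c = calloc(1, sizeof(*c));

	if (c)
		pthread_mutex_init(&c->lock, NULL);
	return c;
}

/*
 * 1) mark the cache dying so no new work can be queued,
 * 2) "flush": wait for every job already queued,
 * 3) only then free the cache. Returns how many jobs ran.
 */
static int cache_destroy(struct root_cache *c)
{
	int n, done;

	pthread_mutex_lock(&c->lock);
	c->dying = true;
	n = c->njobs; /* cannot grow any more once dying is set */
	pthread_mutex_unlock(&c->lock);

	for (int i = 0; i < n; i++)
		pthread_join(c->jobs[i], NULL);

	done = c->work_done; /* all workers joined, no concurrent access */
	pthread_mutex_destroy(&c->lock);
	free(c);
	return done;
}
```

With three jobs queued, `cache_destroy()` returns 3 only after all of
them have finished, and a `queue_job()` issued after `dying` is set
returns `false`, so no worker can touch the cache after it is freed.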

thanks,
Shakeel