Re: [PATCH 1/1] kasan: fix livelock in qlist_move_cache

From: Zhouyi Zhou
Date: Tue Nov 28 2017 - 18:41:17 EST


Hi,
I will try to re-establish the environment and design a
proof-of-concept experiment.
Cheers

On Wed, Nov 29, 2017 at 1:57 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Tue, Nov 28, 2017 at 6:56 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Tue, Nov 28, 2017 at 12:30 PM, Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
>>> Hi,
>>> According to perf top, qlist_move_cache really did occupy 100% of a
>>> CPU in my environment yesterday; otherwise I would not have noticed
>>> the kasan code.
>>> Currently I have difficulty reproducing it because the frontend
>>> developer modified some user-mode code.
>>> What I can reproduce again and again now is:
>>> kgdb_breakpoint () at kernel/debug/debug_core.c:1073
>>> 1073 wmb(); /* Sync point after breakpoint */
>>> (gdb) p quarantine_batch_size
>>> $1 = 3601946
>>> And by instrumenting the code, the maximum value that
>>> global_quarantine[quarantine_tail].bytes reached is 6618208.
>>
>> On second thought, size does not matter too much because there can be
>> large objects. The quarantine is always quantized by objects: we can't
>> put part of an object into one batch and another part of the object
>> into another batch. But that's not a problem, because the overhead per
>> object is O(1). We can push a single 4MB object and overflow the target
>> size by 4MB, and that will be fine.
>> Either way, 6MB is not terribly much either. It should take
>> milliseconds to process.
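
To make the batching argument above concrete, here is a minimal
userspace sketch in C. It is a simplified model and not the code in
mm/kasan/quarantine.c: the names quarantine_model, batch_model,
quarantine_put_model and NBATCHES are hypothetical, and locking and
per-cpu queues are left out. It only shows that objects are added
whole, that each put is O(1) whatever the object size, and that the
batch size is a target rather than a hard limit.

```c
#include <assert.h>
#include <stddef.h>

#define NBATCHES 8  /* hypothetical number of batches in this model */

struct batch_model {
	size_t bytes;    /* total size of the objects in this batch */
	size_t objects;  /* number of objects in this batch */
};

struct quarantine_model {
	struct batch_model batches[NBATCHES];
	size_t tail;        /* index of the batch currently being filled */
	size_t batch_size;  /* target size per batch, not a hard limit */
};

/*
 * Add one whole object to the tail batch. The work done here does not
 * depend on obj_size, so a huge object costs the same as a tiny one.
 */
static void quarantine_put_model(struct quarantine_model *q, size_t obj_size)
{
	struct batch_model *b;

	assert(q != NULL && q->tail < NBATCHES);
	b = &q->batches[q->tail];

	/* The object is never split: all of it goes into this batch. */
	b->bytes += obj_size;
	b->objects++;

	/*
	 * The target is checked only after the object is in, so a batch
	 * can overshoot batch_size by up to the size of its last object.
	 */
	if (b->bytes >= q->batch_size && q->tail + 1 < NBATCHES)
		q->tail++;
}
```

With a 1 MiB target in this model, putting a 512 KiB object and then a
4 MiB object leaves the first batch at 4718592 bytes in two objects,
and the next object starts a new batch.
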
>>
>>
>>
>>
>>> I do think quarantine_put is a better place to drain the
>>> quarantine, because cache_free is fine in that context.
>>> I am willing to do it if you think it is convenient :-)
>
>
> Andrey, do you know of any problems with draining quarantine in push?
> Do you have any objections?
>
> But it's still not completely clear to me what problem we are solving.