Re: [PATCH] delayacct: track delays from ksm cow

From: David Hildenbrand
Date: Mon Mar 21 2022 - 11:46:10 EST


On 20.03.22 07:13, CGEL wrote:
> On Fri, Mar 18, 2022 at 09:24:44AM +0100, David Hildenbrand wrote:
>> On 18.03.22 02:41, CGEL wrote:
>>> On Thu, Mar 17, 2022 at 11:05:22AM +0100, David Hildenbrand wrote:
>>>> On 17.03.22 10:48, CGEL wrote:
>>>>> On Thu, Mar 17, 2022 at 09:17:13AM +0100, David Hildenbrand wrote:
>>>>>> On 17.03.22 03:03, CGEL wrote:
>>>>>>> On Wed, Mar 16, 2022 at 03:56:23PM +0100, David Hildenbrand wrote:
>>>>>>>> On 16.03.22 14:34, cgel.zte@xxxxxxxxx wrote:
>>>>>>>>> From: Yang Yang <yang.yang29@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> Delay accounting does not track the delay of ksm cow. When tasks
>>>>>>>>> have many ksm pages, they may spend a considerable amount of time
>>>>>>>>> waiting for ksm cow.
>>>>>>>>>
>>>>>>>>> To see the impact of ksm cow on tasks, measure the delay when ksm
>>>>>>>>> cow happens. This could help users decide whether to use ksm
>>>>>>>>> or not.
>>>>>>>>>
>>>>>>>>> Also update tools/accounting/getdelays.c:
>>>>>>>>>
>>>>>>>>> / # ./getdelays -dl -p 231
>>>>>>>>> print delayacct stats ON
>>>>>>>>> listen forever
>>>>>>>>> PID 231
>>>>>>>>>
>>>>>>>>> CPU          count     real total  virtual total    delay total  delay average
>>>>>>>>>               6247     1859000000     2154070021     1674255063        0.268ms
>>>>>>>>> IO           count    delay total  delay average
>>>>>>>>>                  0              0            0ms
>>>>>>>>> SWAP         count    delay total  delay average
>>>>>>>>>                  0              0            0ms
>>>>>>>>> RECLAIM      count    delay total  delay average
>>>>>>>>>                  0              0            0ms
>>>>>>>>> THRASHING    count    delay total  delay average
>>>>>>>>>                  0              0            0ms
>>>>>>>>> KSM          count    delay total  delay average
>>>>>>>>>               3635      271567604            0ms
>>>>>>>>>
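For reference, the averages in the sample output above can be recomputed from the printed totals. A minimal Python sketch, assuming the delay totals are in nanoseconds (the CPU row is consistent with that assumption: 1674255063 / 6247 is about 0.268ms). The function name is illustrative and not part of getdelays:

```python
def average_delay_ms(delay_total_ns: int, count: int) -> float:
    """Average delay per event in milliseconds; 0.0 when no events were recorded."""
    if count == 0:
        return 0.0
    return delay_total_ns / count / 1_000_000


# Figures copied from the sample output above.
cpu_avg = average_delay_ms(1674255063, 6247)
ksm_avg = average_delay_ms(271567604, 3635)
print(f"CPU {cpu_avg:.3f}ms, KSM {ksm_avg:.3f}ms")
```

Under that assumption the KSM row works out to roughly 0.075ms per ksm cow, a value the whole-millisecond "0ms" column does not show.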
>>>>>>>>
>>>>>>>> TBH I'm not sure how particularly helpful this is and if we want this.
>>>>>>>>
>>>>>>> Thanks for replying.
>>>>>>>
>>>>>>> Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want
>>>>>>> to save memory; the tradeoff is the delay suffered on ksm cow. Users can
>>>>>>> find out how much memory ksm saved by reading
>>>>>>> /sys/kernel/mm/ksm/pages_sharing, but they don't know the cost of
>>>>>>> ksm cow delay, and this is important for some delay-sensitive tasks. If
>>>>>>> users know both the saved memory and the ksm cow delay, they could make
>>>>>>> better use of madvise(, , MADV_MERGEABLE).
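To make the two pieces mentioned above concrete, here is a minimal Python sketch of opting a region into KSM and reading the sharing counter. It assumes a Linux system with CONFIG_KSM and Python 3.8 or later; the helper names `read_ksm_counter` and `map_mergeable` are illustrative, and both helpers return None rather than failing when KSM is unavailable:

```python
import mmap
from pathlib import Path
from typing import Optional

KSM_SYSFS = Path("/sys/kernel/mm/ksm")


def read_ksm_counter(name: str, base: Path = KSM_SYSFS) -> Optional[int]:
    """Return a KSM sysfs counter such as pages_sharing, or None if unreadable."""
    try:
        return int((base / name).read_text().strip())
    except (OSError, ValueError):
        return None


def map_mergeable(length: int) -> Optional[mmap.mmap]:
    """Map private anonymous memory and advise the kernel that KSM may merge it."""
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None:  # platform or Python build without MADV_MERGEABLE
        return None
    # KSM only considers private anonymous pages, so do not use the
    # default shared mapping.
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        region.madvise(advice)
    except OSError:  # e.g. kernel built without KSM
        region.close()
        return None
    return region


if __name__ == "__main__":
    region = map_mergeable(64 * mmap.PAGESIZE)
    print("mergeable mapping:", region is not None)
    print("pages_sharing:", read_ksm_counter("pages_sharing"))
```

Pages are only merged once ksmd is running and has scanned the region, so pages_sharing will not change immediately after the madvise call.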
>>>>>>
>>>>>> But that happens after the effects, no?
>>>>>>
>>>>>> IOW a user already called madvise(, , MADV_MERGEABLE) and then gets the
>>>>>> results.
>>>>>>
>>>>> Imagine users are developing or porting their applications on an
>>>>> experimental machine; they could take those benchmarks as feedback to
>>>>> decide whether to use madvise(, , MADV_MERGEABLE), or to adjust its range.
>>>>
>>>> And why can't they run it with and without and observe performance using
>>>> existing metrics (or even application-specific metrics?)?
>>>>
>>>>
>>> I think the reason why we need this patch is just like why we need
>>> swap, reclaim and thrashing getdelay information. When the system is
>>> complex, it's hard to tell precisely which kernel activity impacts the
>>> observed performance or application-specific metrics: preemption? cgroup
>>> throttling? swap? reclaim? IO?
>>>
>>> So if we could get precise impact data for a factor, tuning that factor
>>> (for this patch it's ksm) becomes more efficient.
>>>
>>
>> I'm not convinced that we want to make our write-fault handler more
>> complicated for such a corner case with an unclear, eventual use case.
>
> IIRC, KSM was designed for VMs. But recently we found that KSM works well
> for systems with many containers (saving about 10%~20% of total memory),
> and container technology is more popular today, so KSM may be used more.
>
> To reduce the impact on the write-fault handler, we may write a new function
> with ifdef CONFIG_KSM inside to do this job?

Maybe we just want to catch the impact of the write-fault handler when
copying more generally?

>
>> IIRC, whenever using KSM you're already agreeing to eventually pay a
>> performance price, and the price heavily depends on other factors in the
>> system. Simply looking at the number of write-faults might already give
>> an indication what changed with KSM being enabled.
>>
> As for "you're already agreeing to pay a performance price", I think
> this is the shortcoming of KSM that keeps it from being used more widely.
> It's not easy for a user/app to decide how to use madvise(, , MADV_MERGEABLE).

... and my point is that the metric you're introducing might absolutely
not be expressive for such users playing with MADV_MERGEABLE. IMHO
people will look at actual application performance to figure out what
"harm" will be done, no?

But I do see value in capturing how many COW we have in general --
either via a counter or via a delay as proposed by you.
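
As a rough user-space stand-in for such a counter, a task's fault counts can be sampled from /proc/<pid>/stat. A minimal Python sketch; note the assumption that minor faults are only a proxy here, since minflt counts every minor fault and not just COW write-faults. The helper names are illustrative:

```python
from typing import Tuple


def parse_fault_counts(stat_line: str) -> Tuple[int, int]:
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # comm (field 2) may contain spaces or parentheses, so split after
    # the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); minflt is field 10 and majflt is field 12.
    return int(rest[7]), int(rest[9])


def read_fault_counts(pid: str = "self") -> Tuple[int, int]:
    """Read the current fault counts of a task from procfs (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_fault_counts(f.read())
```

Sampling before and after a workload, with and without MADV_MERGEABLE, gives the difference in fault counts between the two runs.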

>
> Is there an easier way to use KSM, enjoying the memory savings while
> minimizing the performance price for containers? We think it's possible, and
> are working on a new patch: provide a knob for cgroup to enable/disable KSM
> for all tasks in the cgroup, so if your container is delay sensitive just
> leave it alone, and if not you can easily enable KSM without modifying app
> code.
>
> Before using the new knob, users might want to know the precise impact of KSM.
> I think the write-fault count is an indirect measure. If an indirect measure
> is good enough, why do we need taskstats and PSI? By the way, getdelays
> supports container statistics.

Would anything speak against making this more generic and capturing the
delay for any COW, not just for KSM?

--
Thanks,

David / dhildenb