Re: [Alsa-user] new source of MIDI playback slow-down identified - 5a03b051ed87e72b959f32a86054e1142ac4cf55 thp: use compaction in kswapd for GFP_ATOMIC order > 0

From: Andrea Arcangeli
Date: Wed Feb 23 2011 - 13:15:22 EST


On Wed, Feb 23, 2011 at 05:44:37PM +0000, Mel Gorman wrote:
> Your logic makes sense and I can see why it might not necessarily show
> up in my tests. I was simply wondering if you spotted the problem
> directly or from looking at the source.

I looked at the profiling and then at the source, and compaction_alloc
is right at the top, so it matches your findings.

This is with z1.

Samples % of Total Cum. Samples Cum. % of Total module:function
-------------------------------------------------------------------------------------------------
177786 6.178 177786 6.178 sunrpc:svc_recv
128779 4.475 306565 10.654 sunrpc:svc_xprt_enqueue
80786 2.807 387351 13.462 vmlinux:__d_lookup
62272 2.164 449623 15.626 ext4:ext4_htree_store_dirent
55896 1.942 505519 17.569 jbd2:journal_clean_one_cp_list
43868 1.524 549387 19.093 vmlinux:task_rq_lock
43572 1.514 592959 20.608 vmlinux:kfree
37620 1.307 630579 21.915 vmlinux:mwait_idle
36169 1.257 666748 23.172 vmlinux:schedule
34037 1.182 700785 24.355 e1000:e1000_clean
31945 1.110 732730 25.465 vmlinux:find_busiest_group
31491 1.094 764221 26.560 qla2xxx:qla24xx_intr_handler
30681 1.066 794902 27.626 vmlinux:_atomic_dec_and_lock
7425 0.258 xxxxxx xxxxxx vmlinux:get_page_from_freelist

This is with current compaction logic in kswapd.

Samples % of Total Cum. Samples Cum. % of Total module:function
-------------------------------------------------------------------------------------------------
1182928 17.358 1182928 17.358 vmlinux:get_page_from_freelist
657802 9.652 1840730 27.011 vmlinux:free_pcppages_bulk
579976 8.510 2420706 35.522 sunrpc:svc_xprt_enqueue
508953 7.468 2929659 42.991 sunrpc:svc_recv
490538 7.198 3420197 50.189 vmlinux:compaction_alloc
188620 2.767 3608817 52.957 vmlinux:tg_shares_up
97527 1.431 3706344 54.388 vmlinux:__d_lookup
85670 1.257 3792014 55.646 jbd2:journal_clean_one_cp_list
71738 1.052 3863752 56.698 vmlinux:mutex_spin_on_owner
71037 1.042 3934789 57.741 vmlinux:kfree

So clearly your patch may increase performance too (because of less
contention on the spinlock), but it's unlikely to make compaction_alloc
disappear from the profiling. This isn't measuring IRQ latency, just
the time the CPU spent in each function, but the two issues are
connected: the more time we spend in that function, the higher the
probability of running into the high-latency loop once in a while.
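
To show what I mean by the high-latency loop, here is a minimal
sketch of the pattern (not your actual patch, and scan_one_pfn() is a
placeholder, not a kernel API): the free page scan holds zone->lock
with IRQs disabled across the whole range, and dropping/retaking the
lock every few pfns is what bounds the IRQ-off time:

#include <linux/mmzone.h>
#include <linux/spinlock.h>

#define SCAN_BATCH 32	/* arbitrary batch size for this sketch */

/* Placeholder for the real per-pfn isolation work. */
static unsigned long scan_one_pfn(unsigned long pfn);

static unsigned long scan_bounded_irq_off(struct zone *zone,
					  unsigned long pfn,
					  unsigned long end_pfn)
{
	unsigned long nr_isolated = 0, flags;

	spin_lock_irqsave(&zone->lock, flags);
	for (; pfn < end_pfn; pfn++) {
		nr_isolated += scan_one_pfn(pfn);

		/* Without this, IRQs stay off for the whole scan. */
		if ((pfn & (SCAN_BATCH - 1)) == 0) {
			spin_unlock_irqrestore(&zone->lock, flags);
			/* IRQs (e.g. the MIDI timer) get to run here. */
			spin_lock_irqsave(&zone->lock, flags);
		}
	}
	spin_unlock_irqrestore(&zone->lock, flags);
	return nr_isolated;
}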

> On the plus side, the patch I posted also reduces kswapd CPU time.
> Graphing CPU usage over time, I saw the following;
>
> http://www.csn.ul.ie/~mel/postings/compaction-20110223/kswapdcpu-smooth-hydra.ps
>
> i.e. CPU usage of kswapd is also reduced. The graph is smoothened because
> the raw figures are so jagged as to be almost impossible to read. The z1
> patches and others could also further reduce it (I haven't measured it yet)
> but I thought it was interesting that IRQs being disabled for long periods
> also contributed so heavily to kswapd CPU usage.

I think the lower contention on the heavily used zone lock may have
contributed to decreasing the overall system load if it's a large SMP;
I'm not sure why kswapd usage went down, though.

No problem then; I will also test a third kernel with your patch alone.

> Ok. If necessary we can disable it entirely for this cycle but as I'm
> seeing large sources of IRQ disabled latency in compaction and
> shrink_inactive_list, it'd be nice to get that ironed out while the
> problem is obvious too.

Sure. The current kswapd code helps to find any latency issue in
compaction ;). In fact they were totally unnoticed until we enabled it
in kswapd.

> Sure, it'll be interesting to see what the results are. I'm still hoping we can prove the high-wmark
> unnecessary due to Rik's naks. His reasoning about the corner cases it
> potentially introduces is hard, if not impossible, to disprove.

In my evaluation, shrinking more on the small lists was worse for
overall zone LRU balancing; that's the side effect of that change. I'm
not against changing it to high+min like he suggested, but for now
this was simpler. I've seen your patch too, and that's OK with me as
well, but because I don't see exactly the rationale for why it's a
problem, I don't like things I'm uncertain about, and I find the
removal of the *8 simpler.

> Can you ditch all these patches in a directory somewhere because I'm
> getting confused as to which patch is which exactly :)

OK... let me finish sending the three kernels to test.

> kswapd at 100% CPU is certainly unsuitable but would like to be sure we
> are getting it down the right way without reintroducing the problems
> this 8*high_wmark check fixed.

Well, the 8*high was never related to high kswapd load; it simply has
the effect that more memory is free when kswapd stops. It's very quick
at reaching 700M free, and then it behaves identically to when only
~100M are free (like now, without the *8).
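
To be explicit about where the factor sits, here is a hedged sketch
(kswapd_target_ok() is a made-up helper name, not the exact code):

#include <linux/mmzone.h>

static bool kswapd_target_ok(struct zone *zone, int order)
{
	/*
	 * With the *8, kswapd keeps going until ~700M are free;
	 * without it, it stops at the plain high watermark (~100M)
	 * and behaves identically from then on.
	 */
	unsigned long mark = high_wmark_pages(zone); /* was: * 8 */

	return zone_watermark_ok(zone, order, mark, 0, 0);
}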

About kswapd: the current logic is clearly not OK in certain workloads
(my fault), so my attempt at fixing it is compaction-kswapd-3. I think
the primary problem is that kswapd won't stop after the first
invocation of compaction if there's any fragmentation in any zone (it
could even be a tiny DMA zone). So this should fix it. But it'll still
cause one compaction invocation for every new order > 0 allocation (no
big deal for the DMA zone, as it's small).
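
In rough terms the idea is a per-wakeup guard, something like this
sketch (the flag and function names are hypothetical, not lifted from
compaction-kswapd-3; run_compaction() is a placeholder entry point):

#include <linux/mmzone.h>

/* Placeholder for whatever actually drives compaction. */
static void run_compaction(pg_data_t *pgdat, int order);

/* Hypothetical flag, cleared on every kswapd wakeup. */
static bool compacted_this_wakeup;

static void kswapd_compact_once(pg_data_t *pgdat, int order)
{
	/*
	 * Without the guard, leftover fragmentation in any zone
	 * (even a tiny DMA zone) re-triggers compaction forever
	 * and kswapd never goes back to sleep.
	 */
	if (order == 0 || compacted_this_wakeup)
		return;
	compacted_this_wakeup = true;
	run_compaction(pgdat, order);
}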

If you check the vmscan.c change in compaction-kswapd-2, I think it
has a better chance of working now. (I also noticed
__compaction_need_reclaim doesn't need the "int order" parameter, but
you can ignore that; it's harmless and not worth fixing until we know
whether this helps.)

If even this fails, it means calling compaction even a single time
for each kswapd wakeup (in addition to direct compaction) is too much.
The next step would then be to decrement kswapd's max_order until it
reaches zero, so compaction stops being called unless direct compaction
is invoked too. But we can try that later.
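
Something along these lines (pgdat->kswapd_max_order is the real
field; the helper is only a sketch of the decay idea):

#include <linux/mmzone.h>

/*
 * Decay the remembered order once per wakeup so background
 * compaction fades out; a direct compaction raising
 * kswapd_max_order again would re-enable it.
 */
static void kswapd_decay_max_order(pg_data_t *pgdat)
{
	if (pgdat->kswapd_max_order > 0)
		pgdat->kswapd_max_order--;
}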

compaction-no-kswapd-3 plus your compaction_alloc_lowlat should fix
the problem, and it's a good thing kswapd misbehaved, since that's how
we noticed the latency issues.