RE: [PATCH -v2 -mm] add extra free kbytes tunable

From: David Rientjes
Date: Thu Oct 13 2011 - 16:48:20 EST

Next message: David Rientjes: "Re: [PATCH] mm: add a "struct page_frag" type containing a page,offset and length"
Previous message: Mark Brown: "Re: [PATCH] staging:iio:proof of concept in kernel interface."
In reply to: Satoru Moriya: "RE: [PATCH -v2 -mm] add extra free kbytes tunable"
Next in thread: Rik van Riel: "Re: [PATCH -v2 -mm] add extra free kbytes tunable"

On Thu, 13 Oct 2011, Satoru Moriya wrote:

> My test case is just a simple one (maybe too simple), and I tried
> to demonstrate the following issues that the current kernel has with it.
>
> 1. The current kernel uses free memory as page cache.
> 2. Applications may allocate memory in bursts, and when that happens
> they may hit latency problems because there is not enough free
> memory. Also, the amount of memory required varies widely.

This is what the per-zone watermarks are intended to address and I
understand that it's not doing a good enough job for your particular
workloads. I'm trying to find a solution that mitigates that for all
threads that allocate faster than the kernel can reclaim, realtime or
otherwise, without requiring the admin to set those watermarks himself,
which is what extra_free_kbytes ultimately leads to.

> 3. Some users would like to control the amount of free memory
> to avoid the situation above.

The only possible way to do that is with min_free_kbytes right now and
that would increase the amount of memory that realtime threads have
exclusive access to. Let's try not to add tunables that force admins to
find their own optimal watermarks for every kernel release. I see no
reason why we can't add logic for rt-threads triggering reclaim to either
reclaim faster (Con's patch) or reclaim more memory than normal (an
ALLOC_HARDER-type bonus in the reclaim path to reclaim 1.25 * high_wmark,
for example). We've had an rt-thread bonus in the page allocator for a
long time; I'm not saying we don't need more elsewhere.

> 4. Users can't set the amount of free memory explicitly.
> From the user's point of view, the amount of free memory is the delta
> between the high watermark and the min watermark, because below the min
> watermark user applications incur a penalty (direct reclaim). The width
> of that delta depends on min_free_kbytes (it is actually min watermark / 2),
> so if we want more free memory, we must make min_free_kbytes bigger.
> That isn't intuitive, and it introduces another problem: the likelihood
> of direct reclaim increases.
>

So you're saying that we need to increase the space between high_wmark and
min_wmark anytime that min_free_kbytes changes? That certainly may be
true and would hopefully mitigate direct reclaim becoming too intrusive
for your workload.

We _really_ don't want to cause regressions for others, though, and
extra_free_kbytes can easily cause them for cpu-intensive workloads when
nothing currently needs that extra burst of memory. That happens because
extra_free_kbytes is a global tunable, not tied to any specific
application that we can identify when reclaiming [like testing for
rt_task()].

> But the concern I described above still stands, because whether a
> latency issue happens depends on how heavily workloads allocate
> memory in a short time. Of course the same can be said of
> extra_free_kbytes, but we can change it and test its effect
> easily.
>

We'll never know the future and how much memory a latency-sensitive
application will require 100ms from now. The only thing that we can do is
(i) identify the latency-sensitive app, (ii) reclaim more aggressively for
them, and (iii) reclaim additional memory in preparation for another
burst. At some point, though, userspace needs to take responsibility for
not allocating enormous amounts of memory all at once, and there's room
for mitigation there too, by preallocating ahead of what you actually need.
