Re: [RFC PATCH 0/3] Change how we determine when to hand out THPs

From: Andrea Arcangeli
Date: Tue Dec 17 2013 - 12:57:26 EST


On Tue, Dec 17, 2013 at 10:20:07AM -0600, Alex Thorlton wrote:
> This message in particular:
>
> https://lkml.org/lkml/2013/8/2/697

I think adding a prctl (or similar), inherited by child processes, to
turn off THP would be a fine addition to the current madvise. You could
then run any static app under a wrapper like "THP_disable ./whatever".
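
As a rough sketch of what such a wrapper could look like: the code below
assumes the PR_SET_THP_DISABLE prctl that later mainline kernels (3.15
and newer) provide, which did not exist when this mail was written. The
helper name thp_disable_exec() is hypothetical and only for
illustration.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Assumption: option number used by later mainline kernels (>= 3.15);
 * older headers may not define it. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif

/*
 * Hypothetical helper: disable THP for the calling process, then exec
 * the given command. The flag is inherited across fork() and preserved
 * across execve(), so the target binary needs no changes.
 * Returns -1 with errno set on failure; does not return on success.
 */
int thp_disable_exec(char *const argv[])
{
	if (argv == NULL || argv[0] == NULL) {
		errno = EINVAL;
		return -1;
	}
	/* The unused arguments must be zero. */
	if (prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL) != 0)
		return -1;	/* e.g. EINVAL on kernels without this prctl */
	execvp(argv[0], argv);
	return -1;		/* only reached if execvp() failed */
}
```

A "THP_disable" binary would then just call thp_disable_exec(argv + 1)
from main() and report the error if it returns.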

The idea is: if the software is maintained, madvise allows for
fine-grained optimization; if the software is legacy, proprietary,
statically linked (or if it already uses LD_PRELOAD for other things),
prctl takes care of it in a coarser way (but still per-app).

> The thread I mention above originally proposed a per-process switch to
> disable THP without the use of madvise, but it was not very well
> received. I'm more than willing to revisit that idea, and possibly

I think you provided enough explanation of why it is needed (static
binaries, proprietary apps, the annoyance of an LD_PRELOAD that may
collide with other LD_PRELOADs in proprietary apps, and so on), so I
think a prctl is a reasonable addition to the madvise.

We also have an madvise to turn on THP selectively, for embedded systems
that may boot with enabled=madvise to be sure not to waste any memory
because of THP. But a prctl to selectively enable THP doesn't make much
sense, as one has to enable it selectively in a fine-grained way to be
sure not to cause any memory waste. So I think a NOHUGEPAGE prctl
would be enough.
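
For reference, the existing per-region control looks roughly like this
from userland. This is a minimal sketch using the madvise(2) advice
values MADV_HUGEPAGE and MADV_NOHUGEPAGE; the helper names thp_advice()
and region_set_thp() are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Map the caller's intent to the madvise(2) advice value. */
int thp_advice(int want_thp)
{
	return want_thp ? MADV_HUGEPAGE : MADV_NOHUGEPAGE;
}

/*
 * Hypothetical helper: opt one mapping in or out of THP.
 * addr must be page-aligned. Returns 0 on success, -1 with errno set
 * (for example EINVAL on a kernel built without THP support).
 */
int region_set_thp(void *addr, size_t len, int want_thp)
{
	return madvise(addr, len, thp_advice(want_thp));
}
```

With enabled=madvise, only regions passed through
region_set_thp(addr, len, 1) are eligible for huge pages; with
enabled=always, region_set_thp(addr, len, 0) opts a single region out.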

> meld the two (a per-process threshold, instead of a big-hammer on-off
> switch). Let me know if that seems preferable to this idea and we can
> discuss.

The per-process threshold would be a much bigger patch. I think starting
with the big-hammer on-off is preferable, as it is much simpler and it
should be more than enough to take care of the rare corner cases,
while leaving the other workloads unaffected (modulo the cacheline
access needed to check the task or mm flags) and running at max speed.
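
To make the cost argument concrete, here is a small userland model of
the decision (not kernel code). All type and function names are
hypothetical, and it assumes the per-mm flag overrides everything else,
including an explicit madvise opt-in.

```c
#include <stdbool.h>

/* Models the global "enabled" setting (always / madvise / never). */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical stand-ins for per-region madvise state and per-mm flag. */
struct vma_model { bool hugepage; bool nohugepage; };
struct mm_model  { bool thp_disabled; /* set by the proposed prctl */ };

/* Decide whether a fault in this region may be served with a huge page. */
bool thp_allowed(enum thp_mode mode, const struct mm_model *mm,
		 const struct vma_model *vma)
{
	if (mm->thp_disabled)	/* the one extra flag check per fault */
		return false;
	if (vma->nohugepage)	/* region opted out via madvise */
		return false;
	switch (mode) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
		return vma->hugepage;	/* only regions that opted in */
	default:
		return false;
	}
}
```

Workloads that never set the flag only pay for the first test, which is
the "modulo the cacheline" cost mentioned above.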

To evaluate the threshold solution, a variety of benchmarks of a
multitude of apps would be necessary first, to see the effect it has
on the non-corner cases. Adding the big-hammer on-off prctl instead is
a black and white design solution that won't require black magic
settings.

Ideally, if we add a threshold later, it won't require any more
cacheline accesses: the threshold would also need to be per-task or
per-mm, so the extra runtime cost of the prctl check would then be
zero, and the prctl could remain as a benchmarking tweak even after the
per-app threshold is added.

About creating heuristics to automatically detect the ideal value of
the big-hammer per-app on/off switch (or, even harder, the ideal value
of the per-app threshold), I think it's not going to happen because
there are too few corner cases and it wouldn't be worth the cost
(the cost would be significant no matter how it is implemented).

Every time we try to make THP smarter at auto-disabling itself for the
corner cases, we're slowing it down for everyone that gets a benefit
from it, and there's no way around it. This is why I think the
big-hammer prctl for the few corner cases is the best way to go.

Thanks!
Andrea