Re: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()

From: Michal Hocko
Date: Fri Feb 23 2018 - 05:08:45 EST


On Mon 19-02-18 14:30:36, Robert Harris wrote:
>
>
> > On 19 Feb 2018, at 12:39, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Mon 19-02-18 12:14:26, Robert Harris wrote:
> >>
> >>
> >>> On 19 Feb 2018, at 08:26, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >>>
> >>> On Sun 18-02-18 16:47:55, robert.m.harris@xxxxxxxxxx wrote:
> >>>> From: "Robert M. Harris" <robert.m.harris@xxxxxxxxxx>
> >>>>
> >>>> __fragmentation_index() calculates a value used to determine whether
> >>>> compaction should be favoured over page reclaim in the event of allocation
> >>>> failure. The calculation itself is opaque and, on inspection, does not
> >>>> match its existing description. The function purports to return a value
> >>>> between 0 and 1000, representing units of 1/1000. Barring the case of a
> >>>> pathological shortfall of memory, the lower bound is instead 500. This is
> >>>> significant because it is the default value of sysctl_extfrag_threshold,
> >>>> i.e. the value below which compaction should be avoided in favour of page
> >>>> reclaim for costly pages.
> >>>>
> >>>> This patch implements and documents a modified version of the original
> >>>> expression that returns a value in the range 0 <= index < 1000. It amends
> >>>> the default value of sysctl_extfrag_threshold to preserve the existing
> >>>> behaviour.
> >>>
> >>> It is not really clear to me what actual problem you are trying
> >>> to solve with this patch. Is there a bug, or are you just trying to
> >>> improve the current implementation to be more effective?
> >>
> >> There is not a significant bug.
> >>
> >> The first problem is that the mathematical expression in
> >> __fragmentation_index() is opaque, particularly given the lack of
> >> description in the comments or the original commit message. This patch
> >> provides such a description.
> >>
> >> Simply annotating the expression did not make sense since the formula
> >> doesn't work as advertised. The fragmentation index is described as
> >> being in the range 0 to 1000 but the bounds of the formula are instead
> >> 500 to 1000. This patch changes the formula so that its lower bound is
> >> 0.
> >
> > But why do we want to fix that in the first place? Why don't we simply
> > deprecate the tunable and remove it altogether? Who is relying on tuning
> > this option? Considering that it doesn't work as advertised and nobody is
> > complaining, I have the feeling that it is not really used in the wild...
>
> I think it's a useful feature. Ignoring any contrived test case, there
> will always be a lower limit on the degree of fragmentation that can be
> achieved by compaction. If someone takes the trouble to measure this
> then it is entirely reasonable that he or she should be able to inhibit
> compaction for cases when fragmentation falls below some correspondingly
> sized threshold.

Do you have any practical examples?
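
For reference, the bounds claimed in the quoted patch description can be
checked with a small userspace sketch. It assumes the expression has the
form 1000 - (1000 + free_pages * 1000 / requested) / free_blocks_total,
evaluated only when no free block is large enough for the request. The
function name and parameters are hypothetical, and this is not the kernel
code itself.

```c
#include <stdint.h>

/*
 * Userspace sketch of the expression under discussion (an assumption
 * about its form, not a copy of the kernel source).
 *
 * order:       order of the failing allocation
 * free_pages:  total number of free pages in the zone
 * free_blocks: total number of free blocks, all smaller than the request
 */
int frag_index_sketch(unsigned int order, uint64_t free_pages,
		      uint64_t free_blocks)
{
	uint64_t requested = 1ULL << order;

	/* No free memory at all: report 0 rather than divide by zero. */
	if (free_blocks == 0)
		return 0;

	/* Integer arithmetic throughout; the result is in units of 1/1000. */
	return 1000 - (int)((1000 + free_pages * 1000 / requested) / free_blocks);
}
```

With an order-4 request and 1000 free blocks that are all order 3 (8000
free pages), this evaluates to 499; with 1000 free order-0 pages it
evaluates to 937. Since no free block can exceed half the requested size,
the result stays near or above 500 unless the zone holds only a handful of
free blocks, which is the pathological case the patch description mentions.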
--
Michal Hocko
SUSE Labs